
aws_replace_quote_entities() only handles &quot;, fails round-trip for spec-conformant XML emitting &#34; #627

@harshavardhana

Summary

aws_replace_quote_entities() in source/s3_util.c:339 does a literal
byte match against the 6-byte string &quot; and copies anything else
through verbatim. As a result, ETag values returned by S3-compatible
servers that emit the equivalent numeric character reference (&#34;
decimal or &#x22; hex) end up with the literal entity text in the
response ETag header. Subsequent If-Match requests that reuse that
header value then fail with HTTP 412.

Per XML 1.0 §4.6, the predefined &quot; and the numeric reference
&#34; are semantically identical and both forms are spec-conformant.
A receiver should accept either.

Where it bites

source/s3_auto_ranged_put.c reads the <ETag> from the
CompleteMultipartUpload response, runs it through
aws_replace_quote_entities(), and stores the result as the response's
ETag header. If the server emitted &#34;…&#34;, the header now
literally contains the five characters &#34;, and any follow-up
conditional request (If-Match, CopyObject source If-Match, etc.)
fails to match.

Repro

Any server using Go's encoding/xml exhibits this: the stdlib
escapeText hardcodes &#34; for the double quote (numeric refs are used uniformly
to keep escaping context-independent across attributes vs element
content). Trivial repro:

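// A named type is used because encoding/xml rejects a top-level anonymous struct.
type CompleteMultipartUploadResult struct{ ETag string }
xml.NewEncoder(os.Stdout).Encode(CompleteMultipartUploadResult{ETag: `"abc-1"`})
// <CompleteMultipartUploadResult><ETag>&#34;abc-1&#34;</ETag></CompleteMultipartUploadResult>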

Pointing aws-c-s3 (or anything CRT-based — mountpoint-s3, AWS CLI v2
with CRT) at such a server, doing a multipart upload, and then issuing
a request with If-Match: <returned ETag> reproduces the 412.

Suggested fix

Two options:

  1. Minimal: extend the recognized set in
    aws_replace_quote_entities() to include &#34; and &#x22; (and
    ideally &apos;/&#39;/&#x27; for symmetry).
  2. Better: replace the ad-hoc string match with a proper XML
    entity decoder pass (the five predefined entities + numeric
    character references). The function only runs on already-extracted
    XML text, so a real decoder is appropriate.
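
For illustration, option 1 could look roughly like the sketch below. It is standalone C and does not mirror the real function's signature or its allocator/buffer types. The names decode_quote_refs, struct quote_ref and s_quote_refs are hypothetical and do not exist in aws-c-s3. It matches only the exact spellings listed, so references with leading zeros (for example &#034;) would still pass through undecoded, which is one reason option 2 is the better long-term fix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: each reference spelling and the character it decodes to.
 * Spellings are the XML 1.0 predefined entities plus their decimal and hex
 * character references. Not taken from aws-c-s3. */
struct quote_ref {
    const char *text;
    char decoded;
};

static const struct quote_ref s_quote_refs[] = {
    {"&quot;", '"'},  {"&#34;", '"'},  {"&#x22;", '"'},
    {"&apos;", '\''}, {"&#39;", '\''}, {"&#x27;", '\''},
};

/* Copy `in` to `out`, replacing the quote references above with the literal
 * character. `out_cap` is the capacity of `out` including the terminating NUL.
 * Returns the decoded length, or (size_t)-1 if `out` is too small.
 * Anything that is not a listed reference is copied through unchanged. */
size_t decode_quote_refs(const char *in, char *out, size_t out_cap) {
    size_t n = 0;
    while (*in != '\0') {
        char c = *in;
        size_t consumed = 1;
        if (c == '&') {
            size_t count = sizeof(s_quote_refs) / sizeof(s_quote_refs[0]);
            for (size_t i = 0; i < count; ++i) {
                size_t len = strlen(s_quote_refs[i].text);
                if (strncmp(in, s_quote_refs[i].text, len) == 0) {
                    c = s_quote_refs[i].decoded;
                    consumed = len;
                    break;
                }
            }
        }
        /* Keep one byte free for the terminating NUL. */
        if (n + 1 >= out_cap) {
            return (size_t)-1;
        }
        out[n++] = c;
        in += consumed;
    }
    if (out_cap == 0) {
        return (size_t)-1;
    }
    out[n] = '\0';
    return n;
}
```

With this sketch, the input &#34;abc-1&#34; decodes to the 7-byte string "abc-1" (quotes included), the same result the existing code already produces for &quot;abc-1&quot;.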
