Skip to content

Resolve Protobuf Confluent Schema Registry references correctly for subjects containing slashes#858

Open
PitchBlack07 wants to merge 3 commits into
warpstreamlabs:mainfrom
PitchBlack07:bugfix/confluent-schema-registry-reference-resolution
Open

Resolve Protobuf Confluent Schema Registry references correctly for subjects containing slashes#858
PitchBlack07 wants to merge 3 commits into
warpstreamlabs:mainfrom
PitchBlack07:bugfix/confluent-schema-registry-reference-resolution

Conversation

@PitchBlack07

Copy link
Copy Markdown

Protobuf schema references whose subject contains / must be path-escaped when requesting referenced schemas from Confluent Schema Registry.

Preserve the escaped request path so these subjects are resolved correctly instead of being double-escaped or interpreted as multiple URL segments.

Protobuf schema references whose subject contains `/` must be path-escaped
when requesting referenced schemas from Confluent Schema Registry.

Preserve the escaped request path so these subjects are resolved correctly
instead of being double-escaped or interpreted as multiple URL segments.
reqURL := *c.schemaRegistryBaseURL
if reqURL.Path, err = url.JoinPath(reqURL.Path, reqPath); err != nil {
var joinedPath string
if joinedPath, err = url.JoinPath(reqURL.EscapedPath(), reqPath); err != nil {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the path is already escaped, wouldn't this double escape it?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think so. reqURL is the URL that is loaded from the configuration and EscapedPath() makes sure that the path is properly escaped under Go's encoding rules.

So we are joining the escaped path segment from reqURL with the other escaped path segment reqPath yielding a consistently escaped joinedPath.

The actual fix is setting the Path and RawPath members of the URL structure where Path is the unescaped and RawPath the escaped variant. This ensures that reqURL.String() handed over to http.NewRequestWithContext() uses the properly escaped version.

Here is an example how this plays out:

reqPath is already in escaped form. GetSchemaBySubjectAndVersion assembles a path for a schema reference. In my use cases, these references take the form of <package>/<proto file>, because I am using imports in the form of

import "commons/SomeEnum.proto"

in my Protobuf schemas that are managed via Confluent Schema Registry.

GetSchemaBySubjectAndVersion correctly escapes the slash and hands subjects/commons%2FSomeEnum.proto/version/X to doRequest().

joinedPath makes sure that any path segment in reqURL is properly escaped before joining with the other already escaped `reqPath. So far so good.

If we leave it at that, reqURL.String() which is handed down to http.NewRequestWithContext() would turn my path into subjects/commons%252FSomeEnum.proto/version/X because it escapes an already escaped path and thus breaks the schema lookup.

We fix this, setting the Path and RawPath members of the URL struct

reqURL.Path, err = url.PathUnescape(jonedPath) // subjects/commons/SomeEnum.proto/version/X
reqURL.RawPath = joinedPath // subjects/commons%FESomeEnum.proto/version/X

Now that URL knows the already escaped version (RawPath) and the unescaped version (Path), URL.String() assembles the final URL correctly.

Long story short. We have to set the RawPath correctly, but we have to set Path to the unescaped variant, for reqURL.String() to return the correctly escaped variant.

defer reqMut.Unlock()

b, err := fn(r.URL.Path)
b, err := fn(r.URL.EscapedPath())

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come we're changing the path here now?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The already existing test TestProtobufReferences() is supposed to verify exactly that use case which I have fixed (slash character in protobuf reference name). Unfortunately, the test also has a bug.

This test runs a schema registry server which already expects the path /subjects/stuffs%2Fthething.proto/versions/10 when resolving the main schema, i.e. it expects the escaped or encoded variant.

According to the documentation r.URL.Path stores the unescaped or decoded variant of the path, which should be /subjects/stuffs/thething.proto/versions/10.

Before my change, the Path member stored /subjects/stuffs%2Fthething.proto/versions/10, i.e. it treated the encoded variant as decoded form of the path which is not correct. We need to pass the encoded or escaped variant to the function call. This is why we need this change.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a test case for this? I want to make sure the logic is correct in the confluent client

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test which verifies that schema references containing slash characters are correctly resolved already exists (c.f. TestProtobufReferences()).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a new TestProtobufReferencesSpecialCharacters() test case which validates that EscapedPath() returns a properly escaped path which represents the actual unescaped Path for the Protobuf references.

@PitchBlack07 PitchBlack07 requested a review from gregfurman June 8, 2026 05:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants