Resolve Protobuf Confluent Schema Registry references correctly for subjects containing slashes#858
Conversation
Protobuf schema references whose subject contains `/` must be path-escaped when requesting referenced schemas from Confluent Schema Registry. Preserve the escaped request path so these subjects are resolved correctly instead of being double-escaped or interpreted as multiple URL segments.
| reqURL := *c.schemaRegistryBaseURL | ||
| if reqURL.Path, err = url.JoinPath(reqURL.Path, reqPath); err != nil { | ||
| var joinedPath string | ||
| if joinedPath, err = url.JoinPath(reqURL.EscapedPath(), reqPath); err != nil { |
There was a problem hiding this comment.
If the path is already escaped, wouldn't this double escape it?
There was a problem hiding this comment.
I do not think so. reqURL is the URL that is loaded from the configuration and EscapedPath() makes sure that the path is properly escaped under Go's encoding rules.
So we are joining the escaped path segment from reqURL with the other escaped path segment reqPath yielding a consistently escaped joinedPath.
The actual fix is setting the Path and RawPath members of the URL structure where Path is the unescaped and RawPath the escaped variant. This ensures that reqURL.String() handed over to http.NewRequestWithContext() uses the properly escaped version.
Here is an example how this plays out:
reqPath is already in escaped form. GetSchemaBySubjectAndVersion assembles a path for a schema reference. In my use cases, these references take the form of <package>/<proto file>, because I am using imports in the form of
import "commons/SomeEnum.proto"
in my Protobuf schemas that are managed via Confluent Schema Registry.
GetSchemaBySubjectAndVersion correctly escapes the slash and hands subjects/commons%2FSomeEnum.proto/version/X to doRequest().
joinedPath makes sure that any path segment in reqURL is properly escaped before joining with the other already escaped `reqPath. So far so good.
If we leave it at that, reqURL.String() which is handed down to http.NewRequestWithContext() would turn my path into subjects/commons%252FSomeEnum.proto/version/X because it escapes an already escaped path and thus breaks the schema lookup.
We fix this, setting the Path and RawPath members of the URL struct
reqURL.Path, err = url.PathUnescape(jonedPath) // subjects/commons/SomeEnum.proto/version/X
reqURL.RawPath = joinedPath // subjects/commons%FESomeEnum.proto/version/X
Now that URL knows the already escaped version (RawPath) and the unescaped version (Path), URL.String() assembles the final URL correctly.
Long story short. We have to set the RawPath correctly, but we have to set Path to the unescaped variant, for reqURL.String() to return the correctly escaped variant.
| defer reqMut.Unlock() | ||
|
|
||
| b, err := fn(r.URL.Path) | ||
| b, err := fn(r.URL.EscapedPath()) |
There was a problem hiding this comment.
How come we're changing the path here now?
There was a problem hiding this comment.
The already existing test TestProtobufReferences() is supposed to verify exactly that use case which I have fixed (slash character in protobuf reference name). Unfortunately, the test also has a bug.
This test runs a schema registry server which already expects the path /subjects/stuffs%2Fthething.proto/versions/10 when resolving the main schema, i.e. it expects the escaped or encoded variant.
According to the documentation r.URL.Path stores the unescaped or decoded variant of the path, which should be /subjects/stuffs/thething.proto/versions/10.
Before my change, the Path member stored /subjects/stuffs%2Fthething.proto/versions/10, i.e. it treated the encoded variant as decoded form of the path which is not correct. We need to pass the encoded or escaped variant to the function call. This is why we need this change.
There was a problem hiding this comment.
Could we add a test case for this? I want to make sure the logic is correct in the confluent client
There was a problem hiding this comment.
A test which verifies that schema references containing slash characters are correctly resolved already exists (c.f. TestProtobufReferences()).
There was a problem hiding this comment.
I have added a new TestProtobufReferencesSpecialCharacters() test case which validates that EscapedPath() returns a properly escaped path which represents the actual unescaped Path for the Protobuf references.
…hema-registry-reference-resolution
Protobuf schema references whose subject contains
/must be path-escaped when requesting referenced schemas from Confluent Schema Registry.Preserve the escaped request path so these subjects are resolved correctly instead of being double-escaped or interpreted as multiple URL segments.