
protoc-gen-openapiv2/options/*.proto paths don't match package name #5182

Open · micolous opened this issue Jan 28, 2025 · 10 comments

@micolous commented Jan 28, 2025

🐛 Bug Report

protoc-gen-openapiv2's Protobuf annotations file paths don't match the package name (protoc-gen-openapiv2/options/*.proto vs grpc.gateway.protoc_gen_openapiv2.options).

The present behaviour conflicts with the Protobuf style guide (emphasis added):

Package names should have unique names based on the project name, and possibly based on the path of the file containing the protocol buffer type definitions.

This also would have been flagged by the Buf lint rules PACKAGE_DIRECTORY_MATCH and PACKAGE_LOWER_SNAKE_CASE, but this project currently disables them in its buf.yaml.
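
As an illustration (the invocation is a sketch, and assumes those rules were re-enabled in buf.yaml rather than listed under its exceptions), a lint run over the annotations file would surface the mismatch:

# Expected to report PACKAGE_DIRECTORY_MATCH for grpc.gateway.protoc_gen_openapiv2.options
buf lint --path protoc-gen-openapiv2/options/annotations.proto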

Background / expected behaviour

As an example, the google.api and google.protobuf types are in google/api/*.proto and google/protobuf/*.proto respectively, and can be used in your own descriptors with:

import "google/api/annotations.proto";

service MyService {
  rpc ExampleMethod(ExampleMethodRequest) returns (ExampleMethodResponse) {
    option (google.api.http) = {
      post: "/v1/example/method";
    };
  }
}

The .proto file paths are consistent with their package names, and it's possible to ship .proto files alongside other build artefacts in Java, JavaScript, Python and probably some other languages.

It's also possible to reflect these annotations at runtime from your application code.

protoc-gen-openapiv2 annotations should be usable in the same way, with an import that matches the complete package name:

import "grpc/gateway/protoc_gen_openapiv2/options/annotations.proto";

option (grpc.gateway.protoc_gen_openapiv2.options.openapiv2_swagger) = {
  info: {
    name: "My API";
  };
};

But this is not possible due to this inconsistency.
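
For comparison, the combination that does work today mixes the hyphenated file path with the underscored package name:

import "protoc-gen-openapiv2/options/annotations.proto";

option (grpc.gateway.protoc_gen_openapiv2.options.openapiv2_swagger) = {
  info: {
    title: "My API";
  };
};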

Existing behaviour

protoc requires that all imports and types resolve properly, including annotations, when generating either client or server-side stubs, even if they are never referenced at run-time. This also constrains protoc-gen-openapiv2's annotations to the intersection of all language-specific Protobuf rules, not just Golang's.

Some tooling (like hatch-protobuf) will recursively enumerate all .proto files in a directory and attempt to build stubs, with no way to selectively include/exclude files. When applied to this repository, this means including a bunch of examples and internal implementation details.

The existing behaviour forces downstream consumers to keep a locally-modified copy of the descriptors with corrected imports and folder layout. However, this risks version skew, and all downstream consumers need to apply similar fixes.

Suggested fixes

Move (or mirror) the .proto files into a directory layout that matches the declared package name (…/grpc/gateway/protoc_gen_openapiv2/options/*.proto under some proto root). This makes everything consistent with the defined package name, and also means the file paths use underscores rather than hyphens (which is nice for Java, Python, Rust and probably others).

This also makes it possible to include grpc-gateway as a git submodule, and have a --proto_path flag like third_party/grpc-gateway/protoc-gen-openapiv2/proto/, and only pick up the public protoc-gen-openapiv2 Protobuf descriptors.
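
A sketch of that setup, under the suggested layout (my_service.proto and the output directory are placeholders):

git submodule add https://github.com/grpc-ecosystem/grpc-gateway third_party/grpc-gateway

protoc \
  --proto_path=third_party/grpc-gateway/protoc-gen-openapiv2/proto \
  --proto_path=. \
  --go_out=gen --go_opt=paths=source_relative \
  my_service.proto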

It would also make it easier to make a regular Python package with something like hatch-protobuf, and take advantage of consistent import paths between Python and protoc, like you can with googleapis-common-protos and base google.protobuf types.
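
For example, assuming the descriptors lived under a proto/ root that matches the package (as suggested above), protoc's Python output would mirror the import paths:

mkdir -p out
protoc --proto_path=proto --python_out=out \
  proto/grpc/gateway/protoc_gen_openapiv2/options/annotations.proto \
  proto/grpc/gateway/protoc_gen_openapiv2/options/openapiv2.proto
# Generates out/grpc/gateway/protoc_gen_openapiv2/options/{annotations,openapiv2}_pb2.py,
# mirroring the grpc.gateway.protoc_gen_openapiv2.options package.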

Alternatives considered

Using buf's schema registry to pull in protoc-gen-openapiv2 descriptors side-steps the path issues, but has a very low rate limit (10 requests/hour).

I'm aware of the py_protoc_gen_openapiv2 package – but I don't like the approach as it builds buf from source (so depends on a Golang toolchain), it doesn't include .proto files as artefacts that other tooling can use, and its releases don't match with grpc-gateway versions. There are other forks which use buf in different ways, but they all have similar problems.

I've got an alternative local package for Python which has corrected import paths and doesn't depend on any of the buf tooling; it's just a pyproject.toml file that uses hatch-protobuf – there's no need to manually install protoc (as grpcio-tools includes protoc), and everything just works.

@johanbrandhorst
Collaborator

Hi, thanks for your issue. This sounds like a dupe of #2173, but I appreciate the amount of detail you've added, and since stalebot closed the other one (we've since removed stalebot), let's keep this open. I have a few thoughts, but mostly they echo what I said in the original issue - this is a very unfortunate consequence of the original paths chosen for these files and the way Python creates its imports. We cannot move these files as it would break a lot of existing users. As far as I can tell, the suggested fix hasn't considered the consequences of this breaking change. Am I missing something?

I'd be happy to accept any PRs that modify our bazel setup to make this work better for users, but we cannot move the protobuf files, sorry.

@rohitlohar45
Contributor

Hi @johanbrandhorst,

I understand the concern about not moving the .proto files to avoid breaking existing users. Would it be helpful if I submit a PR that implements a non-breaking solution supporting both paths?

The approach would be:

1. Keep existing files where they are (protoc-gen-openapiv2/options/*.proto).
2. Create symbolic links (or copies) in the correct package structure (proto/grpc/gateway/protoc_gen_openapiv2/options/*.proto).
3. Update the Bazel build configuration to support both paths.
4. Add Makefile targets to maintain the symbolic links.

This solution would allow tools that require a consistent package-path structure to work correctly while maintaining full backward compatibility for existing users.

I can start with the build system changes first for review if this approach seems viable. Let me know if you’d like me to proceed or if you have any other suggestions.

Thanks!

@johanbrandhorst
Collaborator

Does git even support symbolic links? I don't know if that would work at all. If we can come up with some solution that helps users I'd be happy to consider it though. Please feel free to experiment.

@micolous
Author

micolous commented Mar 26, 2025

This sounds like a dupe of #2173...

I'd say that #2173 (and also #4407, and probably others) are symptoms of the same underlying issue, and would be fixed if the directory structure matched the package names.

I think the path of the Golang package could stay as protoc-gen-openapiv2, as long as there's a path in which the directory structure matches the package name.

There's a reason there's linter rules for this. 😉

I'd be happy to accept any PRs that modify our bazel setup to make this work better for users

Unfortunately, I don't think that would help me, because I'm not currently using Bazel, and I don't want to add it as a build dependency.

Just looking at Python, Bazel does not (cleanly) integrate as a PEP 517 build frontend. Because my protoc-gen-openapiv2 descriptor package is part of a workspace, switching any one package to Bazel would mean I'd need to port my entire workspace's build configuration to Bazel, or I'd need to package and publish those descriptors outside of the workspace (which is an eventual goal, just not today).

Does git even support symbolic links?

Yes. It even works on Windows now, but requires some manual configuration.

However, some other tooling may exclude symbolic links – eg: hatch-protobuf does not follow symlinks, so proto/grpc/gateway/protoc_gen_openapiv2/options/*.proto being symlinks would not work without further tooling changes.

@rohitlohar45's approach in #5182 (comment) sounds good to me, but I would make proto/grpc/gateway/protoc_gen_openapiv2/options/*.proto the actual files.

Edit (2025-03-27): Unfortunately, symlinks would break at least one side, no matter which way around it's done, so you could only maintain Bazel-based compatibility: #5182 (comment)

protoc-gen-openapiv2/options/*.pb.go probably shouldn't be checked in to git, and instead it should be built in the application's build process. These generated files are intended for a particular version of protoc and the Protobuf library, so could lead to build failures if used as-is.

If Bazel is the intended build tool for protoc-gen-openapiv2 and grpc-gateway, then these checked-in files are never used; they'll be pulled from genfiles instead. This could make adding symlinks unnecessary – you could have a bunch of go_proto_library() targets (or equivalent) in proto/grpc/gateway/protoc_gen_openapiv2/options/BUILD.bazel which protoc-gen-openapiv2 and grpc-gateway depend on.

protoc-gen-openapiv2/options/BUILD.bazel could gain alias() rules to help anyone using the existing targets with Bazel.
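
A minimal sketch of such an alias (the target names here are illustrative, not the repository's actual ones):

# protoc-gen-openapiv2/options/BUILD.bazel (sketch)
alias(
    name = "options_proto",
    actual = "//proto/grpc/gateway/protoc_gen_openapiv2/options:options_proto",
    deprecation = "Moved; depend on //proto/grpc/gateway/protoc_gen_openapiv2/options:options_proto instead.",
)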

Eventually, proto/grpc/gateway/protoc_gen_openapiv2/options/BUILD.bazel could be expanded with *_proto_library() targets for other languages, and downstream Bazel users could make proto/ the root of an external workspace reference.

@johanbrandhorst
Collaborator

I'll just reiterate that we will not be able to change the folder structure without breaking users, certainly many more than the number of python users it would help. Any resolution here would have to be backwards compatible. Please do experiment and make practical proposals if we can find a solution though.

@micolous
Author

I'll just reiterate that we will not be able to change the folder structure without breaking users...

I agree that not breaking existing users is important, but it's not clear to me who would be broken by such a change. I will need your help to identify them. 😄

The use cases I can think of:

  • Bazel: I've already offered a solution (alias() rules).

    There are deprecation flags you can attach to those as well, which may help someone using buildifier automatically update to the new rules, but there are particularly aggressive linter setups which can fail a build that references a deprecated target.

  • Anyone who has manually copied the Protobuf files into their own source tree (eg: to correct this issue manually) will not be affected by changing the paths.

  • Anyone who is automatically copying and patching the Protobuf files into their source tree (eg: Copybara) will be broken when the files move if their tool does not understand symlinks.

    However, this would at most create a broken PR, which should fail CI/CD checks and needs updating.

    If they have no CI/CD checks, or just auto-approve the change, then that really can't be helped.

  • Anyone who is pulling this code in as a git submodule (which only works as-is with a limited subset of all Protobuf supported languages) has a reference to a particular commit. That will work forever.

    If they update their reference to a new commit where the files have moved and their tool does not follow symlinks, then it will break, and will need updating.

    But it is unlikely that anyone will have multiple submodule references to different versions of this repository, so this is a one-time migration.

  • Anyone who is pulling this code in from git but referencing a tag (eg: shell script which runs git clone --branch ...) will break as soon as they switch to a new tag, like with submodules. Same caveats apply.

  • Anyone who is pulling this code in from git via a naive git clone of main will be broken as soon as the change is committed. However, such a setup can be broken by any number of things, and there is no way for grpc-gateway to reasonably defend against a downstream user with zero change control.

...certainly many more than the number of python users it would help.

Python is not the only Protobuf-supported language which does not allow - in package/namespace identifiers. This also affects:

But doesn't affect:

  • Dart: does not have namespaces, relies on file paths
  • Golang: does not have namespaces, relies on file paths
  • Objective-C: does not have namespaces, relies on objc_class_prefix

I couldn't figure out if it affected Ruby or not, but it seems like a pretty common restriction.

Protobuf has special attributes in some cases (eg: java_package), or translates - to _ automatically. However, this means the generated code ends up in a totally different folder structure to the .proto files, and you can't package the .proto and generated code together anymore (though for Java/JRE this is expected due to namespace style differences).
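
For instance (the package value is illustrative, not what the repository actually sets), an explicit override like this decouples the generated Java package from the .proto path:

option java_package = "com.example.grpc.gateway.protoc_gen_openapiv2.options";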

I acknowledge that fixing the bug comes with risks and a migration cost, but it doesn't prevent things from working where they already work today, and it opens the annotations up to a much larger ecosystem. There is a lint rule for this; it's just unfortunate the original author dismissed it because Golang does not have namespaces.

I think the approach proposed is on the right track, it could just use a couple of tweaks. I worry that the fear of breaking things will lead to inaction. 😅

@micolous
Author

micolous commented Mar 26, 2025

Unfortunately, I've found a flaw in using symlinks as a compatibility measure: annotations.proto imports from openapiv2.proto. Protobuf doesn't support importing from relative paths, only paths relative to -I/--proto_path flag(s), so you can't work around it that way.

Using @rohitlohar45's approach as an example, you'd end up with this folder structure:

.
├── proto
│   └── grpc
│       └── gateway
│           └── protoc_gen_openapiv2
│               └── options
│                   ├── annotations.proto -> ../../../../../protoc-gen-openapiv2/options/annotations.proto
│                   └── openapiv2.proto -> ../../../../../protoc-gen-openapiv2/options/openapiv2.proto
└── protoc-gen-openapiv2
    └── options
        ├── annotations.proto
        └── openapiv2.proto

If you left annotations.proto as-is, with the dashed import path:

import "protoc-gen-openapiv2/options/openapiv2.proto";

You would build Golang descriptors for protoc-gen-openapiv2 with something like:

protoc --proto_path=. --go_out=. --go_opt=paths=source_relative protoc-gen-openapiv2/options/*.proto

And that would be fine.

But if you tried to build descriptors using proto/ as the proto_path, eg:

protoc --proto_path=proto --go_out=out --go_opt=paths=source_relative proto/grpc/gateway/protoc_gen_openapiv2/options/*.proto

Then annotations.proto would fail to build when importing openapiv2.proto, because it'd be looking for it in proto/protoc-gen-openapiv2/options/openapiv2.proto.

If you switched it around (like I previously suggested) and made proto/grpc/gateway/protoc_gen_openapiv2/options/ contain actual files, updated the imports to be relative to proto/, and put symlinks in protoc-gen-openapiv2/options/, then it'd break running protoc --proto_path=. protoc-gen-openapiv2/options/*.proto.

Alternatives

There are a few ways around this, but they all have drawbacks:

  • Duplicate the descriptors into another folder structure where package names match folder names, and have a script patch the files so that their imports match the new folder structure, verified in CI (see the sketch after this list).

    This would not break existing users, but means a pile of technical debt to maintain.

  • Produce a build artefact (or ideally, automated git repository) which contains descriptors with corrected import paths.

    Same costs/benefits as previous, but means there's a pretty explicit line around what is the interface for other languages.

    This is pretty much what other language users are doing already.

  • Same as previous, but make the descriptor source of truth the external repository. That would make it closer to something like the googleapis repository.

    Same costs/benefits as previous, but it also makes changing those descriptors and updating the tool a two-step process. On the other hand, it means the change process is driven by the API, and forces it to consider backward and forward compatibility.

  • Have a tool to automatically strip protoc-gen-openapiv2 descriptors from the service definition before using it in your actual gRPC service / clients.

    These descriptors being present is the core problem, and unfortunately there's no way in the Protobuf language to conditionally include them (ie: #ifdef).

    Having a strip phase makes the build process for everyone using these descriptors more complex, but means that they don't need protoc-gen-openapiv2 descriptors to be available at build or run time, unless they actually use protoc-gen-openapiv2 descriptors to implement consistent validation rules, checking OAuth 2.0 scopes, etc.

    So at least in the simple case, it would make things much easier, and wouldn't break existing users of these descriptors.

  • Migrate the protoc-gen-openapiv2 build process to use the new descriptor folder structure, and retain Bazel-based compatibility with alias().

    Potential breakage scenarios are covered in my last comment.
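
A sketch of the duplicate-and-patch approach from the first alternative (paths follow the layout discussed above; the sed expression is illustrative):

#!/bin/sh
set -eu
dst=proto/grpc/gateway/protoc_gen_openapiv2/options
mkdir -p "$dst"
for f in protoc-gen-openapiv2/options/*.proto; do
  # Rewrite the hyphenated import to the package-shaped path.
  sed 's|import "protoc-gen-openapiv2/options/|import "grpc/gateway/protoc_gen_openapiv2/options/|' \
    "$f" > "$dst/$(basename "$f")"
done
# CI check: fail if the patched copies are out of date.
git diff --exit-code -- "$dst"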

What do you think? 😄

@johanbrandhorst
Collaborator

Thank you for exploring the alternatives, however, I feel that we sort of skipped over one of the solutions that already exists today - the buf schema registry. We push the descriptors to the schema registry, which makes them a first-party source for others to depend on. The 10/hr BSR rate limit applies to anonymous requests only; it's 960/hr for authenticated requests: https://buf.build/docs/bsr/rate-limits. Is this not good enough?

@micolous
Author

micolous commented Mar 28, 2025

We push the descriptors to the schema registry, which makes them a first-party source for others to depend on. The 10/hr BSR rate limit applies to anonymous requests only; it's 960/hr for authenticated requests: https://buf.build/docs/bsr/rate-limits. Is this not good enough?

I’ve previously evaluated using a Buf-based toolchain for my project, and hit those rate limits extremely easily. Passing build secrets to Docker is tedious (I know how), and it's also very easy to end up with them baked into your final images.

Setting up bot accounts (which I’d need for CI) is locked behind a Pro/Enterprise subscription, and it looks like they only support static API keys, which I’d need to store as secrets and copy to every repository which uses Buf. Shared secrets like that are really unsafe, and I’d rather use OIDC delegated identities – it looks like while Buf supports OIDC (on Pro plans), it’s only for humans.

I could possibly set up a caching BSR proxy with some credentials, but that’s infrastructure to roll out and maintain.

Compare this with something like Docker, where the Docker Hub rate limits are difficult to hit, there are higher rate limits for “blessed” open source projects (which, if BSR did something similar, I’d expect this project to qualify for), and GitLab has a built-in dependency proxy. And Docker ships a lot more data around.

Using Buf doesn’t really solve the problem though, because even if I fetched the .protos from BSR, I’d still need to fix their paths.

I don’t see what value there is in using BSR vs. using a plain git repository that I can submodule, when the files are right there and so… tantalisingly… close… to being exactly what I need. 😄

What I’ve done at the moment is copied the files into a repository with the patches applied (and a Python package which builds out of the box with entirely native build tools, thanks to grpcio-tools and hatch-protobuf), and then that repository is submoduled into where I need any .proto files (such as building a FileDescriptorSet for Envoy).
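
For reference, that FileDescriptorSet step is a plain protoc invocation (the paths and file names here are placeholders for my local setup):

protoc \
  --proto_path=. \
  --proto_path=third_party/openapiv2-protos \
  --include_imports \
  --descriptor_set_out=proto_descriptor.pb \
  my_service.proto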

@johanbrandhorst
Collaborator

OK, so the alternative of having something like an auto-syncing downstream repository might be the easiest thing then. It wouldn't necessarily need to be associated with grpc-gateway itself, but we could link to it in this issue and the README. Do you already have a script to set something like this up?
