-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
translate-c: ability to annotate types in C code for better translation #2457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
If we agree that the central point of this issue is making the best use possible of header files, then I can say that we have two fronts that we can attack the issue from: 1. Changing the header file 2. Adding knobs to @Cinclude() I suspect a full solution requires both, here's my reasoning: We have to deal with 2 problems that in my opinion require different solutions: I can see A being solved via nonnull or some similar annotation in the header file, while it seems to me it would be much more unwieldy to try and alter function signatures after the fact (please tell me if I'm wrong, I can only reason about this from the PoV of what the usage could look like from Zig). B, instead, seems solvable from Zig with something like this: const includeOpts = IncludeOpts {
.makeNonOptional = [][] const u8{"RedisModule_*"},
};
const redis = @cImport({
@cInclude("./redismodule.h", includeOpts);
}); The advantages of something like this would be the following:
Being able to specify by pattern matching which symbols you want to unpack (as I do in my example) would make this feature very easy to use, hardly any boilerplate. I imagine a similar argument could be made for single-item, unknown-length and null-terminated pointers (both for casting variables and re-interpreting function arguments). So it would be interesting to see if it is possible to provide the same choice to A (removing optionals from function arguments), but I don't know enough to say anything useful about it. Likewise, maybe it would be worth to come up with a symbol that the C project can add in front of variables that they want to mark as non-null once the header file has finished executing. The idea of offering a cooperative and a non-cooperative option seems intriguing. |
After watching yesterday's stream (great timing on that btw) and thinking about it a bit more I have a couple more thoughts to share. After listening to your explanation and reading some of the self-hosted code, I'm starting to get the hang of how translate-c works. I was originally imagining adding some kind of macro to the c header file but I see now that that's not possible, unless what we add is legit C syntax. I was originally naively thinking of some weird ifdef macro that would add an extra attribute to function arguments, but that would have been awful, and very likely to be rejected by any C maintainer.
Here's an alternative idea: what if we add these annotations as comments? //zig: (non-null, non-null to-many, ...)
typedef int (*RedisModuleCmdFunc)(RedisModuleCtx *ctx, RedisModuleString **argv, int argc); If we find a way to make the syntax reasonably human-readable, we could also consider dropping any Zig reference, in order to make the comment just a harmless hint for people reading the headerfile. More complete example: //pointer-hints: (non-null, (non-null to-many) -> non-null, non-pointer)
typedef int (*RedisModuleCmdFunc)(RedisModuleCtx *ctx, RedisModuleString **argv, int argc);
//var-usage-hint: RedisModule_* non-null
#define REDISMODULE_API_FUNC(x) (*x)
int REDISMODULE_API_FUNC(RedisModule_ReplyWithSimpleString)(RedisModuleCtx *ctx, const char *msg);
One last thing: I noticed that I made a very wrong assertion in the last comment. I stated something along the lines that variables will be initialized once the headerfile is executed, but that's not true at all in redismodule.h, since we are calling |
I like the "knobs" idea proposed by @kristoff-it, but understand it might be complex offering this configurability right into the compiler. But there's no problem which can't be solved by adding extra layers of indirection, and maybe comptime can help here to keep it as userland as possible ? Anyways, I think it's highly unlikely that upstream C projects would agree on annotating their code for consumption by another language's bindings. We would be better off working without expecting write access to the source. |
For many cases it's more than that e.g. gcc's nonnull attribute is useful for C consumers too. But there's an additional dimension: when zig is used to generate .h files: we should be able to map back to perfect zig type signatures if that .h file is imported with |
True, but unfortunately there might not be an attribute directly mapping to the zig concept we need to annotate. And even then, I'm not convinced maintainers would agree to add compiler specific attributes (right or wrong). The safe choice is not depending on upstream for this, and a pure zig solution might even allow "annotation" packets to be distributed with the associated pure-C library, streamlined inside zig's packet manager. |
Good points all around. I want to make note of another use case - which is when you want to use translate-c to port C code to Zig code, and you intend to drop the C code and start maintaining the Zig code. In such a situation translate-c is a one-time process after which the generated Zig code becomes source code maintained by hand. For this use case, one would be willing to do all manner of Zig-specific annotations in the C code, just to make the generated Zig code more readable. |
Maybe I can offer some more input after writing some code that made use of this functionality, although my experience is limited to Redis Modules. In my use-case I encountered 2 issues:
After some first-hand experience, these feel to me the most natural ways of dealing with those problems: 1.1 Should be solved on the Zig side. In my case (but I assume it's the same for all libraries), those variables are going to be null until you call const c_utils = @import("std").c_utils;
const redis = @cImport({
@cInclude("./redismodule.h");
});
const redis_api = comptime c_utils.CTranslatePrepareUnpacking(redis, "RedisModule_*");
// Entry point
export fn RedisModule_OnLoad(...) c_int {
// call the procedure that gathers function pointers
redis.RedisModule_Init.?(...);
// unpack functions once and for all
c_utils.CTranslateUnpack(redis_api, redis)
} In this example There are probably better ways to do design the API for this. I just wanted to make explicit the fact that it has to be a 2 step process (declare a type at comptime, unpack pointers at runtime when the user is ready to do that). With support for 2.2 Seems very problematic to solve from Zig. It's about redefining types and/or signatures that use such types. In my case, since Redis modules have basically a single type of callback, the shortest path was to edit manually the line where the type is defined, which in turn cascaded automatically to all other function that took a pointer of that type as an argument. One good property for our type-redefining annotations would be to be position-independent. This way we could transparently write them by hand (ideally in the right place, like the examples in my previous comments) or just blindly inject them in the beginning of the header file via a @cImport option, without having to know which precise line contains the definition. With 1 and 2 at our disposal it would be easy to both write a little bit of boilerplate when importing the .h file directly, and/or create a "wrapper" that adds value without being particularly hard to maintain. Having the heavy lifting done by Zig, the wrapper could focus on getting the idiom right. |
I want to mention one of use cases of custom C types annotation: default zeroes for "designated initialization". Probably the best example: sokol-gfx library, #1031: https://github.com/floooh/sokol-samples/blob/master/glfw/triangle-glfw.c#L40 /* prepare a resource binding struct, only the vertex buffer is needed here */
sg_bindings bind = {
.vertex_buffers[0] = vbuf
};
The other example is static arrays: libwebsockets usage C code: static struct lws_protocols protocols[] = {
{ "http", lws_callback_http_dummy, 0, 0 },
LWS_PLUGIN_PROTOCOL_MINIMAL,
{ NULL, NULL, 0, 0 } /* terminator */
}; Which would be nice to use the same in zig like: const server_protocols = [_]lws_protocols {
lws_protocols {
.name = c"http-only";
.callback = callback_http;
.per_session_data_size = @sizeOf(usize);
// here should be other 4 fields, previously zero initialized by default
},
....
} Right now its possible to make a zig "wrapper" library, mirroring each generated C structs, but with defaults. But its a lot of work and will require a lot of pointer casting to use that structs in c functions. |
Is this a good |
@daurnimator I made an example illustrating this idea: https://github.com/not-fl3/sokol-zig To make it so close - I hacked on
And this hack worked surprisingly well - all the I do understand that this should not be default behaviour, and that My goal here - to deomnstrate one of the C libraries that would be super nice to have and be easy to use from zig and find how I could help to make it happen. |
A thought: If we translate the snippet extern const char foo[]; Then I think we should assume it is a |
An update after gaining more experience in working with Zig, and using translate-c to write Redis modules. I'll focus my reasoning more on usage patterns and less on the technical details. I'll also use the Redis module use case as my main example because that's the one real-world case that I have experience in, which means what I say might not apply just as well to other uses for As a user interested in writing a Redis module in Zig, importing As a Redis expert, I might be interested in maintaining a wrapper around
At this point users will have to read my documentation to understand the changes that I made from the original API so that they know how to "translate" what they read in the original documentation (eg, function names are the same except for the From this premise, here's a point that might be worth discussing: End users vs wrapper maintainersMaybe we shouldn't consider the end user personalizing the way Zig imports a C header file as the primary use case. In my previous comment, when I was sketching how the API might look like, I imagined that the end user would be writing that code, but now I think this shouldn't be the case.
I think we should consider wrapper maintainers the main user of any such API, which means two things:
Finally, focusing on "wrapper maintaners" also means that we can go nuts with C annotations and magic comments because we don't have to worry anymore about trying to convince upstream to accept changes, or having end users rummage through the headerfile by themselves. |
Clang has Here's the small bit of glue code to get the attribute if anyone wants to work on it: https://gist.github.com/tadeokondrak/514d0486d04a3b74c0e7383966505b1c |
I've made two large Zig -> C bindings now, for Vulkan and Dear ImGui (via cimgui), so I'd like to share the things that I'm considering when preparing a binding in order to better inform this feature. First, I'd prefer not to modify the header file directly. The ability to update the C library in-place and know that I didn't accidentally break anything is really important to me, so I'd prefer to specify any metadata for the generator externally. Second, excluding old and very stable apis, generation isn't only run once. It's run every time the underlying C library updates. So editing the .zig file by hand is a no-go for large apis, because you would need to specify every pointer type again for the entire api when updating. Instead you need a set of rules that the binding generator can use to generate pointer types. These rules will be specific to the api that you are binding, but should be specified in a way that is relatively resilient to updates. For Vulkan this is easy, they provide an xml file that describes nullability and array-ness. For Dear ImGui I made a list of rules that match against hierarchical context information. This way when the library updates I'll be able to update and keep most of the pointer types. Anything that changed in the API I'll be able to inspect in a diff, and add more rules if any incorrect choices were made or if anything was missed. I also usually provide a second-tier api that demangles names, wraps pointer+length into slices, error codes into errors, and flag enums into packed structs of bools, along with occasional custom wrappers to make certain operations easier. These need even more codebase-specific information than the first-tier api, but for large apis that update frequently generation is still easier than writing everything by hand. The ability to override the generation of specific functions and supply your own implementation is important for some cases, especially when wrapping functions that use Maybe generating the second-tier api is out of scope for translate-c. But if translate-c could output an easy-to-parse metadata file containing all of the definitions (or a library ready to import it), it would make custom code to generate the second-tier api much easier to write. |
Would be cool if I could annotate enum as having no aliased enumerators, and such enum would be translated to a zig enum and not a bunch of integer constants |
If upstream C sources are willing to add extra annotations to their header files, Zig can perform better automatic translation. Here are some examples of things that could be annotated:
GCC/Clang have the nonnull attribute which can be applied to function parameters, but not types in general. Certainly Zig should respect this attribute on function parameters, but there is also room for coming up with a way to annotate types as well.
The text was updated successfully, but these errors were encountered: