Skip to content

Proposal: New pointer type for C pointers #1059

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
andrewrk opened this issue Jun 5, 2018 · 16 comments · Fixed by #1965
Closed

Proposal: New pointer type for C pointers #1059

andrewrk opened this issue Jun 5, 2018 · 16 comments · Fixed by #1965
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@andrewrk
Copy link
Member

andrewrk commented Jun 5, 2018

Latest Progress


Given any C function prototype, a human can look at it and properly decide between which Zig pointer type should be used. Is it a * pointer to a single-item? Or is it a [*] pointer to multiple things?

Unfortunately, Zig has no way to automatically distinguish, and when translating from C, it must pick one or the other. To always pick * makes c.printf(c"hello") give a compile error, and to always pick [*] makes c.scanf("%d", &value) give a compile error.

Currently I've chosen [*] and you can see in this commit that updates the Tetris example that I had to make a workaround function c.ptr and use it everywhere we pass the result of address-of to a C function:
andrewrk/tetris@afdc72e

Given that one of Zig's goals is to beat C at its own game - even at interfacing with C code, interop with the result of @cImport should be better than this.

So, just like we have integer types that are only intended to be used for C interop, I propose adding a new pointer with this syntax: [*c]T

This pointer would be the pointer type that translate-c (and therefore @cImport) chooses. It has the same semantics as pointers in C:

  • It can be null. Adding ? in front would be an additional
    null ability on top of the fact that the pointer might == null.
  • It implicitly casts to zig pointers (it can still catch alignment and const/volatile violations)
  • It implicitly casts to [*c]c_void.
  • It supports pointer arithmetic.

In other words, it's a type that should only ever be used when using @cImport.And @cImport should only be used when a .h file is unstable and provided by the system. For many cases, one can use translate-c one time, and then manually improve the API by changing [*c] pointers to (possibly nullable) zig pointers, and then committing the generated and manually adjusted code to the source repository.

Here are a couple simple alternatives to this proposal:

  • Accept that there will be some workarounds for FFI when using @cImport
    • Perhaps we could add @ptrToMany builtin which is considered a safe (but explicit) cast to get from *T to [*]T. For example: @ptrToMany(&variable).
      • Note: This is already possible to implement in userland, albeit with a gnarly implementation, and the userland implementation even works at comptime.
  • Allow implicitly casting *T to [*]T
@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Jun 5, 2018
@andrewrk andrewrk added this to the 0.4.0 milestone Jun 5, 2018
@phase
Copy link
Contributor

phase commented Jun 5, 2018

I like the first alternative you listed, leaving the implementation in userland, because it separates the C interop from the language. Adding another pointer variant for one part of one feature is a lot when it can be implemented with a simple function.

Converting between all the pointers might get confusing too. I like the explicitness of a function call.

@andrewrk
Copy link
Member Author

andrewrk commented Jun 5, 2018

Another thing I just realized is that pointer of unknown length to opaque type does not make sense. And forward-declared structs, which are very common in .h files, are translated as opaque types. So we can translate this case as single-item pointers. That solves a big chunk of the issue here, but it does leave integers and float values requiring the cast.

andrewrk added a commit that referenced this issue Jun 5, 2018
This also means that translate-c has to detect when a pointer to
opaque is happening, and use `*` instead of `[*]`.

See #1059
@Hejsil
Copy link
Contributor

Hejsil commented Jun 15, 2018

What would the pointer turn into when unwrapping the null? @typeOf(c_ptr.?) == ????.
If it turns into [*], then you lose "It implicitly casts to [*c]c_void".

Edit: Also, since [*c]Opaque is a valid C pointer, unwrapping this pointer is invalid because [*]Opaque is invalid

@raulgrell
Copy link
Contributor

raulgrell commented Aug 25, 2018

Implicitly casting *T to ?[*]T would probably solve most of the pain points I've had while using @cImports directly without having to add another pointer type.

There are plans to add a [*]null T... could this type be appropriated for what you're talking about here?

It can be null. Adding ? in front would be an additional nullability on top of the fact that the pointer might == null

are a b and c equivalent?

var a: [*]null u8 = null;
var b: [*]null u8 = "";
var c = [*]null u8 { 0 };

EDIT: Removed noise

@nornagon
Copy link

What's the downside of allowing implicitly casting *T to ?[*]T?

@raulgrell
Copy link
Contributor

It's good to know whether youre passing one item or multiple items... I guess its the same downside as having only one type of pointer.

@andrewrk
Copy link
Member Author

What's the downside of allowing implicitly casting *T to ?[*]T?

That's an important question. If we can't come up with any code examples where this could lead to bugs, then we can just allow this cast and that solves the problem perfectly.

@andrewrk
Copy link
Member Author

Currently I can think of a great example of why this is dangerous: if the [*]T is null terminated, then implicitly casting a *T would compile successfully and then at runtime would be a bug/security vulnerability because the *T isn't null terminated.

After #265 is implemented, this may be less of an issue, but there could be other situations like this.

@BarabasGitHub
Copy link
Contributor

That's a bit of a weird example imho as you have exactly the same problem with any arbitrary [*]T

@thejoshwolfe
Copy link
Contributor

For many cases, one can use translate-c one time, and then manually improve the API by changing [*c] pointers to (possibly nullable) zig pointers, and then committing the generated and manually adjusted code to the source repository.

Is this workflow documented and supported? I'd like to do this for SDL's .h file(s).

I'd also like to consider looking into some kind of userland function that does pattern matching to automate this pointer annotation process. This may play into the code generation ideas i batted around in #383 (comment) .

@andrewrk
Copy link
Member Author

Is this workflow documented and supported?

#1596

I'm not sure what you're asking with regards to support. translate-c is pretty heavily tested and it's the same code that @cImport uses. You end up with .zig code that has extern function declarations, types, constants, and extern variables. If the ABI that it represents is stable then it works great. In fact the workflow is preferred unless the ABI is changing and parsing the .h file is how you discover changes to the ABI, in which case it makes sense to use @cImport instead.

@tiehuis
Copy link
Member

tiehuis commented Sep 27, 2018

@thejoshwolfe I did a translation of SDL2 to Zig a little while ago which may be a useful start for you: https://github.com/tiehuis/zig-sdl2

@andrewrk
Copy link
Member Author

andrewrk commented Nov 21, 2018

I'm going to accept this, for the same reason that we have c_int, c_void, c_char, and @cImport. The point is seamless C interaction, and we simply need this pointer type to seamlessly operate with C libraries. Maybe we can have compile flags that disallow C features such as this pointer type, for codebases that have no reason to use it.

Proposal Edit: nevermind the part about null. That's just confusing. Even the C pointers will not be allowed to be null, and ? interacts with C pointers in the same way as normal pointers.

@thejoshwolfe
Copy link
Contributor

Inspired by #1831, what is the benefit of having ?*T be a special case of optional types outside of C compatibility? It's definitely valuable for C compatibility, but what if ?[*c]T was the only special case for optional types?

(To give background, the special case is that @sizeOf(?*T) == @sizeOf(*T) (as long as @sizeOf(T) > 0) which is different from the usual semantics where @sizeOf(?T) > @sizeOf(T). Furthermore, @intToPtr(*u8, 0) violates language-level assumptions that non-optional pointers must never have an address of 0, because address 0 is the special value for null optional pointers.)

If you're writing embedded code, and you actually have an object at address 0, Zig's assumptions would be violated by taking a pointer to that object. I looked into whether C has this problem or not, and it's one of those "technically x, but in practice y" complicated situations. And "does C have this problem?" is different from "does LLVM have this problem?", and I don't know the answer to the latter.

C's concept of "null pointers" is important for compatibility with C, but why does Zig have a special case concept of "null pointers" that are different from null everything else in Zig?

The more I read about null pointers in C, the more horrified I am by the semantics. For example, a C null pointer is not necessarily represented by the value zero, which means that technically sometimes doing a pointer<->integer conversion in C requires branching logic to special case null pointers maybe sometimes maybe. I'm starting to think Zig should have a special semantics for C null pointers to insulate the rest of the Zig language from this nonsense. This is another one of those "technically x, but in practice y" situations, but it might be a good idea to protect ourselves early on.

@kyle-github
Copy link

As @thejoshwolfe notes, the actual value of NULL is kind of a nightmare. What are the use cases within Zig for translated C pointers? It seems like you are going to need to know that there is a special value, C's NULL or C++'s nullptr, and deal with it.

How painful is it to write wrappers to deal with this case? I.e. have code that checks the value of the C pointer:

if myCPtr == C_NULL { ... }

Where C_NULL is a special value that is specific to the target platform (LLVM has to deal with this today due to the C/C++ spec so it will have this value). 99% of the time it happens to be zero, but it might not. As @thejoshwolfe notes, embedded programming throws all the assumptions about addresses out the window.

If the use case is to make Zig's null handling interoperate with C data, then this would not work, but I question whether that is going to be a large part of the code.

@andrewrk
Copy link
Member Author

andrewrk commented Feb 10, 2019

  • add tokenizer/parser code
  • docs update
  • allow implicit int-to-ptr
  • allow comparison with integers
  • allow implicit casting between it and any other pointer len (single-item / unknown length)
  • allow implicit casting to [*c]c_void
  • revisit implicit cast from *T to ?*c_void. maybe it shouldn't be allowed anymore
  • allow pointer arithmetic
  • make translate-c choose this type
  • update zig fmt and peg grammar
  • compile error test for integer value %s outside of pointer address range
  • disallow [*c]T where T is a type not allowed in extern functions
  • compile error test for casting integer to c pointer when the int has more bits than pointers
  • safety check when implicitly casting c pointer to non-c pointer, to make sure it's not null
  • compile error for implicitly between c pointer and non-c pointers, make sure const/volatile are respected
  • compile error for implicitly between c pointer and non-c pointers, make sure alignment is respected
  • update @typeInfo
  • peer type resolution: any other pointer type and [*c]T turns into [*c]T.
  • compile error test for compile time pointer arithmetic overflowing runtime safety checking for pointer arithmetic #1918
  • runtime safety test for pointer arithmetic overflowing runtime safety checking for pointer arithmetic #1918
  • comptime pointer arithmetic
  • compile error test for comptime ptr arithmetic with undefined runtime safety checking for pointer arithmetic #1918
  • implicit casting between C pointer and non-C pointer when the element is itself a C pointer vs non-C pointer
  • fix the nonnull llvm attribute
  • disallow pointer attributes on C pointers

andrewrk added a commit that referenced this issue Feb 10, 2019
andrewrk added a commit that referenced this issue Feb 11, 2019
andrewrk added a commit that referenced this issue Feb 11, 2019
andrewrk added a commit that referenced this issue Feb 11, 2019
andrewrk added a commit that referenced this issue Feb 11, 2019
andrewrk added a commit that referenced this issue Feb 12, 2019
andrewrk added a commit that referenced this issue Feb 12, 2019
when the int has more bits than pointers

See #1059
andrewrk added a commit that referenced this issue Feb 14, 2019
and update self hosted compiler for C pointers

See #1059
andrewrk added a commit that referenced this issue Feb 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants