Proposal: Comptime Labels (tags) #3142

Closed
ghost opened this issue Aug 30, 2019 · 3 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

ghost commented Aug 30, 2019

Proposal: Comptime Labels (tags)

Basically, this proposal originates as a more generic, universal and concrete version of the "distinct types" #1595 proposal, but it is also related to the "tags" #1099 proposal, since the idea here is essentially about adding comptime "metadata" to code.

The main benefit of the proposal is added type safety for primitive values like floats, ints and booleans.

A secondary benefit is that making the labeling syntax generic and applicable "everywhere" makes it both simpler and more versatile, possibly serving as a platform for implementing other comptime-check features, mostly in userland. After all, features with very restricted applicability can sometimes be more confusing than generic and consistent ones.

The syntax change of this proposal introduces only the [: token, and otherwise looks much like array indexing. Short preview of syntax:

a[:label1];       // attach "label1" to variable "a"
b : f64[:label2]; // expect variable "b" to be of type f64 with label "label2" attached;

fn useLabeledBooleans(reset: bool[:reset], dstart: bool[:delayedStart], dstop: bool[:delayedStop]) void{
//...
}

useLabeledBooleans(true[:reset], false[:delayedStart], false[:delayedStop]);
//using wrong label would be caught during compilation

The syntax is verbose, but in turn it hides nothing of the annotated/labeled code from the reader.

More examples in this gist.

Premises

Two premises of this proposal are:

Labels are there ONLY to make the compiler do additional sanity checks on code.

If you remove labels from your source, the EXACT same runtime assembly code is generated

These premises rule out some imagined uses of labels, but in return they make labels very easy to grasp mentally, which might be a good trade-off. They also mean that comptime labels should not impact runtime performance at all.
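The "no runtime impact" premise can be approximated in today's Zig with a generic wrapper type, which gives a feel for what is being asked of the compiler. This is only a rough sketch: `Unit` and `Labeled` are hypothetical names invented for illustration, not part of this proposal or of the standard library, and the wrapper emulates the idea rather than the proposed `[:label]` syntax.

```zig
const std = @import("std");

// Hypothetical label set; in the proposal these would be labelInstances.
const Unit = enum { meters, seconds };

// Wraps T in a struct type that differs per label. The label exists only
// as a comptime declaration, so no extra field is stored at runtime.
fn Labeled(comptime T: type, comptime label: Unit) type {
    return struct {
        value: T,
        pub const unit = label;
    };
}

const Meters = Labeled(f64, .meters);
const Seconds = Labeled(f64, .seconds);

test "labels add no storage" {
    const d = Meters{ .value = 3.0 };
    // Same size as the bare primitive.
    try std.testing.expect(@sizeOf(Meters) == @sizeOf(f64));
    // Different labels yield different types.
    try std.testing.expect(Meters != Seconds);
    try std.testing.expect(Meters.unit == .meters);
    try std.testing.expect(d.value == 3.0);
}
```

The sketch only shows that the label occupies no storage; whether the generated assembly is identical to the unlabeled code, as the premise demands, is something the wrapper approach does not guarantee by itself.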

This would also open up the possibility of implementing labels in stage2 only. If the stage1 compiler sees a label, it can give an error and request the "don't process labels" compiler flag to be set.

Relaxing these premises could also be considered if there were benefits in doing so, e.g. if a sufficient number of keywords could be made redundant by labels.

Compiler utilization of comptime labels

With labels, the compiler can ask additional questions about code:

  • Given an operator, the operands and their labels ..determine if this expression is valid as per the labels. E.g. it can enforce that it's not possible to add meters to seconds, even if both are represented with an f64. Also, if bitwise OR on apples doesn't make sense, disallowing bitwise OR on integers representing apples is possible with labels.

  • Given a function call and the arguments ..check whether the arguments are labeled in accordance with the function definition. With labels, you cannot pass an integer representing a bitflag to a function expecting an integer that represents a database ID.

  • Given an instance with labels for the different states it can be in ..check whether the state of the instance is appropriate for a given function call. This basically means the programmer is annotating state "manually" with labels, but the compiler can perform sanity checks using labels, whereas this is not possible with plain old code comments.

  • In short ..for every assignment, operator expression and function call, check label compatibility (e.g. between operands) and whether the labels impose any restrictions.
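The function-call check from the list above can be approximated in current Zig, which may help when judging how much the proposal adds. A rough sketch, assuming the hypothetical helpers `Flag`, `Labeled` and `labeled` (none of them exist in the proposal or the standard library); the proposed `true[:reset]` is emulated here by `labeled(.reset, true)`:

```zig
const std = @import("std");

// Hypothetical label group for the three boolean parameters.
const Flag = enum { reset, delayed_start, delayed_stop };

fn Labeled(comptime T: type, comptime label: Flag) type {
    return struct {
        value: T,
        // Referencing `label` keeps each instantiation a distinct type.
        pub const tag = label;
    };
}

// Stand-in for the proposed `value[:label]` expression.
fn labeled(comptime label: Flag, value: bool) Labeled(bool, label) {
    return .{ .value = value };
}

fn useLabeledBooleans(
    reset: Labeled(bool, .reset),
    dstart: Labeled(bool, .delayed_start),
    dstop: Labeled(bool, .delayed_stop),
) u8 {
    // Pack the flags so a caller can observe which argument went where.
    var bits: u8 = 0;
    if (reset.value) bits |= 1;
    if (dstart.value) bits |= 2;
    if (dstop.value) bits |= 4;
    return bits;
}

test "arguments must carry the expected label" {
    const bits = useLabeledBooleans(
        labeled(.reset, true),
        labeled(.delayed_start, false),
        labeled(.delayed_stop, true),
    );
    try std.testing.expectEqual(@as(u8, 5), bits);
    // Passing labeled(.delayed_stop, true) as the first argument would be
    // rejected at compile time, because the parameter types differ.
}
```

Unlike the proposal, this emulation changes the declared parameter types, so removing the wrappers is not a purely cosmetic edit; it illustrates the check, not the premises.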

Implementation from a user perspective

With this proposal, new syntax must be introduced for applying the comptime labels within code, but configuring and creating custom labels requires no new syntax at all.

  • LabelTemplate ..built-in (standardized) interface, or convention on struct definitions, that the compiler knows how to translate into comptime checks. (Implements the callbacks described in the procedure outline below.) There can be multiple LabelTemplates for different use-cases, some simple and user friendly, some more advanced and powerful.

  • LabelGroup ..struct def that follows/implements a LabelTemplate. These can be user defined or reside in a library.

  • labelInstance ..user or library defined instance of a LabelGroup struct. Simply a normal struct instance just like any other. Simplest form of a labelInstance is just a wrapped enum value, but any metadata is possible.

  • a[:labelInstance] ..read as, variable "a" is labeled with "labelInstance", instance of a LabelGroup (that conforms to a LabelTemplate).

  • a : f64[:labelInstance] ..read as, expect variable "a" to be of type f64 but also labeled with "labelInstance".

Outline of compiler "check labels" procedure

  • verify syntax, no "label brackets" ( [: tokens) in places where they don't belong

  • verify that all identifiers inside label brackets are valid labels (instances must comply with some predefined template)

  • verify that no labels are applied to a type they do not accept. (labels must directly or indirectly through a Label Template define a fn typeIsOk(comptime T : type) callback)

  • then, for all operations [note 1]:

    • verify that all operand associated labels are of the same group (instances of same struct)

      • unlabeled operands are treated as if they belong to some fictitious "unlabeled" group
      • if one of the operands has the "infer me" label, it'll receive the same label as the other operand has
      • raise error if there's a label group conflict
    • verify with the "label group" that this combination of label instances in the given operation is allowed, and find out what label the operation result should have (Label groups/structs must define a fn operationIsOk(comptime op : ZigOperationEnum, orderedLabels : []@This()) @This() callback)

These callback functions could be implemented in userland by anyone wishing to do specific comptime checks using labels. In most cases, though, more user-friendly LabelTemplates would wrap the callbacks, providing type-checking "presets" to be further tailored by end users.

[note 1]: By operations, I mean unary and binary operators, function call parameter passing, assignments, "address of", indexing, etc. The more operations that are included, the more powerful type checking with labels could become.
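To make the two callbacks from the outline more concrete, here is a hedged sketch of what a userland LabelGroup for bitflags might look like in valid current Zig. Everything in it is an assumption made for illustration: `Op` stands in for the proposed `ZigOperationEnum`, `Bitflag`, `flag_read` and `flag_write` are invented names, and `operationIsOk` returns an optional (null meaning "rejected") rather than the plain `@This()` of the outline, since the outline does not say how a rejection is reported.

```zig
const std = @import("std");

// Hypothetical stand-in for the proposed ZigOperationEnum.
const Op = enum { add, mul, bit_or, bit_and, eq };

// A hypothetical LabelGroup for bitflags.
const Bitflag = struct {
    name: []const u8,

    // Which base types this label may be attached to.
    pub fn typeIsOk(comptime T: type) bool {
        return switch (T) {
            u8, u16, u32, u64 => true,
            else => false,
        };
    }

    // Returns the label of the result, or null if the operation is rejected.
    pub fn operationIsOk(comptime op: Op, comptime labels: []const Bitflag) ?Bitflag {
        return switch (op) {
            // Bitwise operations keep the label of the first operand.
            .bit_or, .bit_and => labels[0],
            // Comparisons are allowed; the bool result carries no flag label.
            .eq => Bitflag{ .name = "unlabeled" },
            // Arithmetic on bitflags is not meaningful.
            .add, .mul => null,
        };
    }
};

// Two labelInstances of the group.
const flag_read = Bitflag{ .name = "read" };
const flag_write = Bitflag{ .name = "write" };

test "bitflags allow OR but not multiplication" {
    try std.testing.expect(Bitflag.typeIsOk(u32));
    try std.testing.expect(!Bitflag.typeIsOk(f64));
    try std.testing.expect(Bitflag.operationIsOk(.bit_or, &.{ flag_read, flag_write }) != null);
    try std.testing.expect(Bitflag.operationIsOk(.mul, &.{ flag_read, flag_write }) == null);
}
```

In the proposal the compiler would call these functions itself for every labeled operation; here they are simply called by hand from a test.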

Programmer benefit

Simple and consistent syntax, where end user usage and configuration relies only on very fundamental zig features (structs and instances).

Just a few LabelTemplates defined in the compiler would be enough to enable custom comptime checks suitable for bitflags, physical units, currency, state annotations, ...

More robust refactoring even when relying on primitive and performant types. Compiler will alert you if you assume a "speed" f64 variable to be representing mph, when elsewhere in code it represents m/s.

Other considerations

  • [:_] ..syntax to infer labels.
// inferring labels with [:_] ...
const a : f64[:_] = 25.0[:labelInstance];
const b: f64[:labelInstance] = 12.0[:_];

// perhaps out of scope:
const areaOfCircle : _[:mathFunc] = fn(r: f64[:radius]) f64[:area]{ ... }[:_]
// currently "_" is not allowed as a placeholder for the type in a variable declaration, 
// but perhaps labeling structs or functions is not useful anyhow.
  • labelVar[:] ..becomes unlabeled: "[:], empty label" syntax to strip labels of a variable.

  • f64[:labelInstance] "subtypes" f64: ..any function or assignment that expects a non-labeled type will accept a labeled variable of the correct type, but the label will be discarded.

  • f64 does NOT "subtype" f64[:labelInstance]: ..stronger type checks with labels. Cannot pass a non-labeled variable to somewhere a labeled variable is expected, even if the base type matches.

  • labeledVariable[:newLabel] fails: ..cannot "relabel" variables. Must first strip the existing label by assigning to a temporary unlabeled variable, or by using some "strip label" syntax, e.g. (labeledWithLabel1[:])[:newLabel] or labeledWithLabel1[:][:newLabel]

  • Why [:...] syntax?: ..because this closely resembles unit notation in physics calculations, does not conflict with any existing syntax, and does not demand introduction of any sigils not already in use.

  • Multiple labels? ..one possible syntax is [:label1,label2] becomes "apply label1 AND label2", [:label1:label2] becomes "expect label1 OR label2 applied", [:(label1:label2),label3] becomes "expect label3 AND (label1 OR label2) applied"

  • Handling of arrays: ..should it be possible to label the elements of an array only, or to label the array itself? If both should be allowed, what should the syntax difference be?

// difference between labeling an array or all elements of the array?
const arr : [10]const u8[:ascii] = undefined;
const arr2 : ([10]const u8)[:unicode] = undefined;

LabelTemplate

LabelTemplates could be library/userland provided wrappers around the callback functions mentioned in the "outline of procedure" part above.

Examples:

  • SimpleGroup ..This LabelTemplate represents a demand for equal labels (equal instance) in assignments (lval and rval) and function calls (passed variable and function parameter). Implemented by wrapping an enum. Conforming LabelGroups must embed an enum, and labelInstances wrap a concrete enum value. Example: "PrimaryColor" LabelGroup that embeds const E = enum{red,green,blue}, yields three possible labelInstances, wrapping either E.red, E.green or E.blue. Operators like == or + simply remove the label and then return an unlabeled result if the operands are labeled.

  • CompoundMeasure ..this LabelTemplate lets you add "unit" or "measure" metadata to primitive numbers, either integers or floats. Can be defined by forcing all LabelGroups implementing CompoundMeasure to have an exponent array of integers, where each entry represents the power of a unit. 4[m/s] => 4 [m=1,s=-1] => 4 [1,-1]. This exponent array is used by the compiler to determine whether operators or assignments are allowed. Two float operands labeled with the same exponent array can be added or subtracted from each other, for example.

  • NumberGroup ..this LabelTemplate is similar to SimpleGroup above, but with type checking enabled for operators as well. E.g a[:label1] * b[:label1] is allowed, a[:label1] * b[:label2] is not

Both NumberGroup and CompoundMeasure could also have options to tell the compiler which operators are enabled or disabled. This would allow the creation of a "bitflag" number group that only allows equality checks, bitwise operations, and perhaps left/right shift operations. This would make it a compile error to multiply bitflags.

  • IdGroup ..This LabelTemplate is like SimpleGroup, demanding equal labels for some operations, but also has an ID field (e.g. u64) so that each label can have a unique ID if required. This might possibly be leveraged by tools (IDEs).

  • TypeRestrictGroup .. This LabelTemplate defines assertions that can be applied to types, as types are comptime known in zig. Would allow a form of comptime interfaces or traits to be implemented.
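The exponent-array idea behind CompoundMeasure can be tried out in current Zig with comptime parameters. The following is a sketch under stated assumptions, not the proposed mechanism: `Quantity`, `Meters`, `Seconds` and `Speed` are hypothetical names, only two base units [m, s] are tracked, and the check rides on ordinary type identity instead of labels.

```zig
const std = @import("std");

// Exponents for the base units [m, s]; e.g. m/s is [2]i8{ 1, -1 }.
fn Quantity(comptime dims: [2]i8) type {
    return struct {
        value: f64,

        const Self = @This();
        pub const exponents = dims;

        // Addition requires identical exponent arrays (the same type).
        pub fn add(a: Self, b: Self) Self {
            return .{ .value = a.value + b.value };
        }

        // Division subtracts exponents, producing a differently labeled result.
        pub fn div(a: Self, b: anytype) Quantity([2]i8{
            dims[0] - @TypeOf(b).exponents[0],
            dims[1] - @TypeOf(b).exponents[1],
        }) {
            return .{ .value = a.value / b.value };
        }
    };
}

const Meters = Quantity([2]i8{ 1, 0 });
const Seconds = Quantity([2]i8{ 0, 1 });
const Speed = Quantity([2]i8{ 1, -1 });

test "exponent arrays decide which operations type-check" {
    const d = Meters{ .value = 10.0 };
    const t = Seconds{ .value = 4.0 };
    const v = d.div(t);
    try std.testing.expect(@TypeOf(v) == Speed);
    try std.testing.expect(v.value == 2.5);
    try std.testing.expect(d.add(d).value == 20.0);
    // d.add(t) does not compile: Seconds is not Meters.
}
```

A real CompoundMeasure template would presumably also cover multiplication, comparison and the per-operator enable/disable options described above; those are left out to keep the sketch short.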

Hypothetical uses:

andrewrk added the proposal label Aug 30, 2019
andrewrk added this to the 0.6.0 milestone Aug 30, 2019

mikdusan (Member) commented Sep 2, 2019

I wonder if this would be PR syntax in consideration of utf8 string topic from irc:

const utf8 = u8[:utf8];

pub fn trim(slice: []const utf8, values_to_strip: []const utf8) []const utf8 { ... }

ghost (Author) commented Sep 2, 2019

> I wonder if this would be PR syntax in consideration of utf8 string topic from irc:
>
> const utf8 = u8[:utf8];
>
> pub fn trim(slice: []const utf8, values_to_strip: []const utf8) []const utf8 { ... }

UTF8 is meaningful only in the context of a byte array, not so much a single byte? Sadly, the syntax I've been considering doesn't look too good when labeling whole arrays rather than just the elements: parentheses become necessary.

// explicit, not allowing aliased types to "embed" labels
// quite verbose
pub fn trim(slice : ([]const u8)[:utf8], values_to_strip : ([]const u8)[:utf8]) ([]const u8)[:utf8] { ... }

// allowing creating new "labeled" types, less verbose, but also less explicit
const utf8bytes = ([]const u8)[:utf8];
pub fn trim(slice : utf8bytes, values_to_strip: utf8bytes) utf8bytes { ... }

Note, using the implementation idea above, utf8 would be an instance of a custom LabelGroup struct implementing LabelTemplate "SimpleGroup", as you only need the labels to differentiate byte arrays representing different things.

It is a bit interesting to consider whether "new types" could be created just from aliasing an existing type and adding a label. I didn't consider it at first, but maybe it wouldn't really break the premise of being able to remove all labels from code without any change in runtime behavior. It does certainly remove the verbosity you'd have otherwise.

ghost mentioned this issue Jan 26, 2020
andrewrk modified the milestones: 0.6.0, 0.7.0 Feb 11, 2020

ghost (Author) commented Apr 22, 2020

Closing in favor of #5132, which touches on the same topics but is more aligned with the existing Zig type system.

ghost closed this as completed Apr 22, 2020