@@ -49,8 +49,8 @@ to avoid this problem.
 # Detailed design
 [design]: #detailed-design

-There are two kinds of procedural macro: function-like and macro-like. These two
-kinds exist today, and other than naming (see
+There are two kinds of procedural macro: function-like and attribute-like. These
+two kinds exist today, and other than naming (see
 [RFC 1561](https://github.com/rust-lang/rfcs/pull/1561)) the syntax for using
 these macros remains unchanged. If the macro is called `foo`, then a function-
 like macro is used with syntax `foo!(...)`, and an attribute-like macro with
@@ -120,8 +120,9 @@ details.
 
 When a `#[cfg(macro)]` crate is `extern crate`ed, its items (even public ones)
 are not available to the importing crate; only macros declared in that crate.
-The crate is dynamically linked with the compiler at compile-time, rather
-than with the importing crate at runtime.
+There should be a lint to warn about public items which will not be visible due
+to `#[cfg(macro)]`. The crate is dynamically linked with the compiler at
+compile-time, rather than with the importing crate at runtime.
 
 ## Writing procedural macros
@@ -163,7 +164,7 @@ sketch is available in this [blog post](http://ncameron.org/blog/libmacro/).
 ## Tokens
 
 Procedural macros will primarily operate on tokens. There are two main benefits
-to this principal: flexibility and future proofing. By operating on tokens, code
+to this principle: flexibility and future proofing. By operating on tokens, code
 passed to procedural macros does not need to satisfy the Rust parser, only the
 lexer. Stabilising an interface based on tokens means we need only commit to
 not changing the rules around those tokens, not the whole grammar. I.e., it
@@ -213,12 +214,20 @@ pub struct TokenTree {
 }
 
 pub enum TokenKind {
-    Sequence(Delimiter, Vec<TokenTree>),
+    Sequence(Delimiter, TokenStream),
 
     // The content of the comment can be found from the span.
     Comment(CommentKind),
-    // The Span is the span of the string itself, without delimiters.
-    String(Span, StringKind),
+
+    // Symbol is the string contents, not including delimiters. It would be nice
+    // to avoid an allocation in the common case that the string is in the
+    // source code. We might be able to use `&'Codemap str` or something.
+    // `Option<usize>` is for the count of `#`s if the string is a raw string.
+    // If the string is not raw, then it will be `None`.
+    String(Symbol, Option<usize>, StringKind),
+
+    // char literal, span includes the `'` delimiters.
+    Char(char),
 
     // These tokens are treated specially since they are used for macro
     // expansion or delimiting items.
@@ -227,11 +236,11 @@ pub enum TokenKind {
     // Not actually sure if we need this or if semicolons can be treated like
     // other punctuation.
     Semicolon, // `;`
-    Eof,
+    Eof, // Do we need this?
 
     // Word is defined by Unicode Standard Annex 31 -
     // [Unicode Identifier and Pattern Syntax](http://unicode.org/reports/tr31/)
-    Word(InternedString),
+    Word(Symbol),
     Punctuation(char),
 }
 
@@ -253,13 +262,34 @@ pub enum CommentKind {
 
 pub enum StringKind {
     Regular,
-    // usize is for the count of `#`s.
-    Raw(usize),
     Byte,
-    RawByte(usize),
 }
+
+// A Symbol is a possibly-interned string.
+pub struct Symbol { ... }
 ```
 
+### Open question: `Punctuation(char)` and multi-char operators
+
+Rust has many compound operators, e.g., `<<`. It's not clear how best to deal
+with them. If the source code contains "`+ =`", it would be nice to distinguish
+this in the token stream from "`+=`". On the other hand, if we represent `<<` as
+a single token, then the macro may need to split them into `<`, `<` in generic
+position.
+
+I had hoped to represent each character as a separate token. However, to make
+pattern matching backwards compatible, we would need to combine some tokens. In
+fact, if we want to be completely backwards compatible, we probably need to keep
+the same set of compound operators as are defined at the moment.
+
+Some solutions:
+
+* `Punctuation(char)` with special rules for pattern matching tokens,
+* `Punctuation([char])` with a facility for macros to split tokens. Tokenising
+  could match the maximum number of punctuation characters, or use the rules for
+  the current token set. The former would have issues with pattern matching. The
+  latter is a bit hacky, there would be backwards compatibility issues if we
+  wanted to add new compound operators in the future.
 
 ## Staging
@@ -314,6 +344,9 @@ are better addressed by compiler plug-ins or tools based on the compiler (the
 latter can be written today, the former require more work on an interface to the
 compiler to be practical).
 
+We could use the `macro` keyword rather than the `fn` keyword to declare a
+macro. We would then not require a `#[macro]` attribute.
+
 We could have a dedicated syntax for procedural macros, similar to the
 `macro_rules` syntax for macros by example. Since a procedural macro is really
 just a Rust function, I believe using a function is better. I have also not been
@@ -374,6 +407,8 @@ a process-separated model (if desired). However, if this is considered an
 essential feature of macro reform, then we might want to consider the interfaces
 more thoroughly with this in mind.
 
+A step in this direction might be to run the macro in its own thread, but in the
+compiler's process.
 
 ### Interactions with constant evaluation