Commit 13a2bda (parent 0251dc9)

Added debugging doco on structural comparing and adding missing grammar productions / lexemes.
[git-p4: depot-paths = "//src/ruby_parser/dev/": change = 13009]

debugging.md (1 file changed: 133 additions, 0 deletions)

From there? Good luck. I'm currently trying to backtrack from rule
reductions to state change differences. I'd like to figure out a way
to go from this sort of diff to a reasonable test that checks state
changes but I don't have that set up at this point.

## Adding New Grammar Productions

Ruby adds stuff to the parser ALL THE TIME. It's actually hard to keep
up with, but I've added some tools, and here's what a typical workflow
looks like. Let's say you want to add Ruby 2.7's "beginless range"
(e.g. `..42`).

Whenever a language feature is missing, I start by comparing the parse
trees between MRI and RP (ruby\_parser):

### Structural Comparing

There are a bunch of rake tasks (`compare27`, `compare26`, etc.) that
normalize MRI's parse.y grammar (just the structure of its rules, as
yacc sees them) and diff it against ruby\_parser's grammar (racc). It's
the first thing I do when I'm adding a new version: stub out all the
version differences, then diff the structure and move ruby\_parser
towards the new changes.
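The real `compare*` rake tasks aren't reproduced here. Purely as a rough illustration of the idea, here's a hypothetical sketch that normalizes two lists of productions (collapsing whitespace, dropping duplicates) and reports which ones exist on only one side. The `normalize` and `compare_productions` names and the "one `lhs: rhs` string per production" input format are assumptions, not ruby\_parser's actual code.

```ruby
require "set"

# Hypothetical sketch, NOT the actual compare task. Assumes each grammar has
# already been reduced to one "lhs: rhs symbols" string per production.
def normalize(productions)
  productions
    .map { |prod| prod.strip.squeeze(" ") } # collapse runs of spaces
    .reject(&:empty?)
    .to_set
end

# Productions only MRI has (missing from ruby_parser) and productions only
# ruby_parser has (often leftovers from an MRI refactoring).
def compare_productions(mri, rp)
  mri_set = normalize(mri)
  rp_set  = normalize(rp)
  {
    missing: (mri_set - rp_set).to_a.sort,
    extra:   (rp_set - mri_set).to_a.sort,
  }
end

mri = ["arg: arg tDOT2", "arg: tBDOT2 arg", "arg:  arg tPLUS arg"]
rp  = ["arg: arg tDOT2", "arg: arg tPLUS arg"]

result = compare_productions(mri, rp)
p result[:missing] # => ["arg: tBDOT2 arg"]
p result[:extra]   # => []
```

Using sets keeps the comparison order-independent, which matters because the two grammars rarely list their rules in the same order.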

Some differences are just gonna be there... but here's an example of a
real diff between MRI 2.7 and ruby_parser as of today:

```diff
   arg tDOT3 arg
   arg tDOT2
   arg tDOT3
-  tBDOT2 arg
-  tBDOT3 arg
   arg tPLUS arg
   arg tMINUS arg
   arg tSTAR2 arg
```

This is a new language feature that ruby_parser doesn't handle yet.
It's in MRI (the left-hand side of the diff) but not in ruby\_parser
(the right-hand side), so it shows up as a `-`, or missing, line.
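To make the `+`/`-` convention concrete, here's a small hypothetical helper (not part of ruby\_parser) that sorts the lines of a diff like the one above into "missing from ruby\_parser" and "only in ruby\_parser", treating everything else as context. It assumes MRI is the left-hand (`-`) side, as described above.

```ruby
# Hypothetical helper: classify lines of a grammar diff where MRI is the
# left-hand ("-") side and ruby_parser is the right-hand ("+") side.
def classify_diff(text)
  missing = [] # in MRI only: productions ruby_parser still needs
  extra   = [] # in ruby_parser only: often MRI refactorings
  text.each_line do |line|
    next if line.start_with?("---", "+++", "@@") # skip diff headers
    case line[0]
    when "-" then missing << line[1..].strip
    when "+" then extra   << line[1..].strip
    end
    # lines starting with a space are unchanged context and are ignored
  end
  { missing: missing, extra: extra }
end

diff = <<~DIFF
   arg tDOT3
-  tBDOT2 arg
-  tBDOT3 arg
   arg tPLUS arg
DIFF

p classify_diff(diff)[:missing] # => ["tBDOT2 arg", "tBDOT3 arg"]
p classify_diff(diff)[:extra]   # => []
```

A diff with entries in both lists is the "MRI refactored the grammar" case, where you have to decide whether to follow along.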

Some other diffs will have both `+` and `-` lines. That usually
happens when MRI has been refactoring the grammar. Sometimes I adopt
those refactorings, and sometimes it gets too difficult to maintain
multiple versions of Ruby parsing in a single file.

But! This structural comparison is always a place you should look when
ruby_parser is failing to parse something. Maybe the feature just
hasn't been implemented yet, and the diff is the easiest place to find
out.

### Starting Test First

The next thing I do is add a parser test to cover the feature. I
usually start with the parser and work backwards towards the lexer as
needed, as I find it structures things properly and keeps things
goal-oriented.

So, make a new parser test, usually in the versioned section of the
parser tests:

```ruby
def test_beginless2
  rb = "..10\n; ..a\n; c"
  pt = s(:block,
         s(:dot2, nil, s(:lit, 10).line(1)).line(1),
         s(:dot2, nil, s(:call, nil, :a).line(2)).line(2),
         s(:call, nil, :c).line(3)).line(1)

  assert_parse_line rb, pt, 1

  flunk "not done yet" # keep the test red until the feature is finished
end
```

(In this case I copied and modified the tests for open ranges from
2.6.) Then run it to get the first error:

```
% rake N=/beginless/

...

E

Finished in 0.021814s, 45.8421 runs/s, 0.0000 assertions/s.

  1) Error:
TestRubyParserV27#test_whatevs:
Racc::ParseError: (string):1 :: parse error on value ".." (tDOT2)
    GEMS/2.7.0/gems/racc-1.5.0/lib/racc/parser.rb:538:in `on_error'
    WORK/ruby_parser/dev/lib/ruby_parser_extras.rb:1304:in `on_error'
    (eval):3:in `_racc_do_parse_c'
    (eval):3:in `do_parse'
    WORK/ruby_parser/dev/lib/ruby_parser_extras.rb:1329:in `block in process'
    RUBY/lib/ruby/2.7.0/timeout.rb:95:in `block in timeout'
    RUBY/lib/ruby/2.7.0/timeout.rb:33:in `block in catch'
    RUBY/lib/ruby/2.7.0/timeout.rb:33:in `catch'
    RUBY/lib/ruby/2.7.0/timeout.rb:33:in `catch'
    RUBY/lib/ruby/2.7.0/timeout.rb:110:in `timeout'
    WORK/ruby_parser/dev/lib/ruby_parser_extras.rb:1317:in `process'
    WORK/ruby_parser/dev/test/test_ruby_parser.rb:4198:in `assert_parse'
    WORK/ruby_parser/dev/test/test_ruby_parser.rb:4221:in `assert_parse_line'
    WORK/ruby_parser/dev/test/test_ruby_parser.rb:4451:in `test_whatevs'
```

For starters, we know the missing production is `tBDOT2 arg`. The
parser is currently blowing up because the lexer hands it `tDOT2` at
the start of an expression and it simply doesn't know what to do with
it, so it raises the error. As the diff suggests, that's the wrong
token to begin with, so it's probably time to also create a lexer test:

```ruby
def test_yylex_bdot2
  assert_lex3("..42",
              s(:dot2, nil, s(:lit, 42)),

              :tBDOT2, "..", EXPR_BEG,
              :tINTEGER, "42", EXPR_NUM)

  flunk "not done yet"
end
```

This one is mostly speculative at this point. It says "if we're lexing
this string, we should get this sexp if we fully parse it, and the
lexical stream should look like this". That last bit is mostly made up
at this point. Sometimes I don't know exactly what expression state
things should be in until I start really digging in.
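For orientation: as far as I can tell, MRI 2.7's parse.y picks between the two tokens based on whether the lexer is at the beginning of an expression (its `IS_BEG()` check), returning `tBDOT2` there and `tDOT2` otherwise. Here's a deliberately tiny, hypothetical sketch of that idea. It is not ruby\_parser's lexer: the `ToyLexer` class, its single `@beg` flag standing in for `EXPR_BEG`, and the made-up `:tSEP` token are all simplifying assumptions.

```ruby
# Hypothetical toy lexer, NOT ruby_parser's. It only knows integers,
# identifiers, separators ("\n" or ";") and "..", and tracks one
# "beginning of expression" flag standing in for EXPR_BEG.
class ToyLexer
  def initialize(src)
    @src = src
    @pos = 0
    @beg = true # the start of input is the beginning of an expression
  end

  def tokens
    toks = []
    while @pos < @src.size
      rest = @src[@pos..]
      case rest
      when /\A[ \t]+/
        # skip horizontal whitespace; state is unchanged
      when /\A(\n|;)/
        toks << [:tSEP, $&] # made-up token name for this sketch
        @beg = true
      when /\A\.\./
        # Same "..", different token depending on where we are.
        toks << [@beg ? :tBDOT2 : :tDOT2, $&]
        @beg = true # an operand is expected next
      when /\A\d+/
        toks << [:tINTEGER, $&]
        @beg = false
      when /\A[a-z_]\w*/
        toks << [:tIDENTIFIER, $&]
        @beg = false
      else
        raise ArgumentError, "unexpected input: #{rest[0].inspect}"
      end
      @pos += $&.size # $& is the text matched by the winning `when`
    end
    toks
  end
end

p ToyLexer.new("..42").tokens.map(&:first)  # => [:tBDOT2, :tINTEGER]
p ToyLexer.new("1..42").tokens.map(&:first) # => [:tINTEGER, :tDOT2, :tINTEGER]
```

The real lexer tracks far more state than one boolean, which is exactly why the expected states in the lexer test above are still partly guesswork.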

At this point, I have 2 failing tests pointing me in the right
direction. It's now a matter of digging through `compare/parse26.y` to
see how the lexer differs and implementing it...
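On the parser side, the missing `tBDOT2 arg` production boils down to: when `tBDOT2` starts an `arg`, parse the following operand and build `s(:dot2, nil, operand)`, as in the parser test above. ruby\_parser expresses this as a racc grammar rule, which isn't reproduced here. As a stand-in, here's a hypothetical recursive-descent sketch over a token array; `ToyParser`, `parse_arg`, and the plain-array nodes (instead of real sexps) are assumptions, and operands are simplified to single literals or identifiers.

```ruby
# Hypothetical sketch, NOT ruby_parser's racc grammar. Nodes are plain
# arrays shaped like the sexps in the tests, e.g. [:dot2, nil, [:lit, 42]].
class ToyParser
  def initialize(tokens)
    @tokens = tokens # e.g. [[:tBDOT2, ".."], [:tINTEGER, "42"]]
  end

  def parse_arg
    type, value = @tokens.shift
    case type
    when :tBDOT2 # new production: tBDOT2 arg (beginless range)
      [:dot2, nil, parse_primary(@tokens.shift)]
    else
      left = parse_primary([type, value])
      if @tokens.first&.first == :tDOT2 # existing: arg tDOT2 arg
        @tokens.shift
        [:dot2, left, parse_primary(@tokens.shift)]
      else
        left
      end
    end
  end

  private

  def parse_primary(token)
    type, value = token
    case type
    when :tINTEGER    then [:lit, Integer(value)]
    when :tIDENTIFIER then [:call, nil, value.to_sym]
    else raise ArgumentError, "parse error on #{type.inspect}"
    end
  end
end

p ToyParser.new([[:tBDOT2, ".."], [:tINTEGER, "42"]]).parse_arg
# => [:dot2, nil, [:lit, 42]]
p ToyParser.new([[:tBDOT2, ".."], [:tIDENTIFIER, "a"]]).parse_arg
# => [:dot2, nil, [:call, nil, :a]]
```

Feeding it `[[:tDOT2, ".."], [:tINTEGER, "42"]]` raises, much like the racc error above: with the wrong token from the lexer, no rule applies.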

But this is a good start to the doco for now. I'll add more later.