remove scanner, rewrite grammar.js #16

Merged
merged 3 commits into from
Sep 28, 2022
54 changes: 54 additions & 0 deletions README.md
@@ -0,0 +1,54 @@
tree-sitter-vimdoc
==================

This grammar intentionally supports a subset of the vimdoc "spec"; predictable
results are the primary goal, so that _output_ formats (e.g. HTML) are
well-formed; the _input_ (vimdoc) is secondary. The first step should always be
to try to fix the input (within reason) rather than insist on a grammar that
handles vimdoc's endless quirks.

Notes
-----

- vimdoc format "spec":
  - [:help help-writing](https://neovim.io/doc/user/helphelp.html#help-writing)
  - https://github.com/nanotee/vimdoc-notes
- whitespace is intentionally captured in `(word)`, because it is often needed to
lay out vim help files correctly (especially old/legacy ones).
- `(codeblock)` is contained by `(line)` because `>` can start a code block at the end of a line.
- `(column_heading)` is contained by `(line)` because `<` (to close
a `(codeblock)`) can appear at the start of `(column_heading)`.
- `h1` ("Heading 1"): `======` followed by text and optional `*tags*`.
- `h2` ("Heading 2"): `------` followed by text and optional `*tags*`.
- `h3` ("Heading 3"): only UPPERCASE WORDS, followed by optional `*tags*`.
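
The three heading rules above can be approximated outside the grammar, for
example when checking input before parsing. The sketch below is not taken from
`grammar.js`: `heading_level`, the regexes, and the 6-character minimum for the
`======`/`------` ruler are assumptions made for illustration.

```python
import re

# Assumed tag shape: *name* with no spaces or asterisks inside.
TAG = r"\*[^*\s]+\*"
# Zero or more tags, then optional trailing whitespace up to end of line.
TAGS = rf"(?:\s+{TAG})*\s*$"
# Assumed "uppercase word": starts with A-Z, continues with A-Z, digits, _ or -.
UPPER_WORD = r"[A-Z][A-Z0-9_-]*"
H3_RE = re.compile(rf"^{UPPER_WORD}(?:\s+{UPPER_WORD})*{TAGS}")


def heading_level(prev_line, line):
    """Return "h1", "h2", "h3" or None for `line`, given the line before it."""
    if not line.strip():
        return None
    if prev_line.startswith("======"):
        return "h1"  # ruler of "=" followed by text
    if prev_line.startswith("------"):
        return "h2"  # ruler of "-" followed by text
    if H3_RE.match(line):
        return "h3"  # only UPPERCASE WORDS plus optional *tags*
    return None
```

For example, `heading_level("", "KNOWN ISSUES *issues*")` is classified as
`"h3"`, while `heading_level("", "Known issues")` is not a heading.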

Known issues
------------

- `line_li` ("list item") is _experimental_. It doesn't support nesting yet and
it may not work well; you can treat it as a normal `line` for layout purposes.
- `codeblock` ">" must not be preceded only by tabs; a space char is required (" >").
See `:help lcs-tab` for example. Currently the grammar doesn't enforce this.
- `codeblock` terminated by an "implicit stop" (i.e. no terminating `<`)
consumes the first char of the terminating line, and continues the parent
`block`, preventing top-level forms like `h1`, `h2` from being recognized
until a blank line is encountered.
- `line` in a `codeblock` does not contain `word` atoms; it's just the full
raw text line including whitespace. This is somewhat dictated by its
"preformatted" nature; parsing the contents would require loading a "child"
language (injection). See [#2](https://github.com/vigoux/tree-sitter-vimdoc/issues/2).
- `url` doesn't handle _surrounding_ parens. E.g. `(https://example.com/#yay)` yields `word`
- `url` doesn't handle _nested_ parens. E.g. `(https://example.com/(foo)#yay)`
- Ideally `block_end` should consume the last block of the document _only_ if that
block is missing a trailing blank line or EOL ("\n").
  - TODO: consider simply _not supporting_ docs without EOL?
- Ideally `line_noeol` should consume the last line of the document _only_ if
that line is missing EOL ("\n").
  - TODO: consider simply _not supporting_ docs without EOL?
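
Since the grammar does not enforce the space-before-`>` rule for `codeblock`
noted in the list above, a caller could check it on the input side. This is
a minimal sketch under stated assumptions: `opens_codeblock` is a hypothetical
helper, and treating a line consisting only of `>` as a valid opener is an
assumption not stated in this README.

```python
def opens_codeblock(line):
    """Return True if `line` ends with a codeblock opener (" >").

    A ">" preceded by a tab instead of a space is rejected, following the
    rule described in the README. A bare ">" line is accepted, which is an
    assumption made for this sketch.
    """
    text = line.rstrip("\n")
    return text == ">" or text.endswith(" >")
```

Lines such as `"Example: >"` pass the check, while `"\t>"` does not.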

TODO
----

- `line_noeol` is a special case to support documents that don't end in EOL.
Grammar could be a bit simpler if we just require EOL at end of document.
- `line_modeline` (only at EOF)
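
If the grammar did require EOL at end of document, as the first TODO item
suggests, input could be normalized before parsing, in keeping with the
"fix the input" approach described at the top. A minimal sketch;
`ensure_trailing_eol` is a hypothetical helper, not part of this repository.

```python
def ensure_trailing_eol(text):
    """Append "\n" to non-empty text that does not already end with one."""
    if text == "" or text.endswith("\n"):
        return text
    return text + "\n"
```

Text that already ends in EOL is returned unchanged, so the helper can be
applied to every document unconditionally.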
54 changes: 38 additions & 16 deletions corpus/arguments.txt
@@ -1,31 +1,53 @@
================================================================================
Simple argument
simple argument
================================================================================
This in an argument: {arg}
--------------------------------------------------------------------------------

(help_file
(line
(word)
(word)
(word)
(word)
(argument
(word))))
(block
(line
(word)
(word)
(word)
(word)
(argument
(word)))))

================================================================================
Multiple arguments on the same line
multiple arguments on the same line
================================================================================

{foo} {bar} {baz}

--------------------------------------------------------------------------------

(help_file
(line
(argument
(word))
(argument
(word))
(argument
(block
(line
(argument
(word))
(argument
(word))
(argument
(word)))))

================================================================================
NOT an argument
================================================================================
{foo "{bar}" `{baz}` |{baz| } {}

--------------------------------------------------------------------------------

(help_file
(block
(line
(argument
(word)
(ERROR))
Member Author

I gave up on this for now; ideally it should be:

  (argument
    (word)
    (MISSING "}"))

(word)
(codespan
(word))
(taglink
(word))
(word)
(word))))
45 changes: 0 additions & 45 deletions corpus/backtick.txt

This file was deleted.

96 changes: 0 additions & 96 deletions corpus/code_block.txt

This file was deleted.
