Word completion #13206

the-mikedavis · 2025-03-27T14:35:52Z

Completion of words from open buffers is a feature I miss from Kakoune. The spirit of this implementation is similar: words are extracted from open buffers and stored in a database (WordDB in Kakoune). Completion then reads from the database to fuzzy-find candidates. All reading and writing to the index is done off the main thread, and large changes to the index are debounced. Updates to the index examine the ChangeSet which caused the change and only consider windows of the text around each Operation::Insert and Operation::Delete.
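To make the idea of a word database concrete, here is a minimal sketch of an occurrence-counted index. It is not the PR's code: the `WordIndex` layout, the `insert_text`/`remove_text`/`contains` names, and the word-boundary rule in `words` are all assumptions made for illustration, and the threading, debouncing, and ChangeSet windowing described above are left out.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the word database: each word maps to the
/// number of times it currently occurs across all indexed text.
#[derive(Default)]
struct WordIndex {
    counts: HashMap<String, usize>,
}

/// Split text into words: runs of alphanumerics and underscores.
/// (Assumed definition; the PR may use different word boundaries.)
fn words(text: &str) -> impl Iterator<Item = &str> {
    text.split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|w| !w.is_empty())
}

impl WordIndex {
    /// Count every word in a piece of inserted text.
    fn insert_text(&mut self, text: &str) {
        for word in words(text) {
            *self.counts.entry(word.to_owned()).or_insert(0) += 1;
        }
    }

    /// Un-count every word in a piece of deleted text, dropping a word
    /// from the index once its last occurrence is gone.
    fn remove_text(&mut self, text: &str) {
        for word in words(text) {
            if let Some(count) = self.counts.get_mut(word) {
                *count -= 1;
                if *count == 0 {
                    self.counts.remove(word);
                }
            }
        }
    }

    fn contains(&self, word: &str) -> bool {
        self.counts.contains_key(word)
    }
}
```

Counting occurrences means a word stays available for completion until its last copy is deleted from every buffer, which is why an edit only needs to report the text it added and the text it removed.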

Note that this doesn't follow Pascal's idea for using Aho-Corasick on the current buffer (comment) because I would like words from all documents. Word completion is especially useful when taking notes - for example one window has code I'm reading and the other window has a markdown file for notes - and being able to quickly complete words from the other buffer(s) is really useful for that workflow.

This PR also adds a custom small string type based on what I use in spellbook. I like it because it's simple and specialized for the use case (so, very compact). I'm not very attached to it though and we could slot in some other type. I think we already have a transitive dependency on KString, for one option.

Closes #1063

jerabaul29 · 2025-03-27T15:50:35Z

This would be super nice. And it is also very useful for people who do not have all their LSPs installed and do not have LSP completion available, as this will help get for example complex variable names right :) .

Will this be case sensitive LikeClassNames / include snake_names? :)

the-mikedavis · 2025-03-27T16:27:09Z

It's also useful in my workflow in multi-language repos: some files might be written in C and others in another language. LSP doesn't cover that case very well.

WordIndex stores the words as they were written in each document, but WordIndex::matches uses the default fuzzy matching behavior we use everywhere else, so you can get case-insensitive completions like TSParseOption when typing tsparse, for example.
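As a rough illustration of why `tsparse` can find `TSParseOption`, the sketch below uses a plain case-insensitive subsequence test. This is a deliberately simple stand-in and not the fuzzy matcher the editor actually uses; `fuzzy_contains` and `matches` are hypothetical names, and no scoring or ranking is attempted.

```rust
/// Returns true if every character of `needle` appears in `haystack`
/// in the same order, ignoring ASCII case.
fn fuzzy_contains(haystack: &str, needle: &str) -> bool {
    let mut hay = haystack.chars();
    // `any` advances `hay`, so each needle character must be found
    // after the position where the previous one matched.
    needle
        .chars()
        .all(|n| hay.any(|h| h.eq_ignore_ascii_case(&n)))
}

/// Keep the stored words (in their original casing) that match `pattern`.
fn matches<'a>(words: &[&'a str], pattern: &str) -> Vec<&'a str> {
    words
        .iter()
        .copied()
        .filter(|w| fuzzy_contains(w, pattern))
        .collect()
}
```

The index keeps the original casing, and only the comparison ignores case, so the completion inserts `TSParseOption` exactly as it was written.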

helix-term/src/handlers/completion/word.rs

helix-stdx/src/lib.rs

book/src/editor.md

helix-view/src/handlers/word_index.rs

`TinyBoxedStr` is a small-string optimized replacement for `Box<str>` styled after <https://cedardb.com/blog/german_strings/>. A nearly identical type is used in helix-editor/spellbook for space savings.

This type is specialized for its use-case:

* strings are immutable after creation
* strings are very very small (less than 256 bytes long)

Because of these attributes we can use nearly the full size of the type for the inline representation and also keep a small total size. Many other small string crates in the wild are 3 `usize`s long (same as a regular `String`) to support mutability. This type is more like a `Box<str>` (2 `usize`s long).

Mostly I like this small string type though because it's very straightforward to implement. Other than a few functions that reach into `std::alloc` or `std::ptr`, the code is short and boring.
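The size comparison can be checked directly. The sketch below is not `TinyBoxedStr`: `InlineStr` is a hypothetical, safe, inline-only type (one length byte plus 15 data bytes, with no heap fallback) that only shows how an immutable string with a one-byte length fits in 16 bytes, the size of two `usize`s on a 64-bit target.

```rust
use std::mem::size_of;

/// Hypothetical inline-only string: 1 length byte + 15 data bytes.
/// Unlike the type in the PR, it has no heap fallback for longer strings.
struct InlineStr {
    len: u8,
    bytes: [u8; 15],
}

impl InlineStr {
    /// Returns `None` when the string does not fit inline.
    fn new(s: &str) -> Option<Self> {
        if s.len() > 15 {
            return None;
        }
        let mut bytes = [0u8; 15];
        bytes[..s.len()].copy_from_slice(s.as_bytes());
        Some(Self { len: s.len() as u8, bytes })
    }

    fn as_str(&self) -> &str {
        // The stored bytes are a complete copy of a valid `&str`.
        std::str::from_utf8(&self.bytes[..self.len as usize])
            .expect("constructed from valid UTF-8")
    }
}

/// Sizes of (`String`, `Box<str>`, `InlineStr`) in bytes.
fn sizes() -> (usize, usize, usize) {
    (size_of::<String>(), size_of::<Box<str>>(), size_of::<InlineStr>())
}
```

On current Rust, `String` occupies three `usize`s (pointer, capacity, length) while `Box<str>` is a two-`usize` fat pointer, so dropping mutability is what frees the third word.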

RoloEdits · 2025-04-01T01:11:30Z

book/src/editor.md

+| `word-completion` | Enable completion of words from open buffers. | `true` |
+| `word-completion-trigger-length` | Minimum number of characters required to automatically trigger word completion. | `7` |


I wonder if this could be encapsulated as

[editor.word-completion]
enable = true
trigger-length = 7

darshanCommits · 2025-04-02T04:57:54Z

I think word completions shouldn't appear if the same item is provided by another, superior source.
Or should this be delegated to another PR that filters out duplicate entries in completion entirely?

edit: can't seem to upload a picture?

helix-term/src/handlers/completion/word.rs

gabydd · 2025-04-03T00:50:53Z

The same problem happens with path completions, so I think that should be a separate PR so we can figure out the best way to deduplicate completions.

SeSodesa · 2025-04-03T07:53:04Z

I wonder if this same mechanism could (later) be used for Unicode completion using Typst's Codex: https://github.com/typst/codex. In addition to looking for words from the open buffers, they would also be searched from the "dictionary" provided by Codex. Related to #1438.

SeSodesa · 2025-04-04T11:56:38Z

I wonder if this same mechanism could (later) be used for Unicode completion using Typst's Codex: https://github.com/typst/codex. In addition to looking for words from the open buffers, they would also be searched from the "dictionary" provided by Codex. Related to #1438.

Of course a problem might occur, if one actually wants to write alpha and not α. Maybe the word completion could then offer both variants, or alternatively one might make the Unicode completion intention explicit by starting a word with a modifier such as # (inspired by the fact that in Typst the code mode initiator is a #). So you would write #alpha to get an autocompletion with a α as its output. The Unicode completion modifier could also be something like u+, instead of #, if something less tied to Typst were desirable.

RoloEdits · 2025-04-04T22:11:43Z

Just want to say that this is such a nice QoL improvement! Something you really didn't know how much you missed until you had it again after a long absence. In testing, I did feel like 5 was a much better target for trigger length. I even turn to 2-3 when I am writing documentation, it really keeps the flow going with quick action/responses.

poliorcetics · 2025-04-06T11:12:39Z

book/src/languages.md

@@ -71,6 +71,7 @@ These configuration keys are available:
 | `text-width`          |  Maximum line length. Used for the `:reflow` command and soft-wrapping if `soft-wrap.wrap-at-text-width` is set, defaults to `editor.text-width`   |
 | `rulers`              | Overrides the `editor.rulers` config key for the language. |
 | `path-completion`     | Overrides the `editor.path-completion` config key for the language. |
+| `word-completion`     | Overrides the `editor.word-completion` config key for the language. |


This is missing the override of word-completion-trigger-length (and should probably include @RoloEdits's suggestion of merging both keys under [word-completion]: enable and trigger-length).
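One way the merged table and a per-language override could fit together is sketched below. None of these types come from the PR: `WordCompletion`, `WordCompletionOverride`, and `resolve` are hypothetical names, and deserialization from TOML is left out. Only the defaults (`true` and `7`) are taken from the documentation table in this PR.

```rust
/// Hypothetical editor-wide settings, mirroring a `[editor.word-completion]` table.
#[derive(Debug, Clone, Copy, PartialEq)]
struct WordCompletion {
    enable: bool,
    trigger_length: u8,
}

impl Default for WordCompletion {
    fn default() -> Self {
        // Defaults as documented in book/src/editor.md in this PR.
        Self { enable: true, trigger_length: 7 }
    }
}

/// Hypothetical per-language override: `None` means "inherit from the editor".
#[derive(Debug, Clone, Copy, Default)]
struct WordCompletionOverride {
    enable: Option<bool>,
    trigger_length: Option<u8>,
}

/// Apply a language's overrides on top of the editor-wide settings.
fn resolve(editor: WordCompletion, language: WordCompletionOverride) -> WordCompletion {
    WordCompletion {
        enable: language.enable.unwrap_or(editor.enable),
        trigger_length: language.trigger_length.unwrap_or(editor.trigger_length),
    }
}
```

With this shape, a language such as markdown could set only `trigger_length` and still inherit `enable` from the editor configuration.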

poliorcetics · 2025-04-06T11:36:15Z

helix-stdx/src/str.rs

+pub struct TinyBoxedStr {
+    len: u8,
+    prefix: [u8; Self::PREFIX_LEN],
+    trailing: TinyBoxedStrTrailing,
+}


Two usizes is 16 bytes.

This works on average (and just barely) for the .md files in rust-lang/rust-analyzer and helix-editor/helix, but it doesn't for starship/starship and rust-lang/rust.

Several good-quality Rust crates offer small strings with 22 to 24 bytes of inline capacity, which would fit a lot more cases and avoid helix reinventing the wheel where it feels unnecessary.

Notably, I don't see any kind of benchmark justifying reimplementing small strings and all their complexity.

There is already a direct dependency on smartstring. Should try that one first if we are looking for small string optimized crates.

poliorcetics · 2025-04-06T11:43:07Z

helix-term/src/handlers/completion/request.rs

+    if let Some(word_completion_request) =
+        word::completion(editor, trigger, handle.clone(), savepoint)
+    {
+        requests.spawn_blocking(word_completion_request);
+    }


This file chains the building of completion requests for the various completion types. Each handler also starts by doing mostly the same work: getting the cursor, the current line, ...

Maybe not for this MR, but finding a way to properly distribute this in parallel would be good.
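A possible shape for that, sketched under heavy assumptions: compute the shared cursor context once, then hand a reference to each handler running on its own thread. `CursorContext`, `prefix_candidates`, and `complete` are hypothetical names, the handlers are reduced to prefix filters, and `std::thread::scope` stands in for the task set used in the diff above.

```rust
use std::thread;

/// Hypothetical context computed once per completion trigger.
/// `cursor` is a byte offset and is assumed to lie on a char boundary.
struct CursorContext<'a> {
    line: &'a str,
    cursor: usize,
}

impl<'a> CursorContext<'a> {
    /// The run of word characters immediately before the cursor.
    fn prefix(&self) -> &'a str {
        let before = &self.line[..self.cursor];
        let start = before
            .char_indices()
            .rev()
            .find(|(_, c)| !(c.is_alphanumeric() || *c == '_'))
            .map_or(0, |(i, c)| i + c.len_utf8());
        &before[start..]
    }
}

/// Stand-in for one completion handler: keep candidates starting with the prefix.
fn prefix_candidates(ctx: &CursorContext<'_>, source: &[&str]) -> Vec<String> {
    let prefix = ctx.prefix();
    source
        .iter()
        .filter(|c| c.starts_with(prefix))
        .map(|c| (*c).to_owned())
        .collect()
}

/// Run two handlers in parallel against the same shared context.
fn complete(ctx: &CursorContext<'_>, words: &[&str], paths: &[&str]) -> Vec<String> {
    thread::scope(|s| {
        let word_task = s.spawn(|| prefix_candidates(ctx, words));
        let path_task = s.spawn(|| prefix_candidates(ctx, paths));
        let mut all = word_task.join().unwrap();
        all.extend(path_task.join().unwrap());
        all
    })
}
```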

poliorcetics · 2025-04-06T11:50:45Z

helix-view/src/handlers/word_index.rs

+        let words: Vec<_> = words(text.slice(..)).collect();
+        let mut inner = self.inner.write();
+        for word in words {
+            inner.insert(word);
+        }


Do we need the collect if words() already returns an iterator?

This is so that we hold the write lock for only a very short time. I'm not sure it's necessary though as parsing the words out should be quite fast
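The trade-off can be seen in a reduced sketch. It uses `std::sync::RwLock` and a `HashSet<String>`, which are assumptions for illustration (the lock and container in the PR may differ), and `SharedIndex` is a hypothetical name.

```rust
use std::collections::HashSet;
use std::sync::RwLock;

/// Hypothetical shared index guarded by a reader-writer lock.
struct SharedIndex {
    inner: RwLock<HashSet<String>>,
}

impl SharedIndex {
    fn new() -> Self {
        Self { inner: RwLock::new(HashSet::new()) }
    }

    fn insert_text(&self, text: &str) {
        // Parse first, while no lock is held, so readers are not blocked
        // for the duration of the word extraction.
        let words: Vec<String> = text.split_whitespace().map(str::to_owned).collect();

        // Hold the write lock only for the insert loop.
        let mut inner = self.inner.write().unwrap();
        for word in words {
            inner.insert(word);
        }
    }

    fn contains(&self, word: &str) -> bool {
        self.inner.read().unwrap().contains(word)
    }
}
```

Iterating lazily inside the locked region would save the intermediate `Vec` but would keep readers waiting while the text is scanned; whether that matters depends on how fast word extraction is in practice.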

poliorcetics · 2025-04-06T11:51:38Z

helix-view/src/handlers/word_index.rs

+        let words: Vec<_> = words(text.slice(..)).collect();
+        let mut inner = self.inner.write();
+        for word in words {
+            inner.remove(word);
+        }


the-mikedavis · 2025-04-06T18:11:14Z

> I even turn to 2-3 when I am writing documentation, it really keeps the flow going with quick action/responses.

We could consider having a lower trigger length for documentation languages like markdown. Mainly I find word completion noisy/annoying when it competes with LSP completions.

> I wonder if this same mechanism could (later) be used for Unicode completion using Typst's Codex

The completion code part of this PR is not really very novel - it should be possible to add any sort of "core" completions now since #2608 did a bunch of refactoring to make it possible.

the-mikedavis added the S-needs-testing Status: Needs to be tested out in order to discover potential bugs. label Mar 27, 2025

the-mikedavis requested a review from pascalkuthe March 27, 2025 14:35

nik-rev reviewed Mar 27, 2025

helix-view/src/handlers/word_index.rs

the-mikedavis force-pushed the word-completion branch from b9e3ee5 to b7186fe Compare March 29, 2025 14:17

the-mikedavis added 2 commits March 31, 2025 09:28

Complete words from open buffers

e8ec3f2

the-mikedavis force-pushed the word-completion branch from b7186fe to 9720020 Compare March 31, 2025 13:29

RoloEdits reviewed Apr 1, 2025

RoloEdits reviewed Apr 2, 2025

helix-term/src/handlers/completion/word.rs

rhizoome mentioned this pull request Apr 3, 2025

unify grammars rhizoome/tree-sitter-ink#1

Open

poliorcetics reviewed Apr 6, 2025
