Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Word completion #13206

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Word completion #13206

wants to merge 2 commits into from

Conversation

the-mikedavis
Copy link
Member

Completion of words from open buffers is a feature I miss from Kakoune. The spirit of this implementation is the similar: words are extracted from open buffers and stored in a database (WordDB in Kakoune). Then completion reads from the database to fuzzy-find candidates. All reading and writing to the index is done off the main thread and large changes to the index are debounced. Updates to the index examine the ChangeSet which caused the change and only consider windows of the text around each Operation::Insert and Operation::Delete.

Note that this doesn't follow Pascal's idea for using Aho-Corasick on the current buffer (comment) because I would like words from all documents. Word completion is especially useful when taking notes - for example one window has code I'm reading and the other window has a markdown file for notes - and being able to quickly complete words from the other buffer(s) is really useful for that workflow.

This PR also adds a custom small string type based on what I use in spellbook. I like it because it's simple and specialized for the use case (so, very compact). I'm not very attached to it though and we could slot in some other type. I think we already have a transitive dependency on KString, for one option.

Closes #1063

@the-mikedavis the-mikedavis added the S-needs-testing Status: Needs to be tested out in order to discover potential bugs. label Mar 27, 2025
@jerabaul29
Copy link
Contributor

This would be super nice. And it is also very useful for people who do not have all their LSPs installed and do not have LSP completion available, as this will help get for example complex variable names right :) .

Will this be case sensitive LikeClassNames / include snake_names? :)

@the-mikedavis
Copy link
Member Author

It's also useful in my workflow in multi-language repos: some files might be written in C and others in another language. LSP doesn't cover that case very well.

The set of words WordIndex contains the words as they were written in each document but WordIndex::matches uses the default fuzzy matching behavior we use everywhere else, so you can get case-insensitive completions like TSParseOption when typing tsparse for example.

`TinyBoxedStr` is a small-string optimized replacement for `Box<str>`
styled after <https://cedardb.com/blog/german_strings/>. A nearly
identical type is used in helix-editor/spellbook for space savings.

This type is specialized for its use-case:

* strings are immutable after creation
* strings are very very small (less than 256 bytes long)

Because of these attributes we can use nearly the full size of the type
for the inline representation and also keep a small total size. Many
other small string crates in the wild are 3 `usize`s long (same as a
regular `String`) to support mutability. This type is more like a
`Box<str>` (2 `usize`s long).

Mostly I like this small string type though because it's very
straightforward to implement. Other than a few functions that reach into
`std::alloc` or `std::ptr`, the code is short and boring.
Comment on lines +40 to +41
| `word-completion` | Enable completion of words from open buffers. | `true` |
| `word-completion-trigger-length` | Minimum number of characters required to automatically trigger word completion. | `7` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this could be encapsulated as

[editor.word-completion]
enable = true
trigger-length = 7

@darshanCommits
Copy link
Contributor

darshanCommits commented Apr 2, 2025

I think word completions shouldn't appear if same item is provided by other superior source.
or should this be delegated to another PR to filter out duplicate entries in completion entirely.

edit: cant seem to upload picture?

@gabydd
Copy link
Member

gabydd commented Apr 3, 2025

The same problem happens with path completions so I think that should be a seperate pr so we can figure out the best way to deduplicate completions

@SeSodesa
Copy link

SeSodesa commented Apr 3, 2025

I wonder if this same mechanism could (later) be used for Unicode completion using Typst's Codex: https://github.com/typst/codex. In addition to looking for words from the open buffers, they would also be searched from the "dictionary" provided by Codex. Related to #1438.

@SeSodesa
Copy link

SeSodesa commented Apr 4, 2025

I wonder if this same mechanism could (later) be used for Unicode completion using Typst's Codex: https://github.com/typst/codex. In addition to looking for words from the open buffers, they would also be searched from the "dictionary" provided by Codex. Related to #1438.

Of course a problem might occur, if one actually wants to write alpha and not α. Maybe the word completion could then offer both variants, or alternatively one might make the Unicode completion intention explicit by starting a word with a modifier such as # (inspired by the fact that in Typst the code mode initiator is a #). So you would write #alpha to get an autocompletion with a α as its output. The Unicode completion modifier could also be something like u+, instead of #, if something less tied to Typst were desirable.

@RoloEdits
Copy link
Contributor

Just want to say that this is such a nice QoL improvement! Something you really didn't know how much you missed until you had it again after a long absence. In testing, I did feel like 5 was a much better target for trigger length. I even turn to 2-3 when I am writing documentation, it really keeps the flow going with quick action/responses.

@@ -71,6 +71,7 @@ These configuration keys are available:
| `text-width` | Maximum line length. Used for the `:reflow` command and soft-wrapping if `soft-wrap.wrap-at-text-width` is set, defaults to `editor.text-width` |
| `rulers` | Overrides the `editor.rulers` config key for the language. |
| `path-completion` | Overrides the `editor.path-completion` config key for the language. |
| `word-completion` | Overrides the `editor.word-completion` config key for the language. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing the override of word-completion-trigger-length (and should probably include @RoloEdits suggestion of merging under [word-completion] enable; trigger-length;

Comment on lines +22 to +26
pub struct TinyBoxedStr {
len: u8,
prefix: [u8; Self::PREFIX_LEN],
trailing: TinyBoxedStrTrailing,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two usizes is 16 bytes.

This works on average (and just barely) for the .md files in rust-lang/rust-analyzer and helix-editor/helix, but it doesn't for starship/starship and rust-lang/rust.

Several good quality Rust crates offer 22-to-24 small strings that would fit a lot more cases and avoid helix reinventing the wheel where it feels unnecessary.

Notably, I don't see any kind of benchmark justifying reimplementing small strings and all their complexity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a direct dependency on smartstring. Should try that one first if we are looking for small string optimized crates.

Comment on lines +251 to +255
if let Some(word_completion_request) =
word::completion(editor, trigger, handle.clone(), savepoint)
{
requests.spawn_blocking(word_completion_request);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file chains the completion requests building for the various type. Each handler also starts by doing mostly the same stuff, getting the cursor, the current line, ...

Maybe not for this MR, but finding a way to properly distribute this in parallel would be good

Comment on lines +186 to +190
let words: Vec<_> = words(text.slice(..)).collect();
let mut inner = self.inner.write();
for word in words {
inner.insert(word);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the collect if words() already return an iterator ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so that we hold the write lock for only a very short time. I'm not sure it's necessary though as parsing the words out should be quite fast

Comment on lines +212 to +216
let words: Vec<_> = words(text.slice(..)).collect();
let mut inner = self.inner.write();
for word in words {
inner.remove(word);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@the-mikedavis
Copy link
Member Author

I even turn to 2-3 when I am writing documentation, it really keeps the flow going with quick action/responses.

We could consider having a lower trigger length for documentation languages like markdown. Mainly I find word completion noisy/annoying when it competes with LSP completions.

I wonder if this same mechanism could (later) be used for Unicode completion using Typst's Codex

The completion code part of this PR is not really very novel - it should be possible to add any sort of "core" completions now since #2608 did a bunch of refactoring to make it possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-needs-testing Status: Needs to be tested out in order to discover potential bugs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lexical autocomplete
8 participants