Commit b80e29d

fix(dict): Remove only corrections that could contain spaces
The typo dictionary words.csv previously contained a number of problematic entries, such as:

    abouta,about
    algorithmi,algorithm
    attachen,attach
    shouldbe,should
    anumber,number

These caused wrong corrections whenever the space (indicated by ␣) in phrases like the following was accidentally left out. The merged word was then "corrected" by silently dropping one of the two words:

    about␣a
    algorithm␣i developed
    attach␣en masse
    should␣be
    a␣number

Many of these entries were introduced by taking entries from the codespell-dict and removing the corrections that contain spaces (typos currently doesn't support them). For example, the codespell dictionary contains:

    abouta->about a, about,
    shouldbe->should, should be,

This commit updates `tests/verify.rs` to automatically remove corrections of the form `{correction}{common_word},{correction}` or `{common_word}{correction},{correction}`, where `{common_word}` is one of the 1000 most frequent English words. The exception is when `{correction}` itself ends (or starts) with `{common_word}`, since we still want to correct e.g. "extrememe" to "extreme".

The top-1000-most-frequent-words.csv file was generated by running:

    curl https://norvig.com/ngrams/count_1w.txt \
      | head -n1024 \
      | awk '{print $1;}' \
      | grep -vE '^([^ia]|al|re)$' \
      > top-1000-most-frequent-words.csv

The `grep` step drops single-character entries other than "a" and "i", as well as "al" and "re".
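The filtering rule can be illustrated with a minimal Rust sketch. This is not the code in `tests/verify.rs`: `is_likely_missing_space` is a hypothetical helper, and the four-word set is an assumed stand-in for the real top-1000-most-frequent-words.csv.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not the actual code in `tests/verify.rs`): returns
/// true if `typo` is `{correction}{common_word}` or `{common_word}{correction}`,
/// i.e. it probably just lost the space between two real words.
fn is_likely_missing_space(typo: &str, correction: &str, common_words: &HashSet<&str>) -> bool {
    // `{correction}{common_word}`, unless the correction itself ends with
    // that word (so "extrememe" -> "extreme" is still kept).
    if let Some(rest) = typo.strip_prefix(correction) {
        if common_words.contains(rest) && !correction.ends_with(rest) {
            return true;
        }
    }
    // `{common_word}{correction}`, unless the correction itself starts with that word.
    if let Some(rest) = typo.strip_suffix(correction) {
        if common_words.contains(rest) && !correction.starts_with(rest) {
            return true;
        }
    }
    false
}

fn main() {
    // Assumption: a tiny stand-in for top-1000-most-frequent-words.csv.
    let common_words: HashSet<&str> = ["a", "i", "be", "me"].iter().copied().collect();

    // `typo,correction` pairs in the style of words.csv.
    let entries = [
        ("abouta", "about"),
        ("algorithmi", "algorithm"),
        ("shouldbe", "should"),
        ("anumber", "number"),
        ("extrememe", "extreme"),
        ("teh", "the"),
    ];

    // Print only the entries that survive the filter.
    for &(typo, correction) in &entries {
        if !is_likely_missing_space(typo, correction, &common_words) {
            println!("{},{}", typo, correction); // prints "extrememe,extreme" then "teh,the"
        }
    }
}
```

With this input the sketch keeps `extrememe,extreme` (the exception case) and `teh,the`, and drops the other four entries.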
1 parent d4258b1 commit b80e29d

4 files changed, 1229 insertions(+), 162 deletions(-)
