Commit e240e50

fix(dict): Remove unsure corrections

The typo dictionary words.csv previously contained a number of
problematic entries such as:

    abouta,about
    algorithmi,algorithm
    attachen,attach
    shouldbe,should

These resulted in wrong corrections if the following spaces
(indicated by ␣) were accidentally missed:

    about␣a
    algorithm␣i developed
    attach␣en masse
    should␣be

Many of these entries were introduced by taking entries from the
codespell-dict and removing corrections containing spaces (since typos
currently doesn't support them), e.g. the codespell dictionary contains:

    abouta->about a, about,
    shouldbe->should, should be,

This commit updates `tests/verify.rs` to automatically remove entries
of the form `{correction}{common_word},{correction}`, where
`{common_word}` is one of the 1000 most frequent English words (except
if `{correction}` also ends in `{common_word}`, since we still want to
correct e.g. "extrememe" to "extreme").

The top-1000-most-frequent-words.csv file was generated by running:

    curl https://norvig.com/ngrams/count_1w.txt \
      | head -n1024 \
      | awk '{print $1;}' \
      | grep -vE '^([^ia]|al|re)$' \
      > top-1000-most-frequent-words.csv
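
The removal rule described above can be sketched roughly as follows. This is only an illustration and not the actual code in `tests/verify.rs`: the function names `is_unsure_correction` and `remove_unsure` are hypothetical, and the sketch assumes the common words have already been loaded into a `HashSet`.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns true when `typo` is exactly `correction`
/// followed by a common word, so it may just be two valid words with a
/// missing space. The exception from the commit message is applied: if
/// `correction` itself ends in that common word, the entry is kept.
fn is_unsure_correction(typo: &str, correction: &str, common_words: &HashSet<&str>) -> bool {
    match typo.strip_prefix(correction) {
        Some(suffix) if !suffix.is_empty() => {
            common_words.contains(suffix) && !correction.ends_with(suffix)
        }
        _ => false,
    }
}

/// Hypothetical helper: keeps only the (typo, correction) pairs that are
/// not flagged by `is_unsure_correction`.
fn remove_unsure<'a>(
    entries: &[(&'a str, &'a str)],
    common_words: &HashSet<&str>,
) -> Vec<(&'a str, &'a str)> {
    entries
        .iter()
        .copied()
        .filter(|(typo, correction)| !is_unsure_correction(typo, correction, common_words))
        .collect()
}
```

With a word set containing "a" and "me", the pair `abouta,about` would be dropped ("about" + "a"), while `extrememe,extreme` would be kept because "extreme" already ends in "me".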
1 parent 41ce6be commit e240e50

4 files changed: +1132 -354 lines changed