-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Translate \r\n
-> \n
when reading files
#62865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Labels
A-frontend
Area: Compiler frontend (errors, parsing and HIR)
C-cleanup
Category: PRs that clean code up or issues documenting cleanup.
T-compiler
Relevant to the compiler team, which will review and decide on the PR/issue.
Comments
\r\n
-> \n
when reading file\r\n
-> \n
when reading files
6 tasks
Found a bug: here we load doc comment via a file, without |
sbrugman
added a commit
to sbrugman/ruff
that referenced
this issue
Jan 21, 2023
sbrugman
added a commit
to sbrugman/ruff
that referenced
this issue
Jan 23, 2023
sbrugman
added a commit
to sbrugman/ruff
that referenced
this issue
Jan 24, 2023
github-merge-queue bot
pushed a commit
to leanprover/lean4
that referenced
this issue
May 19, 2024
To eliminate parsing differences between Windows and other platforms, the frontend now normalizes all CRLF line endings to LF, like [in Rust](rust-lang/rust#62865). Effects: - This makes Lake hashes be faithful to what Lean sees (Lake already normalizes line endings before computing hashes). - Docstrings now have normalized line endings. In particular, this fixes `#guard_msgs` failing multiline tests for Windows users using CRLF. - Now strings don't have different lengths depending on the platform. Before this PR, the following theorem is true for LF and false for CRLF files. ```lean example : " ".length = 1 := rfl ``` Note: the normalization will take `\r\r\n` and turn it into `\r\n`. In the elaborator, we reject loose `\r`'s that appear in whitespace. Rust instead takes the approach of making the normalization routine fail. They do this so that there's no downstream confusion about any `\r\n` that appears. Implementation note: the LSP maintains its own copy of a source file that it updates when edit operations are applied. We are assuming that edit operations never split or join CRLFs. If this assumption is not correct, then the LSP copy of a source file can become slightly out of sync. If this is an issue, there is some discussion [here](#3903 (comment)).
github-merge-queue bot
pushed a commit
to leanprover/lean4
that referenced
this issue
May 20, 2024
To eliminate parsing differences between Windows and other platforms, the frontend now normalizes all CRLF line endings to LF, like [in Rust](rust-lang/rust#62865). Effects: - This makes Lake hashes be faithful to what Lean sees (Lake already normalizes line endings before computing hashes). - Docstrings now have normalized line endings. In particular, this fixes `#guard_msgs` failing multiline tests for Windows users using CRLF. - Now strings don't have different lengths depending on the platform. Before this PR, the following theorem is true for LF and false for CRLF files. ```lean example : " ".length = 1 := rfl ``` Note: the normalization will take `\r\r\n` and turn it into `\r\n`. In the elaborator, we reject loose `\r`'s that appear in whitespace. Rust instead takes the approach of making the normalization routine fail. They do this so that there's no downstream confusion about any `\r\n` that appears. Implementation note: the LSP maintains its own copy of a source file that it updates when edit operations are applied. We are assuming that edit operations never split or join CRLFs. If this assumption is not correct, then the LSP copy of a source file can become slightly out of sync. If this is an issue, there is some discussion [here](#3903 (comment)).
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-frontend
Area: Compiler frontend (errors, parsing and HIR)
C-cleanup
Category: PRs that clean code up or issues documenting cleanup.
T-compiler
Relevant to the compiler team, which will review and decide on the PR/issue.
Currently, we handle
\r\n
explicitly in the lexer. We should do this at the file read time instead.Motivation:
Line endings should not affect semantics of the language. For example, git on windows by default checkouts with
\r\n
line endings, and it would be bad if compiling code on windows led to observably different results. This could be handled on a case-by-case basis in the lexer (the current approach), but it's easy to miss something (for example Raw Byte strings do not handle\r
#60604). Additionally, proc macros currently see original tokens, and so can observe different line endings. The simplest way to make compiler lineending-invariant is to normalize line endings as soon as possible.For IDE features, which work close to the source code, and especially for code generation during refactorings, the surface are where you need to handle different line endings is much larger. It would be easier for IDE to assume
\n
line endings and convert at one place, at the boundary.i propose that we convert
\r\n
once, inSourceFile::new
constructor. Note that we already do BOM-removal there, so, in theory, all code should already be prepared to mismatch between in-memory and on-disk file content. The replacement shortens the string, so it can be a pretty fast in-place transformation.One technical question is what to do with isolated
\r
? I see two options:\r\r\n
->\r\n
, and requires other code to explicitely not treat\r\n
as line ending.\r
in the source text. That is, after normalization,\r
can not be found in the source code at all.I propose to go with the second option it's slightly more annoying to implement, but seems more robust.
The text was updated successfully, but these errors were encountered: