Function write_tsv is the only write_* or format_* function that defaults to quote = 'none', and this is a recent behavior change (motivated by IANA considerations in #993 and #844). This default is an easy source of bugs for a naive user (e.g. me) who may have delim or eol characters in their content. (Tabs are explicitly forbidden by the IANA spec, but eol characters are also a problem for the same reason.)
So I would propose:
- Adding documentation explaining that write_tsv is intended for the IANA spec, and that many users should call it with quote = 'needed' or use another writer, e.g. write_excel_csv(..., delim = '\t'), if their data contains eol or delim characters.
- Adding a warning if vroom::vroom_format or vroom::vroom_write is called with quote = 'none' and the data contains eol or delim characters. (This task belongs in a vroom issue, not here. If this task is seriously considered then I will transfer it.)
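To make the second proposal concrete, here is a rough sketch in base R of the kind of check the warning could be built on. The helper name unsafe_for_unquoted and its arguments are hypothetical and not part of readr or vroom; a real implementation would presumably live in vroom's writer rather than in R code.

```r
# Hypothetical helper (not part of readr or vroom): for each column, report
# whether any value contains the delimiter or an end-of-line character.
unsafe_for_unquoted <- function(df, delim = "\t", eol = c("\n", "\r")) {
  specials <- c(delim, eol)
  vapply(df, function(col) {
    values <- as.character(col)
    values <- values[!is.na(values)]
    # fixed = TRUE matches the characters literally, not as regular expressions
    any(vapply(specials,
               function(ch) any(grepl(ch, values, fixed = TRUE)),
               logical(1)))
  }, logical(1))
}

df <- data.frame(x = "abc\tdef", y = "plain", z = "abc\ndef",
                 stringsAsFactors = FALSE)
flagged <- unsafe_for_unquoted(df)

# Warn in the situation the proposal describes: unquoted output whose
# content contains delim or eol characters.
if (any(flagged)) {
  warning("Columns contain delim or eol characters; consider quote = 'needed': ",
          paste(names(flagged)[flagged], collapse = ", "))
}
```

Scanning every value is linear in the size of the data, which is one reason the flag-in-the-writer approach may be preferable to a separate pre-check like this one.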
I'm happy to create PRs for both, if given the green light.
Miscellaneous observations:
- Edition 1 stream_delim always behaves like quote = 'needed'. There is only an issue for edition 2 using vroom.
- Search for the quote_needed enum in the vroom package to find instances where the quoting logic is applied. This would be an ideal place to set a flag indicating that quotes were needed but disabled.
I don't think this needs a reprex, but here's a small test case:
library(dplyr)

# edition 1 always quotes as needed
readr::with_edition(1, {
  tibble(x = 'abc\tdef',
         z = 'abc\ndef') %>%
    readr::format_tsv(quote = 'none') %>%
    readr::read_tsv()
})
# output: the tibble is correctly read

# edition 2 passes the quote argument to vroom
readr::with_edition(2, {
  tibble(x = 'abc\tdef',
         z = 'abc\ndef') %>%
    readr::format_tsv(quote = 'none') %>%
    readr::read_tsv()
})
# output: the tibble is mangled
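For comparison, a sketch of the workaround suggested above: the same data written with quote = 'needed' under edition 2. This assumes readr 2.0 or later (for I() and show_col_types), and I have not listed the output here; the expectation is that the embedded tab and newline are wrapped in double quotes and so survive the round trip.

```r
# Same data as the test case, but asking the edition 2 writer to quote
# fields that need it.
df <- data.frame(x = "abc\tdef", z = "abc\ndef", stringsAsFactors = FALSE)

tsv_text <- readr::with_edition(2, {
  readr::format_tsv(df, quote = "needed")
})

# I() marks the string as literal data rather than a file path.
roundtrip <- readr::read_tsv(I(tsv_text), show_col_types = FALSE)
print(roundtrip)
```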