Suggestion: documentation and warnings specifying write_tsv assumes IANA spec #1343

Open
@ambevill

write_tsv is the only write_* or format_* function that defaults to quote = 'none', and this is a recent behavior change (motivated by the IANA considerations in #993 and #844). This default is an easy source of bugs for a naive user (e.g. me) whose content may contain delim or eol characters. (Tabs are explicitly forbidden by the IANA spec, but eol characters are a problem for the same reason: without quoting, they are indistinguishable from field and record separators.)

So I would propose:

  • Adding documentation explaining that write_tsv targets the IANA spec, and that users whose data contains eol or delim characters should call it with quote = 'needed' or use another writer, e.g. write_excel_csv(..., delim = '\t').
  • Adding a warning if vroom::vroom_format or vroom::vroom_write is called with quote = 'none' and the data contains eol or delim characters. (This task belongs in a vroom issue, not here. If this task is seriously considered then I will transfer it.)
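To make the second proposal concrete, here is a rough base-R sketch of the kind of check I have in mind. The names cols_needing_quotes and warn_if_unquoted are hypothetical helpers I made up for illustration; they are not part of readr or vroom, and a real implementation would more likely live next to vroom's quoting logic.

```r
# Hypothetical helper (not part of readr or vroom): return the names of
# columns whose values contain the delimiter or an end-of-line character.
cols_needing_quotes <- function(df, delim = "\t") {
  has_special <- vapply(df, function(col) {
    # Only text-like columns can contain delim or eol characters.
    if (!is.character(col) && !is.factor(col)) return(FALSE)
    values <- as.character(col)
    values <- values[!is.na(values)]
    any(grepl(delim, values, fixed = TRUE) | grepl("[\r\n]", values))
  }, logical(1))
  names(df)[has_special]
}

# Hypothetical guard: warn when writing with quote = "none" would emit
# unquoted delim or eol characters. Returns the offending column names.
warn_if_unquoted <- function(df, delim = "\t") {
  bad <- cols_needing_quotes(df, delim)
  if (length(bad) > 0) {
    warning(
      "Columns contain delim or eol characters but quote = 'none': ",
      paste(bad, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(bad)
}

# Example data: x contains a tab, z contains a newline, y is clean.
df <- data.frame(x = "abc\tdef", y = "plain", z = "abc\ndef")
cols_needing_quotes(df)
```

A user could call warn_if_unquoted(df) before write_tsv(df, ...) as a stopgap. Scanning every character column adds overhead, so a built-in version would probably set a flag during writing rather than making a separate pass.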

I'm happy to create PRs for both, if given the green light.

Miscellaneous observations:

  • Edition 1 stream_delim always behaves like quote = 'needed'. The issue only affects edition 2, which uses vroom.
  • Searching for the quote_needed enum in the vroom package finds the places where the quoting logic is applied. These would be ideal places to set a flag indicating that quotes were needed but disabled.

I don't think this needs a reprex, but here's a small test case:

library(dplyr)  # provides tibble() and %>%

# edition 1 always quotes as needed, ignoring quote = 'none'
readr::with_edition(1, {
  tibble(x = 'abc\tdef',
         z = 'abc\ndef') %>%
    readr::format_tsv(quote = 'none') %>%
    readr::read_tsv()
})
# output: the tibble is correctly read

# edition 2 passes the quote argument to vroom
readr::with_edition(2, {
  tibble(x = 'abc\tdef',
         z = 'abc\ndef') %>%
    readr::format_tsv(quote = 'none') %>%
    readr::read_tsv()
})
# output: the tibble is mangled
