readr guesses unexpected column types for values containing a “D” #1484

Open
@klmr

parse_double apparently interprets D as an alternative to E for scientific exponent notation. For people used to R (unless they also know Fortran), this is quite unexpected, and does not seem to be documented anywhere. Compare to the behaviour of core R:

$ readr::parse_double('12d3')
[1] 12000

$ as.numeric('12d3')
[1] NA
Warning message:
NAs introduced by coercion

$ str2lang('12d3')
Error in str2lang("12d3") : <text>:1:3: unexpected symbol
1: 12d3
      ^

So far, so good. Unfortunately this leads to surprises during automatic column type guessing. For instance:

$ readr::read_csv(I("test\n12d3"))
Rows: 1 Columns: 1
── Column specification ───────────────────────────────────────────────────────────────────
Delimiter: ","
dbl (1): test

Use `spec()` to retrieve the full column specification for this data.
Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 1 × 1
   test
  <dbl>
1 12000

I’d wager that this isn’t the expected or desired behaviour for most uses of ‘readr’. Is there maybe a way to disable this? Something to the effect of “guess column types, but use a conservative parser for number formats.” Or, alternatively, maybe “guess column types, but do not consider scientific exponent notation.”
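
For comparison, here is a minimal sketch of the only route I'm aware of, which is to bypass guessing altogether by declaring the column type explicitly through `col_types`. It reuses the `test` column from the example above. This is not the option asked for here, since it turns guessing off for that column rather than making the guesser more conservative.

```r
library(readr)

# Declare the column as character so that no type guessing happens for it.
# The column name `test` is taken from the example above.
x <- read_csv(I("test\n12d3"), col_types = cols(test = col_character()))
x$test
#> [1] "12d3"

# Alternatively, read every column as character and convert afterwards.
y <- read_csv(I("test\n12d3"), col_types = cols(.default = col_character()))
```

This only helps when the affected columns are known up front, which is exactly what automatic guessing is meant to avoid.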
