Description
I have a CSV file with many columns (exported from Canvas). Left to its own devices, read_csv produces the column spec shown at the end of this bug report. For some data crunching, I wanted to load only a subset of the columns, and many of the columns have similar names, so I wanted to do that with a general tidyselect expression. read_csv lets me do that with col_select=, great. But I also wanted to override some of the column types.

The documentation for col_types= and cols() made it sound like my choices were all bad:
- Give a type specification for all of the columns, even the ones I don't care about, either listing all of these long clunky names (and hardcoding numbers that could well change next term) or else relying on them to stay in a particular order.
- Use cols_only but write down all of the long clunky names for the columns I do care about.
- Use cols(..., .default=) and put up with lots of parser warnings.
In fact, there is a perfectly good fourth option:
- Use col_select and col_types together: give a cols() spec that covers only the columns I care about, and use .default to avoid hardcoding parts of the names that might vary. There won't be any junk warnings.
But this is not at all clear from the documentation. I only tried it as a gamble.
Please add some text, and maybe also examples, to the documentation demonstrating how col_select and col_types can be used together.
Column spec
cols(
Student = col_character(),
ID = col_double(),
`SIS Login ID` = col_character(),
Section = col_character(),
`Written Assignment #1 (410713)` = col_character(),
`Written Assignment #2 (412436)` = col_character(),
`Written Assignment #3 (414305)` = col_double(),
`Written Assignment #4 (416013)` = col_double(),
`Written Assignment #5 (417893)` = col_double(),
`Written Assignment #6 (419737)` = col_double(),
`Written Assignment #7 (422534)` = col_double(),
`Written Assignment #8 (424000)` = col_double(),
`Written Assignment #11 (430113)` = col_double(),
`Written Assignment #12 (431301)` = col_double(),
`Written Assignment #9 (426527)` = col_double(),
`Written Assignment #10 (428208)` = col_double(),
`Day 2 Quiz (400398)` = col_double(),
`Day 3 Quiz (400400)` = col_double(),
`Day 4 Quiz (400411)` = col_double(),
`Day 5 Quiz (400419)` = col_double(),
`Day 6 Quiz (400414)` = col_double(),
`Day 7 Quiz (400415)` = col_double(),
`Day 8 Quiz (400410)` = col_double(),
`Day 17 Quiz (400407)` = col_double(),
`Day 9 Quiz (400413)` = col_double(),
`Day 10 Quiz (400408)` = col_double(),
`Day 11 Quiz (400401)` = col_double(),
`Day 12 Quiz (400404)` = col_double(),
`Day 13 Quiz (400421)` = col_double(),
`Day 14 Quiz (400409)` = col_double(),
`Day 15 Quiz (400405)` = col_double(),
`Day 16 Quiz (400406)` = col_double(),
`Day 18 Quiz (400402)` = col_double(),
`Day 19 Quiz (400403)` = col_double(),
`Day 20 Quiz (400397)` = col_double(),
`Day 21 Quiz (400420)` = col_double(),
`Day 22 Quiz (400422)` = col_double(),
`Day 23 Quiz (400418)` = col_double(),
`Day 24 Quiz (400423)` = col_double(),
`Day 25 Quiz (400399)` = col_double(),
`Assignments Current Score` = col_character(),
`Assignments Unposted Current Score` = col_character(),
`Assignments Final Score` = col_character(),
`Assignments Unposted Final Score` = col_character(),
`Imported Assignments Current Score` = col_character(),
`Imported Assignments Unposted Current Score` = col_character(),
`Imported Assignments Final Score` = col_character(),
`Imported Assignments Unposted Final Score` = col_character(),
`Current Score` = col_character(),
`Unposted Current Score` = col_character(),
`Final Score` = col_character(),
`Unposted Final Score` = col_character()
)