Tags: r, type-conversion, readr

"Warning message: Unknown or uninitialised column:df" after converting a few columns to numeric


I got this message after converting a few columns from character to numeric: Warning message: Unknown or uninitialised column: 'df'

I needed to load a csv file (from Qualtrics) into R.

filename <- "/Users/Study1.csv"
library(readr)
df <- read_csv(filename)

The first row contains the variable names, but the second and third rows are chunks of text that are of no use in R, so I needed to remove those two rows. However, because of those strings R had already read columns 18 to the end as character, so I needed to convert these columns to numeric manually (which I need for further analysis).

# The 2nd and 3rd rows of the csv file are useless (they are strings)
df <- df[3:nrow(df), ]
# cols 18 to the end are supposed to be numeric, but the 2nd and 3rd rows are string, so R thinks that these columns contain strings
df[ ,18:ncol(df)] <- lapply(df[ ,18:ncol(df)], as.numeric)

After running the above code, the following warning appeared:

Warning message:
Unknown or uninitialised column: 'df'. 
Parsed with column specification:
cols(
  .default = col_character()
)
See spec(...) for full column specifications.
NAs introduced by coercion
NAs introduced by coercion

The NAs are fine, but the "Unknown or uninitialised column" warning is annoying. Is there a better way to convert my columns to numeric? Thank you all!
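One possible alternative, sketched here on made-up data rather than the actual Qualtrics export: once the two descriptive rows are dropped, base R's type.convert() can re-guess the type of every character column, so no column range has to be hard-coded. The toy data frame and its column names (id, q1, q2) are assumptions for illustration only, and this sketch does not claim to explain where the original warning came from.

```r
# Toy stand-in for the imported file: every column is character because
# the first two data rows hold descriptive text (hypothetical values).
raw <- data.frame(
  id = c("Response ID", "{ImportId: id}", "R_1", "R_2"),
  q1 = c("Question 1", "{ImportId: q1}", "4", "5"),
  q2 = c("Question 2", "{ImportId: q2}", "2.5", ""),
  stringsAsFactors = FALSE
)

# Drop the two descriptive rows, keeping only the real responses.
responses <- raw[-(1:2), ]

# Re-guess column types; as.is = TRUE keeps text columns as character
# instead of turning them into factors. Blank fields become NA.
responses <- type.convert(responses, as.is = TRUE)

str(responses)
```

readr offers a similar function, type_convert(), for data read with read_csv(); whether it behaves identically on a given export is something to check rather than assume.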

EDITED: Thank you all for your advice. I tried skipping the 2nd and 3rd rows, but something peculiar happened. Because one cell contains multiple lines separated by empty lines, R parsed the file incorrectly. (Screenshot not shown; I blurred the original text in the picture.) This happens whether or not I tick "First Row as Names". Can you suggest a fix? Thanks again.

UPDATE on 2018-05-30: I've solved the problem. Please see my answer below or visit "How to import Qualtrics data (in csv format) into R".


Solution

  • Thank you all for your advice and comments. I followed @alistaire's advice to use skip.

    As for the newlines inside the Qualtrics cell, I found that I could click "More options" when exporting the data and select "remove line breaks".

    Following the advice from "Skip specific rows using read.csv in R", I used the following code to solve my problem.

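    # Read only the first line to get the variable names
    headers <- read.csv(filename, header = FALSE, nrows = 1, as.is = TRUE)
    # Skip the three header lines so only the responses are parsed
    df <- read.csv(filename, skip = 3, header = FALSE)
    colnames(df) <- unlist(headers, use.names = FALSE)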
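To see the two-pass approach work end to end, here is a minimal self-contained sketch. It writes a small made-up file with the same three-header-row layout to a temporary path; the file contents and the names tmp and demo are assumptions for illustration, not the real export.

```r
# Write a small file with three header rows to a temporary path
# (made-up content that only mimics the layout described above).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id,q1,q2",
  "Response ID,Question 1,Question 2",
  "{ImportId: id},{ImportId: q1},{ImportId: q2}",
  "R_1,4,2.5",
  "R_2,5,3"
), tmp)

# Pass 1: read only the first line to get the variable names.
headers <- read.csv(tmp, header = FALSE, nrows = 1, as.is = TRUE)

# Pass 2: skip all three header lines so only the responses are parsed.
demo <- read.csv(tmp, skip = 3, header = FALSE)
colnames(demo) <- unlist(headers, use.names = FALSE)

# The answer columns now come in as numbers without manual conversion.
str(demo)

unlink(tmp)
```

Because the descriptive rows never reach the parser in the second pass, read.csv guesses the column types from the responses alone, which is why no as.numeric() step is needed afterwards.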