
read.table vs. read_csv: models diverge


I have used both read.table (with arguments sep = "\t", header = T, na.strings = "NA") and read_csv (with arguments col_names = T, na = "NA") from the readr package to read in a csv file. When I estimate a model on each version, the summaries show vastly different results, although the number of observations is the same. Now I don't know which of the two models is based on the correctly imported data. How can I go about debugging this?


Solution

  • Question: How to go about debugging an unexpected result after reading data into R.

    Answer: The first step, even before you run into a problem, should be to look at the data. You'll develop your own workflow, but mine starts with opening the file in a text editor to check whether my assumptions about it hold (delimiter, header row, how missing values are coded). I might search for certain values there. Then I look at it in R with str(my_data), head(my_data), colSums(is.na(my_data)), View(my_data), and, depending on its structure, summary(my_data), either for the entire data frame or for subsets of it (depending on how many variables it has).
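
    The checks above can be sketched as a short script. This is only an illustration under stated assumptions: the file contents, the temporary path, and the object names (d_base, d_tab, d_readr) are made up for the example and are not the asker's data. The readr part runs only if that package is installed.

```r
# Hypothetical example data: a small comma-separated file in a temp location.
path <- tempfile(fileext = ".csv")
writeLines(c("id,score,group",
             "1,3.5,a",
             "2,NA,b",
             "3,4.25,a"), path)

# Base R import with the delimiter that matches the file.
d_base <- read.table(path, sep = ",", header = TRUE, na.strings = "NA",
                     stringsAsFactors = FALSE)

# The same file read with sep = "\t": same number of rows, but every line
# ends up in a single column, which str() makes obvious.
d_tab <- read.table(path, sep = "\t", header = TRUE, na.strings = "NA",
                    stringsAsFactors = FALSE)

# Inspection steps from the answer (View() is interactive, so it is omitted).
str(d_base)
str(d_tab)
head(d_base)
colSums(is.na(d_base))
summary(d_base)

# Compare against the readr import, if readr is available.
if (requireNamespace("readr", quietly = TRUE)) {
  d_readr <- as.data.frame(
    suppressMessages(readr::read_csv(path, col_names = TRUE, na = "NA"))
  )
  first_class <- function(x) class(x)[1]
  # Column classes side by side: differences here often explain
  # diverging model output (e.g. a numeric column read as character).
  print(data.frame(base  = vapply(d_base,  first_class, character(1)),
                   readr = vapply(d_readr, first_class, character(1))))
  # Value-level comparison, ignoring attributes.
  print(all.equal(d_base, d_readr, check.attributes = FALSE))
}
```

    Comparing the two imports column by column (classes first, then values and NA counts) usually narrows the problem down to a specific variable, which you can then look up in the raw file.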