I'm working in R and I have a file where there is a string in the first row of the .csv, a blank row, and then my date.
I tried to remove this by doing:
df <- df[-c(1,2), ]
Instead of removing this string and the empty row, it removes the first two rows of the data. This indicates to me that R is disregarding this as part of my dataset.
I would like to remove it, but lack the vocabulary as to what this data is so that I can figure out how to remove it. I don't want to remove anything that might indicate important information like:
# A tibble: 0 x 5
# ... with 5 variables: row <int>, col <int>, expected <chr>,
# actual <chr>, file <chr>
So my assumption is to not delete the header.
Here is the output when I type head(df)
:
# A tibble: 6 x 2
`Category: All categories` ...2
<chr> <chr>
1 NA NA
2 Month st: (United States)
3 2004-01 25
4 2004-02 21
5 2004-03 20
6 2004-04 24
Here is the file, if that helps. It is output from Google Trends.
Thank you, I'm new at this, but really trying to learn it. :)
Your use of -c(0,1)
suggests that you believe R's indexing to be 0-based (similar to python and C); it is not, R's indexing is 1-based.
Most (all?) CSV-reading functions have a skip=
argument:
head(read.csv("2004_present_st.csv", skip=2))
# Month st...United.States.
# 1 2004-01 25
# 2 2004-02 21
# 3 2004-03 20
# 4 2004-04 24
# 5 2004-05 24
# 6 2004-06 24
### all verified to work similarly
readr::read_csv("2004_present_st.csv", skip=2)
data.table::fread("2004_present_st.csv", skip=2)
vroom::vroom("2004_present_st.csv",skip=2)
In case you've already read in the data and need to repair it, though ...
df <- read.csv("2004_present_st.csv")
nms <- unlist(df[2,])
df <- df[-(1:2),]
names(df) <- nms
df[] <- lapply(df, type.convert, as.is = TRUE)
head(df)
# Month st: (United States)
# 3 2004-01 25
# 4 2004-02 21
# 5 2004-03 20
# 6 2004-04 24
# 7 2004-05 24
# 8 2004-06 24
str(df)
# 'data.frame': 217 obs. of 2 variables:
# $ Month : chr "2004-01" "2004-02" "2004-03" "2004-04" ...
# $ st: (United States): int 25 21 20 24 24 24 23 21 20 23 ...
Notes on this cleanup:
-(1:2)
instead of -c(1,2)
, mostly the same.df[] <- lapply(df, ...)
is a way to iterate a function over all columns and return them into place, preserving the class of "data.frame"
for the df
object. Had we done df <- lapply(df, ...)
instead, df
would now be a list; as.data.frame(lapply(df, ...))
would work, but I think df[] <- lapply(df, ...)
looks/works better.type.convert
is used by read.csv
(and other places) to try to convert strings into integers and numbers; as.is=TRUE
prevents strings from being converted to factor
s, feel free to omit this if you prefer.