I have a large number of CSV files that look like this:
var val1 val2
a 2 1
b 2 2
c 3 3
d 9 2
e 1 1
I would like to:
I think I have managed to get to point 3 by doing this:
csvList <- list.files(path = "mypath", pattern = "\\.csv$", full.names = TRUE)
bla <- lapply(lapply(csvList, read.csv), function(x) x[order(x$val1, decreasing = TRUE)[1:3], ])
lapply(bla,"[", , 1, drop=FALSE)
Now, I have a list of the top 3 variables in each CSV. However, I don't know how to convert this list to a string and keep only the unique values.
Any help is welcome.
Thank you!
The issue is in extracting the first column of each element of bla with drop=FALSE. This keeps each result as a one-column data frame (where each row has a name) instead of coercing it to its lowest dimension, which is a vector. Use drop=TRUE instead, then unlist followed by unique, as @Frank suggests:
unique(unlist(lapply(bla,"[", , 1, drop=TRUE)))
As you know, drop=TRUE is the default, so you don't even have to include it.
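To see the difference between the two drop settings without touching any files, here is a minimal sketch. It assumes two in-memory data frames in place of the parsed CSVs: df1 copies the sample from the question, and df2 is made up for illustration.

```r
# Stand-ins for two parsed CSV files (df2's values are invented for this sketch)
df1 <- data.frame(var  = c("a", "b", "c", "d", "e"),
                  val1 = c(2, 2, 3, 9, 1),
                  val2 = c(1, 2, 3, 2, 1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(var  = c("a", "b", "c", "d", "e"),
                  val1 = c(5, 1, 4, 7, 2),
                  val2 = c(1, 1, 2, 2, 3),
                  stringsAsFactors = FALSE)

# Top 3 rows by val1 in each data frame, as in the question
bla <- lapply(list(df1, df2),
              function(x) x[order(x$val1, decreasing = TRUE)[1:3], ])

# drop = FALSE: every element stays a one-column data frame
kept_df <- lapply(bla, "[", , 1, drop = FALSE)

# default drop: every element is a plain character vector
kept_vec <- lapply(bla, "[", , 1)

# Flatten the vectors and keep each name once
top_vars <- unique(unlist(kept_vec))
print(top_vars)
```

Calling unlist on kept_vec yields a single character vector, so unique only has to remove the repeated names.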
Update to new requirements in comments.
To keep the first two columns, var and val1, and remove duplicates in var (keep only the unique values of var), do the following:
## unlist each column in turn and form a data frame
res <- data.frame(lapply(c(1,2), function(x) unlist(lapply(bla,"[", , x))))
colnames(res) <- c("var","val1") ## restore the two column names
## remove duplicates
res <- res[!duplicated(res[,1]),]
Note that this will only keep the first row for each unique var. This is the definition of removing duplicates here.
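The "first row wins" behaviour can be checked on a small example. The sketch below is an alternative route, not the code above: it stacks the list row-wise with do.call(rbind, ...) before filtering, and it assumes bla already holds only the two columns of interest. The values are made up.

```r
# Hypothetical top-3 results from two files, reduced to the two columns of interest
bla <- list(
  data.frame(var = c("d", "c", "a"), val1 = c(9, 3, 2), stringsAsFactors = FALSE),
  data.frame(var = c("d", "a", "c"), val1 = c(7, 5, 4), stringsAsFactors = FALSE)
)

# Stack the per-file results into one data frame
stacked <- do.call(rbind, bla)

# duplicated() flags the second and later occurrences of each var,
# so negating it keeps only the first row seen for each var
res <- stacked[!duplicated(stacked$var), ]
print(res)
```

If a different row should survive for each var, for example the one with the largest val1, sort stacked accordingly before applying duplicated.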
Hope this helps.