
How to sort .csv files in R


I have one .csv file which I have imported into R. It contains a column with locations; some locations are repeated, depending on how many times that location has been surveyed. There is another column with the total number of plastic items.

I would like to add together the number of plastic items for the locations that appear more than once, and create one new column with the total number of plastic items and another column with the number of times each location appears.

I am unsure how to do this; any help would be much appreciated.
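
For illustration, a made-up example with the same structure (the column names location and items are assumed here, not taken from my actual file):

    data <- data.frame(
      location = c("beach1", "beach2", "beach1"),
      items    = c(10, 5, 7)
    )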


Solution

  • Using dplyr:

    library(dplyr)   # needed for %>%, group_by() and mutate()

    data %>% 
       group_by(location) %>%                                # one group per location
       mutate(TOTlocation = n(), TOTitems = sum(items))      # count and total per group
    
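
    On the made-up example data from the question this gives, roughly:

      location  items  TOTlocation  TOTitems
      beach1       10            2        17
      beach2        5            1         5
      beach1        7            2        17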

    And here's a base solution that does pretty much the same thing:

    # For each row's location, count how many rows share it and sum their items
    data[c("TOTloc", "TOTitem")] <- t(sapply(data$location, function(x)
              c(TOTloc  = sum(data$location == x),
                TOTitem = sum(data$items[data$location == x]))))
    

    Note that in neither case do you need to sort anything. In dplyr, group_by makes each subsequent step operate only on the rows belonging to one group, determined by the contents of the grouping column. In my base solution, I walk over the locations with sapply and recompute TOTloc and TOTitem for every row, which is not very efficient. A better base solution would probably use split, but for some reason I couldn't make it work with my made-up dataset, so maybe someone else can suggest how best to do that; a rough sketch of the idea follows below.
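
    As a rough sketch of that split idea (assuming the same made-up location and items columns, and only checked against a small example), you could compute the per-location totals once and then look them up for every row:

    # Total the items within each location once, then map the totals and
    # the per-location row counts back onto every row of the data frame.
    totals <- sapply(split(data$items, data$location), sum)
    counts <- table(data$location)
    data$TOTloc  <- as.integer(counts[as.character(data$location)])
    data$TOTitem <- totals[as.character(data$location)]

    This only touches each group once, so it should scale better than recomputing the sums for every row.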