Search code examples
rlistloopscsvtail

Picking last row data only from 2000 csv in the same directory and make single dataframe through R


Using R, I want to pick last row data only from over 2000 csv in the same directory and make single dataframe.

Directory = "C:\data

File name, for example '123456_p' (6 number digit)

Each csv has different number of rows, but has the same number of columns (10 columns)

I know the tail and list function, but over 2000 dataframes, inputting manually is time wasting.

Is there any way to do this with loop through R?

As always, I really appreciate your help and support


Solution

  • There are four things you need to do here:

    1. Get all the filenames we want to read in
    2. Read each in and get the last row
    3. Loop through them
    4. Bind them all together

    There are many options for each of these steps, but let's use purrr for the looping and binding, and base-R for the rest.

    1. Get all the filenames we want to read in

    You can do this with the list.files() function.

    filelist = list.files(pattern = '.csv')
    

    will generate a vector of filenames for all CSV files in the working directory. Edit as appropriate to specify the pattern further or target a different directory.

    1. Read each in and get the last row

    The read.csv() function can read in each file (if you want it to go faster, use data.table::fread() instead), and as you mentioned tail() can get the last row. If you build a function out of this it will be easier to loop over, or change the process if it turns out you need another step of cleaning.

    read_one_file = function(x) {
      tail(read.csv(x), 1)  
    }
    
    1. Loop through them
    2. Bind them all together

    You can do both of these steps at once with map_df() in the purrr package.

    library(purrr)
    final_data = map_df(filelist, read_one_file)