python csv unix split

Split a csv file into multiple files based on a pattern


I have a csv file with the following structure:

time,magnitude
0,13517
292.5669,370
620.8469,528
0,377
832.3269,50187
5633.9419,3088
20795.0950,2922
21395.6879,2498
21768.2139,647
21881.2049,194
0,3566
292.5669,370
504.1510,712
1639.4800,287
46709.1749,365
46803.4400,500

I'd like to split this csv file into separate csv files, like the following:

File 1:

time,magnitude
0,13517
292.5669,370
620.8469,528

File 2:

time,magnitude
0,377
832.3269,50187
5633.9419,3088
20795.0950,2922
21395.6879,2498

and so on.

I've read several similar posts, but they all search for specific values in a column and save each group of values to a separate file. In my case, however, the values in the time column are not the same. I'd like to split based on a condition: whenever time = 0, save that row and all subsequent rows to a new file, stopping just before the next row where time = 0.

Can someone please let me know how to do this?


Solution

  • With pandas, you can use groupby with a boolean mask and cumsum:

    #pip install pandas
    import pandas as pd
    
    df = pd.read_csv("input_file.csv", sep=",") # <- change the sep if needed
    
    for n, g in df.groupby(df["time"].eq(0).cumsum()):
        g.to_csv(f"file_{n}.csv", index=False, sep=",")
    

    Output :

        time  magnitude   # <- file_1.csv
      0.0000      13517
    292.5669        370
    620.8469        528
    
          time  magnitude # <- file_2.csv
        0.0000        377
      832.3269      50187
     5633.9419       3088
    20795.0950       2922
    21395.6879       2498
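
The grouping key `df["time"].eq(0).cumsum()` increases by one at every row where time is 0, so all rows up to the next zero share the same key. If pandas is not available, the same idea can be sketched with only the standard library `csv` module. This is a minimal sketch, not part of the original answer: the function names `split_on_zero` and `split_csv`, the `out_dir` and `prefix` parameters, and the `file_<n>.csv` naming are assumptions chosen to mirror the pandas example.

```python
import csv
from pathlib import Path


def split_on_zero(rows, time_field="time"):
    """Yield lists of rows; a new list starts at every row whose time is 0."""
    group = []
    for row in rows:
        # A zero time marks the start of a new group
        # (the same role eq(0).cumsum() plays in the pandas version).
        if float(row[time_field]) == 0 and group:
            yield group
            group = []
        group.append(row)
    if group:
        yield group


def split_csv(src, out_dir=".", prefix="file_"):
    """Write each group to <out_dir>/<prefix><n>.csv and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with open(src, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames  # header row, repeated in every output file
        for n, group in enumerate(split_on_zero(reader), start=1):
            path = out_dir / f"{prefix}{n}.csv"
            with open(path, "w", newline="") as out:
                writer = csv.DictWriter(out, fieldnames=fieldnames)
                writer.writeheader()
                writer.writerows(group)
            paths.append(path)
    return paths
```

Calling `split_csv("input_file.csv")` would write `file_1.csv`, `file_2.csv`, and so on into the current directory. Two differences from the pandas version are worth noting: rows are copied as strings, so values are written exactly as they appear in the input, and numbering always starts at 1, even if the first data row does not have time = 0.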