Search code examples
pythoncsvraspberry-pi

Split CSV file in Python with semicolon separating the records


I have a CSV file with 288 records in the following format (extract of first 3 records only)

20210402,23:55,37684,4.758,0,0,0.000,16238,510,NaN,242.0,-500.000,0.000,500.000,-500.000,10.000,NaN;20210402,23:50,37684,4.758,0,0,0.000,16195,540,NaN,243.0,-530.000,0.000,530.000,-530.000,10.000,NaN;20210402,23:45,37684,4.758,0,0,0.000,16150,540,NaN,243.0,-550.000,0.000,550.000,-550.000,0.000,NaN;

This file is saved as continuous string if I open it in Notepad. If I open it in excel then each of the values are in a column. The last column of the first record is shared with the first column of the seconded record and so on, as per below

excel

I am trying to spit the string so there are 288 records using

with open('dailyData.csv','r') as file:
    array = file.readlines()
    array = [row.split(';') for row in array]

The ";" is converted to ", " but it does not split the string

[['20210402,23:55,37684,4.758,0,0,0.000,16238,510,NaN,242.0,-500.000,0.000,500.000,-500.000,10.000,NaN', '20210402,23:50,37684...

any ideas?

UPDATE

I have tried to use Pandas

array = pandas.read_csv('dailyData.csv', delimiter=';', header=None)
print(array.columns)

This returns

Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... 277, 278, 279, 280, 281, 282, 283, 284, 285, 286], dtype='int64', length=287)

which means I can address any record but not elements within the record.

Just looking to be able to address as array[r][c] where r is 0 to 287 and c is 0 to 16.

thanks

Hi Kamil, I have tried

 with open('dailyData.csv') as file:
    for x in file:
        columns = x.split(';')
        for y in columns:
            lines = y.split(',')
            print(lines)

This prints the lines but dos not create the array that I can address.


Solution

  • How about this:

    with open("data.csv") as f:
        array = [l.split(",") for l in f.readline().split(";") if l]
    
    print(len(array))
    print(array[1][0])
    

    Output: where 3 is the number of lists within the array and each list has 16 values.

    3
    20210402
    

    The above allows for:

    Just looking to be able to address as array[r][c] where r is 0 to 287 and c is 0 to 16.

    I've assumed that your data is one long continuous string, as shown in your question.

    If you feel like it, this can be easily dumped to a pandas DataFrame and then to a proper .csv file:

    import pandas as pd
    
    with open("data.csv") as f:
        array = [l.split(",") for l in f.readline().split(";") if l]
    
    pd.DataFrame(array).to_csv("your_array.csv", header=False, index=False)