Search code examples
pythondjangopandasdjango-modelscsv-import

Python/Django- Save csv file data with messy rows


I have a csv file with the following structure:

A, 10
B, 11
C, 8
D, 12
A, 21
B, 7
D, 22
D, 15
C, 111
D, 50
A, 41
B, 32
C, 19
D, 11

I want to read the entire file, and save the data on the second column, if the row is like A, B, C, D format. I have a list:

my_list = [A, B, C, D]

And I check each 4 row, if it is in my_list format, then read and save to the database like:

with open('csv_file.csv', 'rb') as csvfile:
        the_file = csv.reader(csvfile.read().splitlines())

        for idx, row in enumerate(islice(zip(*[the_file]*4), 100)):
            my_model = Django_model()
            if row[0][0] == my_list[0]:
                if row[0][0] == my_list[0] and row[1][0] == my_list[1] and \
                    row[2][0] == my_list[2] and row[3][0] == my_list[3]:
                    my_model.a_field = row[0][1]
                    my_model.b_field = row[1][1]
                    my_model.c_field = row[2][1]
                    my_model.d_field = row[3][1]
                    my_model.save()

The fact is that it is working if and ONLY if the structure of the row is the same as my_list. But when it gets to the messy part (A, B, D, D, C, D) it will not read the rows properly, and therefore many rows are skipped.

The question is that, how can I jump to the next interesting row (which follows my_list format) and read it? Meanwhile save the the skipped rows in another list?

I have heard that Pandas could help, but I have been through the documentation, and I couldn't find a way to solve this case.


Solution

  • You can extract the pattern and corresponding values like this:

    import pandas as pd
    import re
    
    df = pd.read_csv('/home/yusuf/Desktop/c1', header=None, names=['c1','c2'])
    l1=[]
    for a in re.finditer('ABCD', ''.join(df.c1.tolist())):
        l1.append(range(a.start(),a.end()))
    l2 = [b for a in l1 for b in a]
    print df[df.index.isin(l2)], '\n'
    print df[~df.index.isin(l2)]
    

    Output:

       c1  c2
    0   A  10
    1   B  11
    2   C   8
    3   D  12
    10  A  41
    11  B  32
    12  C  19
    13  D  11
    
      c1   c2
    4  A   21
    5  B    7
    6  D   22
    7  D   15
    8  C  111
    9  D   50