Search code examples
pythoncsvrowerase

delete line, of a .csv, when certain column has no value python


I have a .csv file and I am trying to erase some rows/lines that have no usable information. I want to delete lines that do not have a value in a certain column. I am kinda new to programming and I could not find a way to do this. Is this possible?

I tried to delete a line if it did not have a certain number in it but that did not work as wel.

f = open('C:myfile.csv', 'rb')
lines = f.readlines()
f.close()

filename = 'myfile.csv'

f = open(filename, 'wb')
for line in lines:
    if line != "1":
        f.write(line)

f.close()

here are some sample rows:

0,593   0,250984    -20,523384  -25,406271
0,594   0,250984        
0,595   0,250984        
0,596   0,250984        
0,597   0,250984    -15,793088  -21,286336
0,598   0,250984        
0,599   0,908811        
0,6     0,893612        
0,601   0,784814    -12,130922  -11,825742
0,602   0,909238        
0,603   0,25309     
0,604   0,38435     
0,605   0,602954    -8,316167   -3,43328
0,606   0,642628        
0,607   0,39201     
0,608   0,384289        
0,609   0,251656    -11,825742  -5,874723

So I want to delete the rows when there is no number in the third and fourth column.


Solution

  • You can use Python's csv library to help you do this. Your data appears to be tab delimited, as such the following script should work:

    import csv
    
    with open('input.csv', 'rb') as f_input, open('output.csv', 'wb') as f_output:
        csv_output = csv.writer(f_output, delimiter = '\t')
    
        for row in csv.reader(f_input, delimiter = '\t'):
            if len(row[2]) and len(row[3]):
                csv_output.writerow(row)
    

    Giving you an output.csv file containing:

    0,593   0,250984    -20,523384  -25,406271
    0,597   0,250984    -15,793088  -21,286336
    0,601   0,784814    -12,130922  -11,825742
    0,605   0,602954    -8,316167   -3,43328
    0,609   0,251656    -11,825742  -5,874723
    

    Note, each of your rows appears to have 4 columns (your data has tabs for these missing entries), because of this, it is not enough to simply test the length is 4. You need to test the contents of the two cells.