Tags: python, python-3.x, pandas, csv, delete-row

Deletion of a particular row in a csv file using pandas


I am relatively new to pandas and am trying to delete every row from file XYZ that is also present in file ABC.

Code:

import pandas as pd

# Read the two CSV files
clm1 = pd.read_csv('ABC.csv')
clm2 = pd.read_csv('XYZ.csv')

# Prints file length
print('Main file clm2: '+ str(len(clm2['image_url'])))
print('Referral file clm1: ' + str(len(clm1['Input.image_url'])))

for index1 in clm1.index:
    for index2 in clm2.index:
        if clm2['image_url'][index2] == clm1['Input.image_url'][index1]:
            print("Entered into deletion condition!!")

            print(clm2['image_url'][index2])
            print(clm1['Input.image_url'][index1])
            print('\n \n')

            clm2.drop(clm2['image_url'][index2], axis=0, inplace=True)
            print('Deleted!!')

print('Main file clm2: ' + str(len(clm2['image_url'])))

On entering the deletion condition, the lines below print correctly:

            print(clm2['image_url'][index2])
            print(clm1['Input.image_url'][index1])
            print('\n \n')

But I get an error on this line:

clm2.drop(clm2['image_url'][index2], axis=0, inplace=True)

Error says:

  File "compare_delete_imagelinks.py", line 19, in <module>
    clm2.drop(clm2['image_url'][index2], axis=0, inplace=False)
  File "/Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/pandas/core/frame.py", line 3940, in drop
    errors=errors)
  File "/Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/pandas/core/generic.py", line 3780, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/pandas/core/generic.py", line 3812, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 4965, in drop
    '{} not found in axis'.format(labels[mask]))
KeyError: "['https://Xxxxxxx.216PPU~V.JPG'] not found in axis"

How can I fix this?


Solution

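  • drop() removes rows by index label, not by column value, so read the column you want to match on as the index. This should work if your CSV files look like this: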

    XYZ.csv:

    name,value
    a,1
    b,2
    c,3
    d,4
    e,5
    f,6
    

    ABC.csv:

    name,value
    a,1
    b,2
    c,3
    d,4
    

    Code:

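    import pandas as pd
    
    # Use the 'name' column as the index so rows can be dropped by label
    xyz = pd.read_csv("XYZ.csv", index_col='name')
    abc = pd.read_csv("ABC.csv", index_col='name')
    
    # Drop every row of xyz whose label also appears in abc
    for i in abc.index:
        if i in xyz.index:
            xyz.drop(i, axis=0, inplace=True)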
    
    print(xyz)
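
The same idea can also be written without an explicit loop. What follows is a minimal sketch rather than a definitive fix. It assumes the column names from the question (image_url in XYZ.csv, Input.image_url in ABC.csv), and the small in-memory frames with their URLs are made-up stand-ins for the real files. It builds a boolean mask with Series.isin and keeps the rows that do not match. The drop() variant near the end passes the index labels of the matching rows rather than the URL values themselves, which appears to be what the KeyError in the question is complaining about.

```python
import pandas as pd

# Hypothetical stand-ins for XYZ.csv (main file) and ABC.csv (referral file);
# in practice these would come from pd.read_csv(...)
clm2 = pd.DataFrame({"image_url": ["u1", "u2", "u3", "u4"]})
clm1 = pd.DataFrame({"Input.image_url": ["u2", "u4"]})

# True for every row of clm2 whose URL also appears in clm1
mask = clm2["image_url"].isin(clm1["Input.image_url"])

# Keep only the rows that are NOT present in the referral file
remaining = clm2[~mask]

# Equivalent with drop(): pass the index labels of the matching rows,
# not the values stored in the image_url column
remaining_via_drop = clm2.drop(clm2[mask].index)

print(remaining)
```

With real files, the two DataFrame constructors would be replaced by the read_csv calls from the question, and the result could be written back with remaining.to_csv('XYZ.csv', index=False) if overwriting the main file is intended.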