Search code examples
pythonexceldata-comparison

Excel columns comparison using python code


I am working with excel for comparing three columns: my idea is to compare two columns of data with a third column as a array like each value in the 3rd column should be compared with every row of the first and second column and want to extract only those rows where the first and second column's data is present in the third column I used this python command

if([x in x,y for datafile] == [x in x for file) and [y in x,y for datafile] == [x in x for file]): 
    print x,y
else:
    print none        

this gave me an error as syntax error

I have converted my first two columns into a tuple using the zip function the x,y corresponds to the values in the tuple

Col_1 ||  Col_2    ||   file
Abc   |    Abk     |    cnl
Nck   |    Nck     |    Abk
xkl   |    cnl     |    Abc  
mzn   |    mzn     |  

this I have combined as datafile ((Abc,Abk),(Nck,Nck),(xkl,cnl),(mzn,mzn))

Note: my column 3 has smaller values than col 1&2. I have over 100k values to compare

I want a working python program for this query

if [x for x,y in mydata if x == genelist and
y for x,y in mydata if y == genelist]:
    print (x,y)
else: 

can someone correct the syntax error in the above code here

mydata('gene1,genea','gene2,geneb''gene3,genec') and genelist ('genea','geneb','genec') 

when I use the code without if statement it prints me "[]" I don't know what's wrong here


Solution

  • You could use pandas.Series.isin to filter it:

    For your excel data (eg:comparison.xlsx) :

    enter image description here

    Use:

    import pandas as pd
    df = pd.read_excel('comparison.xlsx')
    result = df[df['finaldata1'].isin(list(df['check'])) & df['finaldata2'].isin(list(df['check']))]
    result
    

    it will give you:

        finaldata1  finaldata2  check
    0   Abc         Abk         cnl
    

    as Abc and Abk is in column file.

    Update: Write result to excel file:

    from pandas import ExcelWriter
    
    writer = ExcelWriter('PythonExport.xlsx')
    result.to_excel(writer,'Sheet1',index=False)
    writer.save()
    

    The result will be write into excel file PythonExport.xlsx:

    enter image description here