Search code examples
python-2.7parsingpandasuniquexlrd

Creating a unique list using Pandas


I have an xlsx file with over 1000 columns of data. I would like to firstly parse every second column from the data file (which can contain numbers and letters) and then create a unique list from the parsed data.

I'm a complete noob & have tried a "for" and "do while" loop but neither have worked for me.

So far I have:

    import pandas as pd
    workbook = pd.read_excel('C:\Python27\Scripts\Data.xlsx')
    worksheet = workbook.sheetname='Data'

    for col in range(worksheet[0], worksheet[1300]):
        print(col)

I think I need to append the data and maybe write to a text file then create a unique list from the text file - I can do the second part it's just getting it into the text file I'm having trouble with.

Thanks


Solution

  • You can iterate over your columns by slicing and using a step arg i.e. df.ix[:, ::2]

    In [35]:
    df = pd.DataFrame({'a':1, 'b':[1,2,3,4,5], 'c':[2,3,4,5,6], 'd':0,'e':np.random.randn(5)})
    df
    
    Out[35]:
       a  b  c  d         e
    0  1  1  2  0 -0.352310
    1  1  2  3  0  1.189140
    2  1  3  4  0 -1.470507
    3  1  4  5  0  0.742709
    4  1  5  6  0 -2.798007
    

    here we step every 2nd column:

    In [37]:
    df.ix[:,::2]
    
    Out[37]:
       a  c         e
    0  1  2 -0.352310
    1  1  3  1.189140
    2  1  4 -1.470507
    3  1  5  0.742709
    4  1  6 -2.798007
    

    we can then just call np.unique on the entire df to get a single array of all the unique values:

    In [36]:
    np.unique(df.ix[:,::2])
    
    Out[36]:
    array([-2.79800676, -1.47050675, -0.35231005,  0.74270934,  1.        ,
            1.18914011,  2.        ,  3.        ,  4.        ,  5.        ,  6.        ])