I have an xlsx file with over 1000 columns of data. I would like to first parse every second column from the file (the columns can contain both numbers and letters) and then build a unique list from the parsed data.
I'm a complete noob and have tried a "for" and a "do while" loop, but neither has worked for me.
So far I have:
import pandas as pd
workbook = pd.read_excel('C:\Python27\Scripts\Data.xlsx')
worksheet = workbook.sheetname='Data'
for col in range(worksheet[0], worksheet[1300]):
    print(col)
I think I need to append the data and maybe write it to a text file, then create a unique list from the text file. I can do the second part; it's getting the data into the text file that I'm having trouble with.
Thanks
You can select every other column by slicing with a step argument, i.e. df.iloc[:, ::2]
In [35]:
df = pd.DataFrame({'a':1, 'b':[1,2,3,4,5], 'c':[2,3,4,5,6], 'd':0,'e':np.random.randn(5)})
df
Out[35]:
a b c d e
0 1 1 2 0 -0.352310
1 1 2 3 0 1.189140
2 1 3 4 0 -1.470507
3 1 4 5 0 0.742709
4 1 5 6 0 -2.798007
Here we take every 2nd column:
In [37]:
df.iloc[:,::2]
Out[37]:
a c e
0 1 2 -0.352310
1 1 3 1.189140
2 1 4 -1.470507
3 1 5 0.742709
4 1 6 -2.798007
We can then call np.unique on the sliced df to get a single array of all the unique values:
In [36]:
np.unique(df.iloc[:,::2])
Out[36]:
array([-2.79800676, -1.47050675, -0.35231005, 0.74270934, 1. ,
1.18914011, 2. , 3. , 4. , 5. , 6. ])
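To tie this back to your file, a minimal sketch might look like the following. The path and the sheet name 'Data' are taken from your snippet, so adjust them as needed. I've used pd.unique rather than np.unique here because it doesn't sort the result, which avoids comparison errors if the selected columns mix numbers and letters:

import pandas as pd

# read just the 'Data' sheet from the workbook
df = pd.read_excel(r'C:\Python27\Scripts\Data.xlsx', sheet_name='Data')

# take every second column by position
every_second = df.iloc[:, ::2]

# flatten to one array and keep each distinct value once
unique_values = pd.unique(every_second.values.ravel())

print(unique_values)

With this approach you don't need the intermediate text file at all; the deduplication happens in memory. If you still want the values written out, you can loop over unique_values and write each one to a file afterwards.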