I am fairly new to pandas/python. I’m trying to ingest an excel file and output an excel file that contains all unique values in each column.
I am easily able to print the values using:
for col in df:
print(df[col].unique())
My question is: how can I get these results to be sent to an excel file and not just print them. Thank you in advance.
Use dictionary comprehension to create your new unique frame and then create a new file. Dictionaries can easily be used to create a DataFrame as long as the arrays are the same length. To make sure the arrays are the same length when using unique
we just convert the arrays to a pd.Series
You do not have an expected output so I assume you just want all the unique values in one file with nans
for missing values.
pd.DataFrame({col: pd.Series(df[col].unique())
for col in df.columns}).to_excel('some_file_name.xlsx', index=False)
Here is a working example
# sample df
import pandas as pd
df = pd.DataFrame([[1,2,3,1,2],
list('abccd'),
[1]*5]).T
0 1 2
0 1 a 1
1 2 b 1
2 3 c 1
3 1 c 1
4 2 d 1
output
pd.DataFrame({col: pd.Series(df[col].unique())
for col in df.columns})
0 1 2
0 1 a 1
1 2 b NaN
2 3 c NaN
3 NaN d NaN