Search code examples
pythonexcelpandasunique

Using pandas to extract all unique values across all columns in excel file


I am fairly new to pandas/python. I’m trying to ingest an excel file and output an excel file that contains all unique values in each column.

I am easily able to print the values using:

for col in df:
    print(df[col].unique())

My question is: how can I get these results to be sent to an excel file and not just print them. Thank you in advance.


Solution

  • Use dictionary comprehension to create your new unique frame and then create a new file. Dictionaries can easily be used to create a DataFrame as long as the arrays are the same length. To make sure the arrays are the same length when using unique we just convert the arrays to a pd.Series

    You do not have an expected output so I assume you just want all the unique values in one file with nans for missing values.

    pd.DataFrame({col: pd.Series(df[col].unique()) 
                  for col in df.columns}).to_excel('some_file_name.xlsx', index=False)
    

    Here is a working example

    # sample df
    import pandas as pd
    df = pd.DataFrame([[1,2,3,1,2],
                       list('abccd'),
                       [1]*5]).T
    
       0  1  2
    0  1  a  1
    1  2  b  1
    2  3  c  1
    3  1  c  1
    4  2  d  1
    

    output

    pd.DataFrame({col: pd.Series(df[col].unique()) 
                  for col in df.columns})
    
         0  1    2
    0    1  a    1
    1    2  b  NaN
    2    3  c  NaN
    3  NaN  d  NaN