python-3.x pandas tolist

df.columns.tolist() to return strings not tuples


In pandas, to get the columns of a DataFrame you use

df.columns, which here returns a MultiIndex.

To store the column names in a variable you do:

columns = df.columns.tolist()

which creates a tuple for every column name,

e.g. columns = [('A',), ('B',), ...]

Is there a way to create the variable columns so that each column name is a string item in the list rather than a tuple, or do you have to edit the list afterwards?


Solution

  • If the columns are a MultiIndex, tolist() cannot in general return a list of single strings, because each column label is a tuple with one entry per level.

    However, as suggested by @jezreal in the comments, you can select the first level like so:

    df.columns.get_level_values(0).tolist()
    

    This can contain duplicates, since it returns the first-level value for every column. If you instead want the distinct values of level 0, you could use

    df.columns.levels[0].tolist()
    

    Example:

    import pandas as pd
    from io import StringIO
    
    # Create Example Data
    df_multiindex = pd.read_csv(StringIO(
    '''Fruit,Color,Count,Price
    Apple,Red,3,$1.29
    Apple,Green,9,$0.99
    Pear,Red,25,$2.59
    Pear,Green,26,$2.79
    Lime,Green,99,$0.39''')).set_index(['Fruit', 'Color']).T
    
    # Print result
    print('get_level_values(0): {}'.format(df_multiindex.columns.get_level_values(0).tolist()))
    print('levels[0]:           {}'.format(df_multiindex.columns.levels[0].tolist()))
    

    Output:

    get_level_values(0): ['Apple', 'Apple', 'Pear', 'Pear', 'Lime']
    levels[0]:           ['Apple', 'Lime', 'Pear']
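
If you want one string per column rather than one level, a possible approach is to join each tuple yourself. The following is only a minimal sketch: the frames `flat` and `multi`, the variable names, and the "_" separator are illustrative assumptions, not part of the original question. It also shows that tolist() already returns plain strings when the columns are an ordinary single-level Index.

```python
import pandas as pd

# Hypothetical example frames, made up for illustration.
flat = pd.DataFrame({"A": [1], "B": [2]})
multi = pd.DataFrame(
    [[1, 2, 3]],
    columns=pd.MultiIndex.from_tuples(
        [("Apple", "Red"), ("Apple", "Green"), ("Pear", "Red")]
    ),
)

# Single-level columns: tolist() gives plain strings.
flat_names = flat.columns.tolist()

# MultiIndex columns: tolist() gives one tuple per column.
tuple_names = multi.columns.tolist()

# Join the parts of each tuple into a single string per column.
# The "_" separator is an arbitrary choice.
joined_names = ["_".join(map(str, col)) for col in multi.columns]

print(flat_names)
print(tuple_names)
print(joined_names)
```

Unlike get_level_values(0), the joined names keep the information from every level, so they stay unique as long as the original tuples were unique and the separator does not make two different tuples collide.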