In pandas, to get the columns of a df you do:
df.columns
which returns a MultiIndex.
If you want to assign it to a variable you do:
columns=df.columns.tolist()
which would create a tuple for every column name,
e.g.
columns=[('A',),('B',),...]
Is there a way to create the variable columns
with each column as a string item of the list instead of a tuple item, or do you just have to do some list editing afterwards?
If you have a MultiIndex, tolist() cannot in general produce a list of single strings,
because each column label has one entry per level, so it returns a list of tuples instead.
However, as suggested by @jezreal in the comments, you can select the first level like so:
df.columns.get_level_values(0).tolist()
This can contain duplicates, since it returns the first-level value for every column. If instead you want the possible values for level 0 (the "unique" values), you could use
df_multiindex.columns.levels[0].tolist()
import pandas as pd
from io import StringIO
# Create Example Data
df_multiindex = pd.read_csv(StringIO(
'''Fruit,Color,Count,Price
Apple,Red,3,$1.29
Apple,Green,9,$0.99
Pear,Red,25,$2.59
Pear,Green,26,$2.79
Lime,Green,99,$0.39''')).set_index(['Fruit', 'Color']).T
# Print result
print('get_level_values(0): {}'.format(df_multiindex.columns.get_level_values(0).tolist()))
print('levels[0]: {}'.format(df_multiindex.columns.levels[0].tolist()))
Output:
get_level_values(0): ['Apple', 'Apple', 'Pear', 'Pear', 'Lime']
levels[0]: ['Apple', 'Lime', 'Pear']
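If you want one string per column rather than just one level, another option is to join the levels of each tuple yourself. The sketch below is only an illustration: the frames `flat` and `multi`, the variable names, and the "_" separator are made up for this example and are not from the question. It also shows that tolist() already gives plain strings when the columns are a regular (single-level) Index.

```python
import pandas as pd

# Hypothetical example frames (not from the original post).
flat = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
multi = pd.DataFrame(
    [[1, 2, 3]],
    columns=pd.MultiIndex.from_tuples(
        [("Apple", "Red"), ("Apple", "Green"), ("Pear", "Red")]
    ),
)

# Single-level columns: tolist() already returns plain strings.
flat_names = flat.columns.tolist()

# MultiIndex columns: tolist() returns one tuple per column.
multi_names = multi.columns.tolist()

# Join the levels of each tuple into one string per column.
# The "_" separator is an arbitrary choice for this sketch.
joined_names = ["_".join(map(str, col)) for col in multi.columns]

print(flat_names)
print(multi_names)
print(joined_names)
```

Unlike get_level_values(0), joining keeps every column distinguishable, at the cost of names that encode both levels.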