Search code examples
pythonpandaspycharm

PyCharm variable explorer does not display pandas column names with whitespace


I have this .xlsx file I can read successfully by using:

pandas.read_excel(file_name, sheet_name="customers", index_col=0)

This works great for most columns but a few with a space between characters like in "profile url". This columns are just missing.

EDIT:

This is some code to reproduce the issue:

import pandas as pd

def read_excel(file_name):
    df = pd.read_excel(file_name, sheet_name="customers", index_col=0)
    for entry in df.iterrows():
        print(entry)
    return df


read_excel("test_table.xlsx")

And this a example table to use:

ID,First,Last,Profile Url
1,foo,bar,www.google.com
2,fake,name,https://stackoverflow.com/

This is the value of entry in the first iteration. Doing so I can the objects First and Last.

I would expect to see Profile Url as well.

enter image description here

What I learned by preparing this example is that any header written in lowercase will also be ignored.


Solution

    • The behavior is not related to any specific file type, it's the case for any dataframe with a space in the column name, regardless of the method with which the dataframe is created.
    • Currently there is an issue with JetBrains, regarding this behavior.
    • The resolution is, fix the columns, by replacing the spaces with another character, such as '_'.
    • Lowercase column names do not preset the same issue. My guess is there is leading or trailing whitespace in the column name, which can be removed with .str.strip()
    import pandas as pd
    
    df = pd.DataFrame({'col_no_spaces': [1, 2, 3], 'col with spaces': ['a', 'b', 'c'], ' col_with_leading_trailing_ws ': [4, 5, 6]})
    
    # display(df)
       col_no_spaces col with spaces   col_with_leading_trailing_ws 
    0              1               a                               4
    1              2               b                               5
    2              3               c                               6
    
    • Note the columns with spaces, are not available to View as Series

    enter image description here

    # strip leading and trailing whitespace, and replace spaces in column names with _
    df.columns = df.columns.str.strip().str.replace('\s+', '_', regex=True)
    
    # display(df)
       col_no_spaces col_with_spaces  col_with_leading_trailing_ws
    0              1               a                             4
    1              2               b                             5
    2              3               c                             6
    
    • Note that all columns are now available to View as Series

    enter image description here