
vaex column name change


Hi, I'm just getting started with Vaex in Python. I have a dataset with messy column names, and I'm trying to replace the spaces with '_'.

In pandas I can do df.columns = df.columns.str.replace(' ', '_')

but in Vaex

df_column = df.column_names.str.replace('\s', '_', regex=True)

I get the following error


AttributeError                            Traceback (most recent call last)
----> 1 df_new = df.column_names.str.replace('\s', '_', regex=True)

AttributeError: 'list' object has no attribute 'str'

Does anyone know what I may be doing wrong?

Thanks, Mike


Solution

  • In Vaex, the columns are in fact "Expressions". Expressions let you build a kind of computational graph behind the scenes while you do your regular dataframe operations. However, that requires the column names to be as "clean" as possible.

    So column names like '2' or '2.5' are not allowed, since the expression system can interpret them as numbers rather than column names. Likewise, the expression system can interpret a column name like 'first-name' as df['first'] - df['name'].

    To avoid this, vaex will smartly rename columns so that they can be used in the expression system. This is actually extremely complicated. By the way, you can always access the original names via df.get_column_names(alias=True).

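    If you want to rename columns, you should use df.rename(name, new_name), one column at a time. Note that df.column_names is a plain Python list of strings, so it has no pandas-style .str accessor, which is why you get the AttributeError.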

    I hope this helps!
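To make the renaming step concrete, here is a minimal sketch that loops over df.column_names and calls df.rename(name, new_name) for each column containing whitespace. The helper names clean_column_name and rename_with_underscores are made up for this illustration and are not part of vaex. The sketch assumes df.rename modifies the dataframe in place, and the names vaex reports for columns with spaces may differ between versions.

```python
import re


def clean_column_name(name):
    """Replace every whitespace character in a column name with '_'."""
    return re.sub(r"\s", "_", name)


def rename_with_underscores(df):
    """Rename the columns of a vaex DataFrame so they contain no whitespace.

    Hypothetical helper: it only relies on df.column_names (a list of str)
    and df.rename(name, new_name), both mentioned in the answer above.
    """
    # Iterate over a copy, because renaming changes df.column_names.
    for name in list(df.column_names):
        new_name = clean_column_name(name)
        if new_name != name:
            df.rename(name, new_name)
    return df


# Usage, assuming df is an existing vaex DataFrame:
#   rename_with_underscores(df)
#   print(df.column_names)
```

The cleaning logic is kept in its own plain-Python function so it can be checked without loading any data.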