Tags: python, pandas, concatenation, vaex

Python: How to concatenate pandas dataframes with vaex


I would like to join thousands of pandas dataframes into one vaex dataframe. I am following the documentation: https://vaex.readthedocs.io/en/latest/api.html?highlight=concat#vaex.concat

This is what I do:

df_vaex = vaex.DataFrame()
for i,file in enumerate(files):
    df = pd.read_pickle(file)
    df_vx = vaex.from_pandas(df=df, copy_index=False)
    df_vaex.concat(df_vx)
    if i%100 == 0:
        print(i)

This does not work. I get an error saying that the vaex DataFrame has no concat method: AttributeError: 'DataFrame' object has no attribute 'concat'

How can I read and concatenate dataframes in vaex?


Second try, following the first comment:

for i,file in enumerate(files):
    df = pd.read_pickle(file)
    df_vaex_total = vaex.from_pandas(df=df, copy_index=False)
    if i == 0:
        pass
    else:
        print(type(df_vaex_total)) # its equal to <class 'vaex.dataframe.DataFrameLocal'>
        print(type(df_vx)) # its equal to <class 'vaex.dataframe.DataFrameLocal'>
        
        df_vaex_total = pd.concat([df_vaex_total, df_vx])
        
    if i%10 == 0:
        print(i)

This fails with: TypeError: cannot concatenate object of type '<class 'vaex.dataframe.DataFrameLocal'>'; only Series and DataFrame objs are valid


Solution

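  • To concatenate dataframes with vaex, use the module-level function vaex.concat. pd.concat only accepts pandas objects, which is why the second attempt fails. The steps are:

    • read each pandas dataframe and convert it with vaex.from_pandas
    • collect the converted dataframes in a list
    • call df_final = vaex.concat(list_of_dataframes) once, after the loop

    So your code would look something like this: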

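    import pandas as pd
    import vaex

    list_of_dataframes = []

    # files is the list of pickle paths from the question
    for file in files:
        pdf = pd.read_pickle(file)      # load one pandas dataframe
        df = vaex.from_pandas(pdf)      # convert it to a vaex dataframe
        list_of_dataframes.append(df)

    # concatenate once, after all files have been read
    df_final = vaex.concat(list_of_dataframes)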