python pandas list dataframe concatenation

Concatenate multiple dataframes within a list


I have several dataframes in a list, obtained with np.array_split, and I want to concatenate some of them into a single dataframe. In this example, I want to concatenate the 3 dataframes contained in b (all of a except the 2nd one, which is the element a[1] in the list):

import numpy as np
import pandas as pd

df = pd.DataFrame({'country':['a','b','c','d'],
  'gdp':[1,2,3,4],
  'iso':['x','y','z','w']})

a = np.array_split(df,4)
i = 1
b = a[:i]+a[i+1:]

desired_final_df = pd.DataFrame({'country':['a','c','d'],
  'gdp':[1,3,4],
  'iso':['x','z','w']})

I tried creating an empty dataframe and then appending the elements of b to it in a loop, but none of these attempts worked:

CV = pd.DataFrame()
CV = [CV.append[(b[i])] for i in b] #try1
CV = [CV.append(b[i]) for i in b] #try2
CV = pd.DataFrame([CV.append[(b[i])] for i in b]) #try3

for i in b:
 CV.append(b) #try4

I found a solution that works, but it is not efficient:

CV = pd.DataFrame()
CV = [CV.append(b) for i in b][0]

In this case, the list comprehension builds the same complete dataframe three times, and I keep only the first copy. With my real, much larger datasets, building the same result three times would cost a lot of extra computation time.

How could I do that without repeating operations?


Solution

  • To concatenate multiple DataFrames and reset the index, use pandas.concat:

    pd.concat(b, ignore_index=True)
    

    Output:

      country  gdp iso
    0       a    1   x
    1       c    3   z
    2       d    4   w
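
For reference, here is a minimal end-to-end sketch of the approach above, using the data from the question. One detail is an assumption made for this sketch and is not in the original post: it splits the row positions with np.array_split and slices the frame with iloc, instead of passing the DataFrame directly to np.array_split, so that it does not depend on how a given pandas version handles that call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "country": ["a", "b", "c", "d"],
    "gdp": [1, 2, 3, 4],
    "iso": ["x", "y", "z", "w"],
})

# Assumption for this sketch: split the row positions into 4 groups,
# then slice the frame with each group of positions.
positions = np.array_split(np.arange(len(df)), 4)
a = [df.iloc[p] for p in positions]

i = 1                   # position of the chunk to leave out
b = a[:i] + a[i + 1:]   # every chunk except a[i]

# A single concat call over the whole list; ignore_index=True
# renumbers the rows of the result from 0.
CV = pd.concat(b, ignore_index=True)
print(CV)
```

pd.concat accepts the whole list at once, so no loop or empty starting dataframe is needed. As a side note, DataFrame.append returned a new dataframe rather than modifying CV in place, which is likely why the loop in try4 left CV empty; the method was deprecated in pandas 1.4 and removed in pandas 2.0.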