I am trying to apply a function on a column of a dataframe. After getting multiple results as dataframes, I want to concat them all in one.
Why does the first option work and the second not?
import numpy as np
import pandas as pd
def testdf(n):
test = pd.DataFrame(np.random.randint(0,n*100,size=(n*3, 3)), columns=list('ABC'))
test['index'] = n
return test
test = pd.DataFrame({'id': [1,2,3,4]})
testapply = test['id'].apply(func = testdf)
#option 1
pd.concat([testapply[0],testapply[1],testapply[2],testapply[3]])
#option2
pd.concat([testapply])
pd.concat
expects a sequence of pandas objects, but your #2 case/option passes a sequence of single pd.Series
object that contains multiple dataframes, so it doesn't make concatenation - you just get that series as is.
To fix your 2nd approach use unpacking:
print(pd.concat([*testapply]))
A B C index
0 91 15 91 1
1 93 85 91 1
2 26 87 74 1
0 195 103 134 2
1 14 26 159 2
2 96 143 9 2
3 18 153 35 2
4 148 146 130 2
5 99 149 103 2
0 276 150 115 3
1 232 126 91 3
2 37 242 234 3
3 144 73 81 3
4 96 153 145 3
5 144 94 207 3
6 104 197 49 3
7 0 93 179 3
8 16 29 27 3
0 390 74 379 4
1 78 37 148 4
2 350 381 260 4
3 279 112 260 4
4 115 387 173 4
5 70 213 378 4
6 43 37 149 4
7 240 399 117 4
8 123 0 47 4
9 255 172 1 4
10 311 329 9 4
11 346 234 374 4