I am looping through a data frame, doing some calculations, and then depending on some logic adding the original series or a transformed version. For a MRE, I'll leave out the transformation part.
# List of Tuples
students = [('Ankit', 22, 'A'),
('Swapnil', 22, 'B'),
('Priya', 22, 'B'),
('Shivangi', 22, 'B'),
]
# Create a DataFrame object
stu_df = pd.DataFrame(students, columns =['Name', 'Age', 'Section'],
index =['1', '2', '3', '4'])
returnList = []
for i, (colname, series) in enumerate(stu_df.iteritems()):
returnList.append(pd.DataFrame(series))
a = pd.concat(returnList)
the index is consistent across the series, so why is is not being recognized cleanly? a looks like this, but it should line up to look like the original data?:
index | Name | Age | Section |
---|---|---|---|
1 | Ankit | NaN | |
2 | Swapnil | NaN | NaN |
3 | Priya | NaN | NaN |
4 | Shivangi | NaN | NaN |
1 | NaN | 22.0 | NaN |
2 | NaN | 22.0 | NaN |
3 | NaN | 22.0 | NaN |
4 | NaN | 22.0 | NaN |
1 | NaN | NaN | A |
2 | NaN | NaN | B |
3 | NaN | NaN | B |
4 | NaN | NaN | B |
You need to add axis=1
to pd.concat()
:
a = pd.concat(returnList, axis=1)
Output:
>>> a
Name Age Section
1 Ankit 22 A
2 Swapnil 22 B
3 Priya 22 B
4 Shivangi 22 B
By default, pd.concat
tries to concatenate the dataframes vertically (axis=0
), i.e., adding the second df to the end of the first, and the third to the end of that. But since all the Series object in returnList
have different column names, pandas adds them to the end, and fills in the missing spaces with NaN:
>>> pd.concat(returnList)
Name Age Section
1 Ankit NaN NaN <--- first df of returnList starts here
2 Swapnil NaN NaN
3 Priya NaN NaN
4 Shivangi NaN NaN
1 NaN 22.0 NaN <--- second df of returnList starts here
2 NaN 22.0 NaN
3 NaN 22.0 NaN
4 NaN 22.0 NaN
1 NaN NaN A <--- third df of returnList starts here
2 NaN NaN B
3 NaN NaN B
4 NaN NaN B