Tags: python, pandas, dataframe, concatenation, append

Does pandas concat give different structure than append?


I read the other posts on SO about this, but they still don't address my question:

I appended 4 identical rows with append and did the same with concat, but the two results don't look the same.

[Image from the original post: output of append versus concat]

When I tried to extract a column of the data, the results also don't look the same. Are they actually the same and just displayed differently, or am I using concat wrong? The desired result is the one from append, not the one from concat.

[Image from the original post: a column extracted from each result]

The question has been answered without needing any code, but here is my reproducible code for future viewers:

Given any CSV file or the following example dataframe:

import pandas as pd
import numpy as np
from datetime import datetime
from string import ascii_lowercase as al

# artificial dataframe
np.random.seed(365)
rows = 15
cols = 2
data = np.random.randint(0, 10, size=(rows, cols))
index = pd.bdate_range(datetime.today(), freq='d', periods=rows)

dfdata = pd.DataFrame(data=data, index=index, columns=list(al[:cols]))

# dfdata = pd.read_csv("data.csv")
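dfdatanew = pd.DataFrame()

# dfdata.iloc[2] selects the third row as a Series, so this is a list of four Series
frames = [dfdata.iloc[2], dfdata.iloc[2], dfdata.iloc[2], dfdata.iloc[2]]

# append the same row four times to the empty DataFrame
# (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0,
# so these lines need pandas < 2.0)
dfdatanew = dfdatanew.append(dfdata.iloc[2])
dfdatanew = dfdatanew.append(dfdata.iloc[2])
dfdatanew = dfdatanew.append(dfdata.iloc[2])
dfdatanew = dfdatanew.append(dfdata.iloc[2])

# concatenate the same four Series along the row axis
result = pd.concat(frames, axis=0, join='outer')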

# compare
print(result)
print(dfdatanew)

Solution

  • You must provide a clear reproducible example in your question with code/data as text.

    That said, the error is pretty obvious. dfdata.iloc[2] returns a Series, not a one-row DataFrame, so you are feeding concat four Series and getting back one long Series.

    Slice out one-row DataFrames instead by using .iloc[[2]]:

    frames = [dfdata.iloc[[2]], dfdata.iloc[[2]], dfdata.iloc[[2]], dfdata.iloc[[2]]]
    result = pd.concat(frames)
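
The difference between the two slicing forms can be checked directly. This is a minimal sketch on a small stand-in frame (the 5x2 `dfdata` built here is an assumption for illustration, not the asker's data):

```python
import numpy as np
import pandas as pd

# Small stand-in for dfdata: 5 rows, integer columns 'a' and 'b'
dfdata = pd.DataFrame(np.arange(10).reshape(5, 2), columns=list("ab"))

row_as_series = dfdata.iloc[2]     # scalar position -> Series indexed by column names
row_as_frame = dfdata.iloc[[2]]    # list of positions -> one-row DataFrame

# Concatenating Series along axis=0 chains them end to end
long_series = pd.concat([row_as_series] * 4)
# Concatenating one-row DataFrames stacks them as rows
stacked = pd.concat([row_as_frame] * 4)

print(type(row_as_series).__name__, type(row_as_frame).__name__)
print(long_series.shape, stacked.shape)
```

The first result is a single Series whose index repeats the column names, which is why the concat output in the question looked different from the appended frame.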
    

    You could also concat the Series on axis=1 and transpose, but this would mess up the dtypes when the columns have different types, because each row Series is upcast to a single common dtype.
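
To illustrate the dtype issue, here is a hedged sketch with a hypothetical frame `df` that mixes an integer and a float column (the asker's example has only integer columns, where the effect would not be visible):

```python
import pandas as pd

# Hypothetical frame with mixed column dtypes (int64 and float64)
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

row = df.iloc[2]  # the row Series holds both values as float64

# Series side by side as columns, then transposed back into rows
via_transpose = pd.concat([row] * 4, axis=1).T

# One-row DataFrames stacked as rows
via_frames = pd.concat([df.iloc[[2]]] * 4)

print(via_transpose.dtypes)  # column 'a' has been upcast
print(via_frames.dtypes)     # column 'a' keeps its original dtype
```

Both results have the same shape and values, but only the version built from one-row DataFrames preserves the per-column dtypes.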

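    NB. I'm assuming this was a dummy example. If you really want to repeat a given row 4 times, use: result = dfdata.iloc[[2]*4] to pick it by position. dfdata.reindex([2]*4) selects by label instead, so it applies only when 2 is an index label, as with a default RangeIndex.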
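
As a sketch of repeating one row on a frame with a date index like the one in the question (the stand-in `dfdata` and its fixed start date are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Stand-in for dfdata with a date index, similar to the question's example
index = pd.date_range("2024-01-01", periods=5, freq="D")
dfdata = pd.DataFrame(np.arange(10).reshape(5, 2), index=index, columns=list("ab"))

# Positional selection: the third row, four times
by_position = dfdata.iloc[[2] * 4]

# Label-based selection: look up the label of the third row first
by_label = dfdata.reindex([dfdata.index[2]] * 4)

print(by_position)
print(by_label)
```

Either form returns a DataFrame with four copies of the row, so no append or concat is needed for this case.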