python, excel, concatenation, multiple-columns

Erroneous column concatenation Python


I have a data frame in which the first column, wherever it is empty, has to be filled by concatenating the other two columns.

 Cuenta CeCo   GLAccount   CeCoCeBe
  123 A           123         A
  234 S           234         S
  NaN             345         B
  NaN             987         A


for x in df1["Cuenta CeCo"].isna():
    if x:
        df1["Cuenta CeCo"] = df1["GLAccount"].apply(str) + " " + df1["CeCoCeBe"]
    else:
        df1["Cuenta CeCo"]

TYPES:

df1["Cuenta CeCo"] = dtype('O')
df1["GLAccount"] = dtype('float64')
df1["CeCoCeBe"] = dtype('O')

Expected output:

Cuenta CeCo   GLAccount   CeCoCeBe
  123 A           123         A
  234 S           234         S
  345 B           345         B
  987 A           987         A

However, the concatenation seems to do something strange and gives me different numbers and letters:

 Cuenta CeCo   
  251 O
  471 B
  791 R
  341 O

Could someone help me understand why this happens and how to correct it so that I get my expected output?


Solution

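  • Iterating over dataframes is typically bad practice and rarely does what you intend. Your loop walks over the True/False values returned by isna(), and every time it hits a True it reassigns the entire column, not just that row. Note also that iterating over a dataframe directly gives you its columns, not its rows. Try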

    for x in df:
        print(x)
    

    and you will see it print the column headings.
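To make the behaviour of the original loop concrete, here is a minimal sketch. The sample values are reconstructed from the tables in the question and are an assumption, so it will not reproduce the exact output shown there, which depends on data that was not posted. It only illustrates that the assignment inside the loop has no row selector and therefore touches every row.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumed values).
df1 = pd.DataFrame({
    "Cuenta CeCo": ["123 A", "234 S", np.nan, np.nan],
    "GLAccount": [123.0, 234.0, 345.0, 987.0],
    "CeCoCeBe": ["A", "S", "B", "A"],
})

# Iterating over a DataFrame yields its column labels, not its rows.
labels = [x for x in df1]

# Iterating over the boolean Series yields one True/False per row.
flags = list(df1["Cuenta CeCo"].isna())

# The right-hand side used inside the original loop covers all rows, so a
# single True flag is enough to overwrite the whole column with it.
overwritten = df1["GLAccount"].apply(str) + " " + df1["CeCoCeBe"]

print(labels)
print(flags)
print(overwritten.tolist())
```

Because GLAccount is float64, the concatenated strings also carry a trailing ".0", which is another way the result drifts from the expected output.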

    As for what you're trying to do, try this:

    cols = ['Cuenta CeCo', 'GLAccount', 'CeCoCeBe']
    mask = df[cols[0]].isna()
    df.loc[mask, cols[0]] = df.loc[mask, cols[1]].map(str) + " " + df.loc[mask, cols[2]]
    

    This builds a mask (here a boolean series of True and False) that selects only the NaN rows. For those rows, the second column is converted to a string and concatenated with the third, and the same mask on the left-hand side makes sure that only those rows are overwritten.
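
The masked assignment can be tried end to end with the sketch below. The sample frame is reconstructed from the question and is an assumption. The sketch also swaps the string conversion for a cast to int first, since GLAccount is float64 and a plain str() would give "345.0" rather than "345". That cast assumes the masked rows hold whole numbers and no NaN.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumed values).
df = pd.DataFrame({
    "Cuenta CeCo": ["123 A", "234 S", np.nan, np.nan],
    "GLAccount": [123.0, 234.0, 345.0, 987.0],
    "CeCoCeBe": ["A", "S", "B", "A"],
})

cols = ["Cuenta CeCo", "GLAccount", "CeCoCeBe"]

# Boolean Series: True where the first column is missing.
mask = df[cols[0]].isna()

# Cast float -> int -> str so 345.0 becomes "345" instead of "345.0".
# Assumption: the masked GLAccount values are whole numbers without NaN.
account = df.loc[mask, cols[1]].astype(int).astype(str)

# Only the masked rows are written; the other rows keep their values.
df.loc[mask, cols[0]] = account + " " + df.loc[mask, cols[2]]

print(df[cols[0]].tolist())
```

If GLAccount can itself contain NaN in the masked rows, the int cast raises an error, so those rows would need to be filtered out or handled separately first.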