Python function to merge columns into one column

I am trying to build a function, to apply to different columns with a csv file. The purpose of the function is to combine multiple columns into one, using the combine_first method described here: How to merge/combine columns in pandas?

I like to create a function, with parameters: csvName, ColName, NumberOfCol, and generate a new (combined) column for that csv file. Because there are different sets of columns that need to be grouped (and there are multiple csv files), I like to use a function on each set of columns.

Example columns (showing only the ones that need to be merged, the DF has many other columns that I need to retain):

DF = {'A1': [tree, grass, np.nan, np.nan, np.nan, np.nan],  
    'A2': [np.nan, np.nan, bird, fly, np.nan, np.nan],
    'A3': [np.nan, np.nan, np.nan, np.nan, cat, dog]}

DF["A"] = pd.concat([DF["A1"].\
                           combine_first(DF["A2"]).\
                               combine_first(DF["A3"])], 
            axis=1)
DF.rename(columns={"A1":"A"}, inplace=True)

The resulted column will become: DF = {'A': [tree, grass, bird, fly, cat, dog]}

I like to make this operation into a function, so I can enter the name of the csv, the column name (e.g., A), and number of columns that need to be merged

I have something like this at the moment:

   def column_merger(csvName, ColName, length:int):
        OriginalNameList = []
        for number in range(length):
            OriginalNameList.append(ColName + str(number))
    
        NewList = OriginalNameList[1:]

        for name in Newlist:
            NewList.append()
        command = []
        command.extend([.combine_first(SurveyName[NewList[name]]) for name in range(NewList)])

But it is not working. Any suggestion very much welcomed! Thank you!

Solution

I think this approach does what you're looking for, but note that I had to turn DF into a DataFrame object, since I assume that's what it's supposed to be

import numpy as np
import pandas as pd


DF = pd.DataFrame({
    'A1': ['tree', 'grass', np.nan, np.nan, np.nan, np.nan],
    'A2': [np.nan, np.nan, 'bird', 'fly', np.nan, np.nan],
    'A3': [np.nan, np.nan, np.nan, np.nan, 'cat', 'dog']
})

# use 'stack()' to combine the columns, dropping the nan values
# use 'reset_index()[0]' to get the desired data out of the stack as a Series
# and then use 'to_frame('A')' to convert that series back into a DataFrame 
# with the desired column name 'A'
DF = DF.stack().reset_index()[0].to_frame('A')
print(DF)

#        A
# 0   tree
# 1  grass
# 2   bird
# 3    fly
# 4    cat
# 5    dog

Note that you can do each of those steps separately (without chaining them) if you want to see the intermediate results

DF = DF.stack()
print(DF)
DF = DF.reset_index()[0]
print(DF)
DF = DF.to_frame('A')
print(DF)  # final result

I should point out, however, that there are pending implementation changes to the stack function which will alter its behavior in Pandas 3.0 (NB: at time of writing, Pandas is currently at v2.2). You can read more about this in the Pandas 2.1 release notes