Search code examples
pythonpandasdataframenumeric

How to merge multiple columns containing numeric data in Pandas, but ignore empty cells


I have a table like this:

|-----|-----|-----|
|  A  |  B  |  C  |
|-----|-----|-----|
|     |  5  |     |
|-----|-----|-----|
|  1  |     |     |
|-----|-----|-----|
|     |  5  |     |
|-----|-----|-----|
|     |     |  2  |
|-----|-----|-----|
|     |     |  2  |
|-----|-----|-----|

where each column in the desired range has only one integer in its row. I want to merge these columns into a single new column that would look like this:

|-----|-----|-----|    |-----|
|  A  |  B  |  C  |    |  Z  |
|-----|-----|-----|    |-----|
|     |  5  |     | →  |  5  |
|-----|-----|-----|    |-----|
|  1  |     |     | →  |  1  |
|-----|-----|-----|    |-----|
|     |  5  |     | →  |  5  |
|-----|-----|-----|    |-----|
|     |     |  2  | →  |  2  |
|-----|-----|-----|    |-----|
|     |     |  2  | →  |  2  |
|-----|-----|-----|    |-----|

I have been searching, but the closest solution I can find is doing something like:

df.iloc[:,some_column:another_column].apply( lambda x: "".join(x.astype(str)), axis=1)

However, this also concatenates "NaN"s from the blank cells, which is obviously undesirable.

How might I get my desired output?


Solution

  • I think it is what you want.

    import pandas as pd
    df = pd.DataFrame({"A":[np.nan, 1, np.nan, np.nan, np.nan],
                       "B": [5, np.nan, 5, np.nan, np.nan]})
    df['Z'] = df.sum(axis = 1)
    

    Alternatively, you can use

    df['Z'] = df.max(axis = 1)
    

    Which might be safer if (per chance) you have multiple non-NULL values and just want one of them (the largest one in this case).