python-3.x pandas dataframe xlsxwriter pandas.excelwriter

Calculate the value which has the similar name in the header using - XLSWRITER using python

I need to calculate the value which has the similar name in the header. Here i have Bill and Non Bill fields in the column. I need to calculate all the Bill separately and non-bill separately and need to sum in another column Bill Amt Total and Non Bill Amt Total using xlswrriter in python.

Input file

Name    | Bill - Php    | Non Bill - Php    | Bill - JS  | Non Bill -JS
Alex    |   30          |                   |      10    |
Ram     |   10          |          20       |            |
Stephen |               |                   |      20    |
Robert  |               |          10       |            |      10
Mohan   |               |          20       |      10    |

Output file:

Name    | Bill - Php    | Non Bill - Php    | Bill - JS  | Non Bill -JS | Bill Total Amt | Non Bill Total Amt
Alex    |   30          |                   |      10    |              |    40          |       
Ram     |   10          |          20       |            |              |    10          |   20
Stephen |               |                   |      20    |              |                |   20
Robert  |               |          10       |            |      10      |                |   20
Mohan   |               |          20       |      10    |              |     10         |   20

Solution

Simply select columns by string prefix (str.startswith()) and then perform horizontal sum by df[selected_columns].sum(axis=1).

In addition, Pandas comes with Excel save/load capability, so you don't really need to import xlsxwriter or openpyxl.

Data

The data is reproduced in Excel format.

import pandas as pd
import io
import numpy as np

df = pd.read_csv(io.StringIO("""
Name    | Bill - Php  | Non Bill - Php  | Bill - JS  | Non Bill -JS
Alex    |   30        |                 |      10    |
Ram     |   10        |          20     |            |
Stephen |             |                 |      20    |
Robert  |             |          10     |            |      10
Mohan   |             |          20     |      10    |
"""), sep=r"\|\s*", engine='python')

# cleanup
df.columns = [c.strip() for c in df.columns]
df["Name"] = df["Name"].str.strip()

# save .xlsx
df.to_excel("/mnt/ramdisk/data.xlsx", index=False)

Solution

# load .xlsx
df = pd.read_excel("/mnt/ramdisk/data.xlsx")

for prefix in ("Bill", "Non Bill"):
    # select the columns to be summed
    cols_to_sum = [c for c in df.columns if c.startswith(prefix)]
    # new column name
    col = f"{prefix} Amt Total"
    # sum the selected columns horizontally
    df[col] = df[cols_to_sum].sum(axis=1)
    # (optional) replace 0 with nan
    df[col] = df[col].replace({0.0: np.nan})

# save a new file
df.to_excel("/mnt/ramdisk/out.xlsx", index=False)

Check out the columns added:

print(df.iloc[:,-2:])
# Out[219]: 
#    Bill Amt Total  Non Bill Amt Total
# 0            40.0                 NaN
# 1            10.0                20.0
# 2            20.0                 NaN
# 3             NaN                20.0
# 4            10.0                20.0