Search code examples
python-3.xpandasdataframexlsxwriterpandas.excelwriter

Calculate the value which has the similar name in the header using - XLSWRITER using python


I need to calculate the value which has the similar name in the header. Here i have Bill and Non Bill fields in the column. I need to calculate all the Bill separately and non-bill separately and need to sum in another column Bill Amt Total and Non Bill Amt Total using xlswrriter in python.

Input file

Name    | Bill - Php    | Non Bill - Php    | Bill - JS  | Non Bill -JS
Alex    |   30          |                   |      10    |
Ram     |   10          |          20       |            |
Stephen |               |                   |      20    |
Robert  |               |          10       |            |      10
Mohan   |               |          20       |      10    |

Output file:

Name    | Bill - Php    | Non Bill - Php    | Bill - JS  | Non Bill -JS | Bill Total Amt | Non Bill Total Amt
Alex    |   30          |                   |      10    |              |    40          |       
Ram     |   10          |          20       |            |              |    10          |   20
Stephen |               |                   |      20    |              |                |   20
Robert  |               |          10       |            |      10      |                |   20
Mohan   |               |          20       |      10    |              |     10         |   20

Input file

Output file


Solution

  • Simply select columns by string prefix (str.startswith()) and then perform horizontal sum by df[selected_columns].sum(axis=1).

    In addition, Pandas comes with Excel save/load capability, so you don't really need to import xlsxwriter or openpyxl.

    Data

    The data is reproduced in Excel format.

    import pandas as pd
    import io
    import numpy as np
    
    df = pd.read_csv(io.StringIO("""
    Name    | Bill - Php  | Non Bill - Php  | Bill - JS  | Non Bill -JS
    Alex    |   30        |                 |      10    |
    Ram     |   10        |          20     |            |
    Stephen |             |                 |      20    |
    Robert  |             |          10     |            |      10
    Mohan   |             |          20     |      10    |
    """), sep=r"\|\s*", engine='python')
    
    # cleanup
    df.columns = [c.strip() for c in df.columns]
    df["Name"] = df["Name"].str.strip()
    
    # save .xlsx
    df.to_excel("/mnt/ramdisk/data.xlsx", index=False)
    

    Solution

    # load .xlsx
    df = pd.read_excel("/mnt/ramdisk/data.xlsx")
    
    for prefix in ("Bill", "Non Bill"):
        # select the columns to be summed
        cols_to_sum = [c for c in df.columns if c.startswith(prefix)]
        # new column name
        col = f"{prefix} Amt Total"
        # sum the selected columns horizontally
        df[col] = df[cols_to_sum].sum(axis=1)
        # (optional) replace 0 with nan
        df[col] = df[col].replace({0.0: np.nan})
    
    # save a new file
    df.to_excel("/mnt/ramdisk/out.xlsx", index=False)
    

    Check out the columns added:

    print(df.iloc[:,-2:])
    # Out[219]: 
    #    Bill Amt Total  Non Bill Amt Total
    # 0            40.0                 NaN
    # 1            10.0                20.0
    # 2            20.0                 NaN
    # 3             NaN                20.0
    # 4            10.0                20.0