Search code examples
pythoniosexcelopenpyxl

Adjusting Python script to add data to Excel table


I’m totally new to Python and I’m banging my head against a wall trying to solve my problem.

I’ve got a table in an Excel file I keep in iCloud Drive to track expenses, and I need to utilize this file in both iOS and MacOS.
Up to now, I’ve used the combination of an Apple Shortcut, an underlying CSV file, and VBA to quickly enter expense data in Shortcuts, pass it to the CSV, and then periodically auto-append these data to my Excel file. (This append in Excel only happens when I open the file on my Mac since iOS Excel doesn’t support macros.)

Now I would like to skip the CSV & VBA route and do this strictly with Shortcuts & a Python script so that I can update the Excel file even from iOS.
I’m using the A-Shell app to facilitate the Python script execution.

I feel like I’m 90% to the finish line.
However, despite various tweaks to my script (courtesy of ChatGPT), I cannot come up with a script that successfully

  1. Extends the table’s formatting & formulas [which are generally structured references] to the newly-added row; and
  2. Does not corrupt the Excel file in the process.

Here’s the latest iteration of my script:

from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
import sys

# Get input data from command-line arguments
input_data = sys.argv[1:]  # Example: ["Value1", "Value2", "Value3"]

# Ensure there are exactly three inputs
if len(input_data) != 3:
    print("Error: Please provide exactly three input values.")
    sys.exit(1)

# Define the full path to your Excel file
excel_file = "/path_to_my_file/dummy.xlsx"

try:
    # Load the workbook and worksheet
    wb = load_workbook(excel_file)
    ws = wb['Expenses']

    # Identify the table range
    table_name = 'tbl_expenses'
    if table_name not in ws.tables:
        print(f"Table '{table_name}' not found in worksheet.")
        sys.exit(1)

    table = ws.tables[table_name]
    start_cell, end_cell = table.ref.split(':')
    start_row = ws[start_cell].row
    end_row = ws[end_cell].row
    end_col = ws[end_cell].column

    # Determine the next row index
    new_row_idx = end_row + 1

    # Insert data starting from the second column (Column B)
    for col_index, value in enumerate(input_data, start=2):  # Start at column B
        ws.cell(row=new_row_idx, column=col_index, value=value)

    # Extend structured reference formulas for remaining columns
    for col in range(1, end_col + 1):
        source_cell = ws.cell(row=end_row, column=col)
        target_cell = ws.cell(row=new_row_idx, column=col)
        if source_cell.data_type == 'f':  # If it's a formula
            target_cell.value = source_cell.value

    # Update the table's range manually
    from openpyxl.utils import get_column_letter
    new_end_cell = f"{get_column_letter(end_col)}{new_row_idx}"
    table.ref = f"{start_cell}:{new_end_cell}"

    # Save the workbook
    wb.save(excel_file)
    print("Row added successfully with extended structured reference formulas.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

Thanks in advance for the help!

UPDATE: To clarify the formatting issue, I've tried a few approaches to extending the fonts, borders, fill, other formatting of the old last row to the newly-added row. While some of this worked, I failed when trying to trying to assign the "new_cell.Fill" properties with either (1) the Fill properties of the old cell (e.g. start_color, end_color); (2) the PatternFill of the old cells using the Color object & RGB; or (3) other variants of this.

As for the formulas in columns A & E:G that I want to extend to the new row, they are all currently based on structured references and one of them calls a custom function name based on LAMBDA. The simplest formula-based column here looks like the following:

=IF([@[My Net]]<>"",INDEX([Cum Bal],ROW()-2)+[@[My Net]],"")

Solution

  • Ostensively what you are doing should not be a problem. Adding a new row to a Table and updating the references is simple as updating table.ref as your code covers.

    For issue 2 regarding corrupted file;
    you need to be more specific about your data and what scenarios cause the workbook to become corrupted.

    For issue 1, regarding the formula references;
    Updating of cell references when copying a formula is an Excel function. With Openpyxl you'll just get a copy the formula as is from the source to target cell.
    This is easy enough to fix by updating the formula with new references before writing or use the Openpyxl translator function.
    As for formatting, you need to be more specific about what formatting you are referring to.

    Formula Translator Example
    If you want/need to update the formula manually then we'd need to see what the formula is to comment.
    Note: The source cell does not have to be from the previous row. If your column has the same formulae for all cells down the column (with references updated for the current row) you can use any cell in the column for your source.
    I.e. if the first row with in the table is 2 and cell 'E2' contains the first formula for the column E then 'E2" can be your source every time a new row is added. There is no need to keep finding/using the previous row's cell as the source.
    Copying the source cell formula to the target cell and translating for the new row with the following.

    from openpyxl.formula.translate import Translator
    
    ...
            if source_cell.data_type == 'f':  # If it's a formula
                # target_cell.value = source_cell.value,
                target_cell.value = Translator(source_cell.value, 
                                               origin=source_cell.coordinate).translate_formula(target_cell.coordinate)
    
    ...            
    

    Some of your code could be cleaned up/simplified;

    1. For example if the added row is the next unused row on the Sheet, the list input_data can be 'appended' to the sheet (albeit with column A padded).
      Openpyxl 'append' will add the list values, one to each column starting from Column A (hence the need to pad Column A), in to the next unused row in the Sheet which is easier doing loop adding each value.
      This would work if your table is the only data in the sheet or at least that the table to be updated max row is the max for the Sheet. However this may not be the case given you are checking the table names. See commented section in the example code below

    2. Don't you know which columns have formulae? Seems unnecessary to check the columns B, C & D for formulas, aren't these values like the ones added to the new row from the list input_data? Are you adding formulas to row as part of the data updates?
      If the formulae is in Column E (onwards) only then just use that column letter directly.

    3. Using Openpyxl APIs can save extracting from text values

    4. Probably no need to include specific exit conditions, just check to continue execution rather than end it.

    Example Code

    from openpyxl import load_workbook
    from openpyxl.formula.translate import Translator
    from openpyxl.worksheet.cell_range import CellRange
    from openpyxl.utils import range_boundaries
    from openpyxl.styles import Font, PatternFill
    import sys
    
    
    # Get input data from command-line arguments
    input_data = sys.argv[1:]  # Example: ["Value1", "Value2", "Value3"]
    # Set a PatternFill for filling a cell. This colour is lightgrey, the Hex values can be obtained from the
    # More Colors in Excel --> Format Cells --> Fill
    cellFill = PatternFill(start_color='D9D9D9',
                       end_color='D9D9D9',
                       fill_type='solid')
    
    # Ensure there are exactly three inputs
    if len(input_data) == 3:
        # Define the full path to your Excel file
        excel_file = "dummy.xlsx"
    
        # Load the workbook and worksheet
        wb = load_workbook(excel_file)
        ws = wb['Expenses']
    
        # Identify the table range
        table_name = 'tbl_expenses'
        if table_name in ws.tables:
            table = ws.tables[table_name]
            table_range = CellRange(table.ref)
    
            # Expand the table by 1 row and update the reference
            table_range.expand(down=1)  # Expands table range by 1 row down
            table.ref = table_range.coord
    
            """
            # If using Openpyxl 'append' is possible then can use the following code to insert the
            # input_data list to the next unused row on the Sheet 
            # Pad the input_data list first value for Column A so "Value1" inserts into Column B
            input_data.insert(0, None)
            # Add the list to the Sheet using .append
            ws.append(input_data)
            """
    
            # If append cannot be used or formatting of the cell is required, add the list using iter_cols
            # and set cell format at the same time
            current_row = table_range.bounds[3]
            # The input_data list length will determine the end of the loop regardless of the max column
            for input_value, cell in zip(input_data, ws.iter_cols(min_row=current_row, max_row=current_row, min_col=2)):
                cell[0].value = int(input_value)  # Insert value
                cell[0].number_format = '0'  # Set some format for the cell if desired
                cell[0].font = Font(size=12, name='arial', italic=False, bold=True, color='FF0000')
                cell[0].fill = cellFill  # cellFill is set globally earlier in the code
    
            # Extend structured reference formulas for remaining columns
            # The source can be any cell that has the same formula.
            # For this example well assume Column E has the formulas. Therefore the first cell with the formula
            # can be set as the constant source cell.
            source_cell = ws['E2']
            range_boundaries(table.ref)
            target_cell = ws[table.ref.split(':')[1]]  #
            target_cell.value = Translator(source_cell.value,
                                           origin=source_cell.coordinate).translate_formula(target_cell.coordinate)
    
            # Save the workbook
            wb.save(f"new_{excel_file}")
            print("Row added successfully with extended structured reference formulas.")
    
        else:
            print(f"Table '{table_name}' not found in worksheet.")
    
    else:
        print("Error: Please provide exactly three input values.")
    

    Before
    Example Table before adding new row
    If this table was the only Table in the Sheet, row 6 is the max used row.

    Openpyxl append if used would add the list input_data into row 7 starting from Column A.
    However if there are other tables/data on the Sheet so that row 6 is not the max used row then append cannot be used here. Of if you need to change the formatting on each cell then looping each cell to insert the value and change the formatting probably makes more sense.


    After, Input values:- 12 97 25
    Example Sheet after adding row
    The List input_data is added by whatever means and the Formula from cell E2 is copied to E7 and translated for row 7. In this example it just sums the columns B - D

    Update
    Formatting
    Regarding the formatting of a cell. The continuation of cell formatting when adding a row to an Excel Table like the formula extension is a function of the Excel Application. Openpyxl is an XML editor so it just updates the underlying XML files with the value/style/number format that you specifically set.
    Therefore, if you add data to the table by whatever means you will need to also include code to format the data as you require if the default is not what you want. Being what it is this also means that the cell formatting needs to be applied cell by cell.
    As mentioned, while using Openpyxl append is a useful quick way to write a Python List to an Excel Sheet it may not be the best option if you then need to modify the cell formatting. Given that doing this may still require looping the cells to set the formatting it may be more useful use the one loop to insert the value and apply the formatting at the same time as shown in the 2nd option to write the input_data List.
    The additional cell formatting can be added at this time for all Cell Format options including; fonts, borders and fill
    Example setting cell formats
    Note the formatting here is for the entered values only, the cells containing the formulas are written in a different section and need to be updated there.
    See updated example code for required imports and cellFill definiton.

    for input_value, cell in zip(input_data, ws.iter_cols(min_row=current_row, max_row=current_row, min_col=2)):
        cell[0].value = int(input_value)  # Insert value
        cell[0].font = Font(size=12, name='arial', italic=False, bold=True, color='FF0000')
        cell[0].fill = cellFill
        
    

    Formulas
    For your formulas, looks like you have array formulas and the translator may not help in this case.
    Generally formulas are no different to values in a cell in so far as you can write a value or a formula to a cell the same way. If the formula needs to be updated due to the row, column, Sheet etc that it exists this can be done before writing to the cell.
    An exception is array formulas that need to handled differently which appears to be what you have in this case.
    Openpyxl support these which you can read about in the [Openpyxl documentation[(https://openpyxl.readthedocs.io/en/stable/simple_formulae.html#id1). You can also check this [Question/Answer](https://stackoverflow.com/a/79217431/13664137) which covers a similar scenario.