Tags: python, regex, list, split, python-re

Converting formula variables to variable names with regex operations


I am trying to convert the strings in the variable Formula_bit into variable-like names: lowercase, with words separated by _. My process is as follows: split the right-hand side by the operators (+, -, *, /) or x (multiplication), convert the resulting items to lowercase, replace spaces with underscores, and remove opening and closing parentheses. Finally, remove any leading and trailing underscores. However, my output does not match the expected output. What could I do to fix this?

import re 
Formula_bit = ['Σ (Dividends)', 'Dividend Payout Ratio * eps']

# Process the right-hand side of each formula to extract parameters
params = [
    re.split(r'\s*[+\-*/]\s*| x ', re.sub(r'[+\-*/]', ',', item))[0]  # Split the right-hand side by operators (+, -, *, /) or 'x' (multiplication)
        .lower()  # Convert to lowercase
        .replace(" ", "_")  # Replace spaces with underscores
        .replace("(", "")  # Remove opening parentheses
        .replace(")", "")  # Remove closing parentheses 
    for item in Formula_bit
]

# Remove leading and trailing underscores from each item and strip whitespace
params = [item.lstrip('_').rstrip('_').strip() for item in params]

Output:

['σ_dividends', 'dividend_payout_ratio_,_eps']

Expected output:

['σ_dividends', 'dividend_payout_ratio', 'eps']

Solution
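A minimal sketch of a fix that keeps the question's per-item approach. The original code replaces every operator with a comma before calling re.split, so the split pattern has nothing left to match, and the trailing [0] keeps the single unsplit piece. Splitting on the operators first and normalizing each piece afterwards gives the expected list. This assumes the only operators are +, -, *, / or a standalone lowercase x; the helper name to_param is hypothetical.

```python
import re

Formula_bit = ['Σ (Dividends)', 'Dividend Payout Ratio * eps']

# Assumed operators: +, -, *, / or a standalone lowercase "x" (multiplication)
OPERATORS = r"\s*[+\-*/]\s*|\s+x\s+"

def to_param(term):
    """Hypothetical helper: lowercase, drop parentheses, join words with '_'."""
    term = re.sub(r"[()]", "", term.lower())
    return re.sub(r"\s+", "_", term.strip()).strip("_")

params = [
    to_param(term)
    for item in Formula_bit
    for term in re.split(OPERATORS, item)  # split on the operators first, no re.sub beforehand
    if term.strip()                        # skip empty pieces, e.g. from a leading operator
]
print(params)
# ['σ_dividends', 'dividend_payout_ratio', 'eps']
```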

  • Example that converts the formulas to variable names:

    import re
    import string
    
    Formula_bit = ['Σ (Dividends)', 'Dividend Payout Ratio * eps']  # Input formulas
    
    splitter = "_"  # Character that replaces whitespace
    formula = ",".join(Formula_bit)  # Join the formulas into a single comma-separated string
    formula = re.sub(r"[()]", "", formula.lower())  # Lowercase and remove parentheses
    formula = re.sub(r"\s", splitter, formula)  # Replace whitespace characters with the splitter
    punctuation = string.punctuation.replace(splitter, "")  # Punctuation excluding the splitter
    formula = re.sub(f"[{re.escape(punctuation)}]", ",", formula)  # Turn punctuation (operators) into commas; re.escape keeps \, ] and ^ literal
    params = [s.strip(splitter) for s in formula.split(",")]  # Split on commas and strip leading/trailing splitter characters
    print(params)
    # ['σ_dividends', 'dividend_payout_ratio', 'eps']
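The question also lists a standalone x as a multiplication operator, which the example above does not split on: 'Price x Quantity' comes out as price_x_quantity. The sketch below wraps the same steps in a function and additionally treats a whitespace-delimited x as an operator, assuming x never occurs as a one-letter word inside a real parameter name. The function name formula_to_params is hypothetical; empty pieces (for example from two adjacent operators) are dropped.

```python
import re
import string

def formula_to_params(formulas, splitter="_"):
    """Hypothetical helper: same steps as the answer, plus a standalone 'x' operator."""
    formula = ",".join(formulas).lower()                 # lowercasing also turns 'X' into 'x'
    formula = re.sub(r"[()]", "", formula)               # drop parentheses
    formula = re.sub(r"\s+x\s+", ",", formula)           # standalone 'x' means multiplication
    formula = re.sub(r"\s+", splitter, formula)          # whitespace runs -> splitter
    punctuation = string.punctuation.replace(splitter, "")
    formula = re.sub(f"[{re.escape(punctuation)}]", ",", formula)  # operators -> commas
    parts = (s.strip(splitter) for s in formula.split(","))
    return [p for p in parts if p]                       # drop empty pieces

print(formula_to_params(['Σ (Dividends)', 'Dividend Payout Ratio * eps', 'Price x Quantity']))
# ['σ_dividends', 'dividend_payout_ratio', 'eps', 'price', 'quantity']
```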
    

    One check is still missing: to be a valid variable name, a parameter must not start with a digit (its first character should be a letter or an underscore).
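A sketch of that missing check, assuming that prefixing an underscore is an acceptable repair for names that start with a digit (the answer does not say how such names should be fixed). It also appends an underscore to Python keywords, which are not valid variable names either. str.isidentifier() and keyword.iskeyword() are standard-library calls; the function make_identifier and the sample parameter 12m_return are hypothetical.

```python
import keyword

def make_identifier(name):
    """Hypothetical helper: adjust a parameter name so it can be a Python variable name."""
    if name and name[0].isdigit():
        name = "_" + name   # assumption: prefix names that start with a digit
    if keyword.iskeyword(name):
        name += "_"         # assumption: avoid reserved words such as 'yield'
    return name

params = ['σ_dividends', 'dividend_payout_ratio', 'eps', '12m_return']
names = [make_identifier(p) for p in params]
print(names)
# ['σ_dividends', 'dividend_payout_ratio', 'eps', '_12m_return']
print(all(n.isidentifier() for n in names))
# True
```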