Search code examples
python-3.xdataframelambda

Formatting dataframe column values


I have a dataframe:

data = {
  "txt": ['12 MG/0.6 ML  8 MG/0.4 ML  12.5 MG  25 MG  39.5%  20%  550 MG  75 MG  800 MG  9 MG  75 MG/0.5 ML  0.03%  0.10%  1%  2% 0.50%', '12 MG/0.6 ML ]
}

I want to do a couple things here:

  1. Add space between the decimal number and % sign. for example 39.5% should become 39.5 %. This is easy with a simple replace.
  2. If the decimal number is something like 0.10% then I need it to become 0.1 % i.e. the second decimal place gone if it is a zero in that place and then add the space ( the space part is easy replace after 2 is done). 3.I need to apply this update in the dataframe column df['txt'] itself, meaning the expected output is:
data = {
  "txt": ['12 MG/0.6 ML  8 MG/0.4 ML  12.5 MG  25 MG  39.5 %  20 %  550 MG  75 MG  800 MG  9 MG  75 MG/0.5 ML  0.03 %  0.1 %  1 %  2 % 0.5 %', '12 MG/0.6 ML ]
}

Solution

  • You can do this in two steps:

    • first find the numbers with percentages and convert it to float
    • use these floats and do a string formatting where you remove trailing zeros
    import re
    
    s = "12 MG/0.6 ML  8 MG/0.4 ML  12.5 MG  25 MG  39.5%  20%  550 MG  75 MG  800 MG  9 MG  75 MG/0.5 ML  0.03%  0.1%  1%  2%"
    
    out = list(map(float, re.findall(r"(\d*[,.]?\d*)%", s)))
    print(out)
    

    Prints:

    [39.5, 20.0, 0.03, 0.1, 1.0, 2.0]
    

    Then remove trailing zeros:

    out = list(map("{:g} %".format, out))
    print(out)
    

    Prints:

    ['39.5 %', '20 %', '0.03 %', '0.1 %', '1 %', '2 %']
    

    EDIT: With updated input:

    import re
    
    data = {
        "txt": [
            "12 MG/0.6 ML  8 MG/0.4 ML  12.5 MG  25 MG  39.5%  20%  550 MG  75 MG  800 MG  9 MG  75 MG/0.5 ML  0.03%  0.10%  1%  2% 0.50%",
            "12 MG/0.6 ML",
        ]
    }
    
    out = []
    for line in data["txt"]:
        out.append(re.sub(r"(\d*[,.]?\d*)%", lambda g: f"{float(g.group(1)):g} %", line))
    
    data["txt"] = out
    print(data)
    

    Prints:

    {
        "txt": [
            "12 MG/0.6 ML  8 MG/0.4 ML  12.5 MG  25 MG  39.5 %  20 %  550 MG  75 MG  800 MG  9 MG  75 MG/0.5 ML  0.03 %  0.1 %  1 %  2 % 0.5 %",
            "12 MG/0.6 ML",
        ]
    }