I have a dataframe:
data = {
"txt": ['12 MG/0.6 ML 8 MG/0.4 ML 12.5 MG 25 MG 39.5% 20% 550 MG 75 MG 800 MG 9 MG 75 MG/0.5 ML 0.03% 0.10% 1% 2% 0.50%', '12 MG/0.6 ML ]
}
I want to do a couple things here:
data = {
"txt": ['12 MG/0.6 ML 8 MG/0.4 ML 12.5 MG 25 MG 39.5 % 20 % 550 MG 75 MG 800 MG 9 MG 75 MG/0.5 ML 0.03 % 0.1 % 1 % 2 % 0.5 %', '12 MG/0.6 ML ]
}
You can do this in two steps:
import re
s = "12 MG/0.6 ML 8 MG/0.4 ML 12.5 MG 25 MG 39.5% 20% 550 MG 75 MG 800 MG 9 MG 75 MG/0.5 ML 0.03% 0.1% 1% 2%"
out = list(map(float, re.findall(r"(\d*[,.]?\d*)%", s)))
print(out)
Prints:
[39.5, 20.0, 0.03, 0.1, 1.0, 2.0]
Then remove trailing zeros:
out = list(map("{:g} %".format, out))
print(out)
Prints:
['39.5 %', '20 %', '0.03 %', '0.1 %', '1 %', '2 %']
EDIT: With updated input:
import re
data = {
"txt": [
"12 MG/0.6 ML 8 MG/0.4 ML 12.5 MG 25 MG 39.5% 20% 550 MG 75 MG 800 MG 9 MG 75 MG/0.5 ML 0.03% 0.10% 1% 2% 0.50%",
"12 MG/0.6 ML",
]
}
out = []
for line in data["txt"]:
out.append(re.sub(r"(\d*[,.]?\d*)%", lambda g: f"{float(g.group(1)):g} %", line))
data["txt"] = out
print(data)
Prints:
{
"txt": [
"12 MG/0.6 ML 8 MG/0.4 ML 12.5 MG 25 MG 39.5 % 20 % 550 MG 75 MG 800 MG 9 MG 75 MG/0.5 ML 0.03 % 0.1 % 1 % 2 % 0.5 %",
"12 MG/0.6 ML",
]
}