I have a list of values in a string, as "index:count" pairs, and I want to extract the index and count from each pair, as in the code below:
import re
string="358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"
values=[v for v in re.findall('.+?:.+?.', string)]
for g in values:
    index=g[:g.index(":")]
    count=g[g.index(":")+1:]
    print(str(int(index))+" "+str(count))
But I got this error message:
ValueError: invalid literal for int() with base 10: '2 1550'
It seems I wrote the regular expression wrongly. Any idea how to fix this?
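I think you won't need the ? lazy modifier at the end of the regex pattern. The ? lazy modifier you put there can actually produce more noise than the right data: with a lazy .+? on both sides of the colon and a trailing . that consumes exactly one more arbitrary character, a match can stop in the middle of a number or run across a space, which is how a piece like '2 1550' ends up being passed to int().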
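To see where the original pattern goes wrong, this small sketch just prints what re.findall returns for the question's pattern on the question's string. The matches stay aligned for the first three entries, then '1548:29' cuts a number short, and the leftover '2' gets glued onto the next entry as '2 1550:48', which is exactly the value int() chokes on.

```python
import re

string = "358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"

# The pattern from the question: lazy ".+?" before and after ":", plus one
# extra "." that swallows a single arbitrary character (a space or a digit).
matches = re.findall('.+?:.+?.', string)
print(matches)
# ['358:6 ', '1260:2 ', '1533:7 ', '1548:29', '2 1550:48', ' 1561:3 ', '1564:18']
```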
EDIT NOTE: the .+:.+ pattern I introduced in previous edits was a poor choice for capturing the desired pattern. Please use the \d+:\d+ pattern instead. I have left the .+:.+ discussion in because it can still solve the OP's problem with a workaround.
As long as your data is not malformed, contains no noise, and is neatly separated by whitespace, I think '.+:.+' is sufficient to get at your index:count format (with the workaround described further down). Probably the best way, though, is to use '\d+:\d+', since you know each entry is at least one digit, followed by a :, followed by at least one more digit.
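If you want the index and count as separate integers straight away (which is what the int() call in the question was after), one option not shown in the original answer is to put each \d+ in a capture group. With two groups, re.findall returns a list of (index, count) string tuples that you can convert with int(). The name pairs is just for illustration.

```python
import re

string = "358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"

# Two capture groups: re.findall returns one (index, count) tuple per match,
# both as strings, so convert each to int.
pairs = [(int(i), int(c)) for i, c in re.findall(r'(\d+):(\d+)', string)]

for index, count in pairs:
    print(index, count)
```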
regexr and regex101 are good sites for designing and visualizing your regex pattern.
If you use the .+:.+ pattern, it returns the string as a whole: the greedy .+ also matches spaces, so the entire string is one single match. re.findall still returns a list, but in this example it contains only one element, so you need to post-process the result.
In [ ]: string="358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"
...: values=[v for v in re.findall('.+:.+', string)]
...: print(values)
['358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186']
Since the list has only one element, you can use pop() to take that str element out and split it on whitespace with the str method split().
In [ ]: print(values.pop().split())
['358:6', '1260:2', '1533:7', '1548:292', '1550:48', '1561:3', '1564:186']
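Note that values.pop().split() is doing all the real work here. For whitespace-separated data like this, an alternative (not part of the regex approach) is to skip the regex entirely and split the original string directly. This sketch assumes every token is a well-formed index:count pair.

```python
string = "358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"

# str.split() with no argument splits on runs of whitespace.
tokens = string.split()

# Each token is assumed to be exactly "index:count"; split on the colon
# and convert both parts to int.
pairs = [tuple(int(part) for part in token.split(":")) for token in tokens]
print(tokens)
print(pairs)
```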
If you use the \d+:\d+ pattern instead, re.findall directly returns a neatly separated list, since each match is exactly one index:count entry. Therefore, you can print it as is.
In [ ]: string="358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"
...: values=[v for v in re.findall(r'\d+:\d+', string)]
...: print(values)
['358:6', '1260:2', '1533:7', '1548:292', '1550:48', '1561:3', '1564:186']
Finally, you can print the result nicely with Python's built-in string formatting (str.format).
In [ ]: for s in values:
...:     index, count = s.split(":")
...:     print("Index: {:>8} Count: {:>8}".format(index, count))
...:
Index:      358 Count:        6
Index:     1260 Count:        2
Index:     1533 Count:        7
Index:     1548 Count:      292
Index:     1550 Count:       48
Index:     1561 Count:        3
Index:     1564 Count:      186
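On Python 3.6+, an f-string gives the same alignment as the .format() call above. This sketch also converts both fields to int before printing, so it would raise a ValueError if a token were not numeric. The lines list is only there to collect the output.

```python
import re

string = "358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"

lines = []
for s in re.findall(r'\d+:\d+', string):
    index, count = map(int, s.split(":"))
    # ">8" right-aligns each number in a field 8 characters wide,
    # the same as "{:>8}" in str.format().
    line = f"Index: {index:>8} Count: {count:>8}"
    lines.append(line)
    print(line)
```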