Suppose that you have a string with a lot of numbers that are attached or very close to some characters, like this:
string = "I have a cellphone with 4GB of ram and 64 GB of rom, My last computer had 4GB of ram and NASA only had 4KB when ... that's incredible"
and I want it to return:
[4GB, 64GB, 4GB, 4KB]
I'm trying:
import re
def extract_gb(string):
    gb = re.findall('[0-9]+', string)
    return gb
extract_gb(string)
Output: ['4', '64', '4', '4']
This gives just the numbers, but I would like to get each number together with the letters attached to or close to it. I expect the output [4GB, 64GB, 4GB, 4KB]
I appreciate any kind of help.
With a small change to the regular expression proposed by @9769953 and a subsequent substitution of unwanted whitespace we can get the exact output required as follows:
import re
from functools import partial
string = "I have a cellphone with 4GB of ram and 64 GB of rom, My last computer had 4GB of ram and NASA only had 4KB when ... that's incredible"
p = re.compile(r'\b[0-9]+\s*[A-Za-z]+\b')
pf = partial(re.sub, r'\s', '')
print(list(map(pf, p.findall(string))))
Output:
['4GB', '64GB', '4GB', '4KB']
Note:
The subtle change to the regular expression allows for any amount of whitespace (including none) between a sequence of digits and the following sequence of letters.
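As a possible alternative, the second substitution pass can be avoided by capturing the digits and the letters in separate groups and joining them. This is only a sketch: the function name extract_sizes and the second test string are made up for illustration and are not part of the answer above.

```python
import re

# Hypothetical helper name; not from the original answer.
def extract_sizes(text):
    # Group 1: the digits. Group 2: the letters that follow,
    # after any amount of whitespace (including none).
    pattern = re.compile(r'\b([0-9]+)\s*([A-Za-z]+)\b')
    # With two groups, findall returns (digits, letters) tuples,
    # so the whitespace between them is dropped when joining.
    return [number + unit for number, unit in pattern.findall(text)]

text = "I have a cellphone with 4GB of ram and 64 GB of rom, My last computer had 4GB of ram and NASA only had 4KB when ... that's incredible"
print(extract_sizes(text))                # ['4GB', '64GB', '4GB', '4KB']
print(extract_sizes("8  TB and 16\tMB"))  # ['8TB', '16MB']
```

Be aware that both this version and the one above match any number followed by a word, so "3 apples" would come back as '3apples'. If that matters for your data, the letter class could be replaced with an explicit list of units such as (?:KB|MB|GB|TB).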