
How to extract numbers attached to a set of characters in Python


Suppose that you have a string with a lot of numbers that are attached or very close to some characters, like this:

string = "I have a cellphone with 4GB of ram and 64 GB of rom, My last computer had 4GB of ram and NASA only had 4KB when ... that's incredible"

and I want it to return:

['4GB', '64GB', '4GB', '4KB']

I tried:

import re

def extract_gb(string):
    # Matches runs of digits only, so the attached letters are never captured
    gb = re.findall(r'[0-9]+', string)
    return gb

print(extract_gb(string))

Output: ['4', '64', '4', '4']

This gives just the numbers as output, but I would like to get each number together with the letters attached to it or close to it. I expect the output ['4GB', '64GB', '4GB', '4KB'].

I appreciate any kind of help.


Solution

  • With a small change to the regular expression proposed by @9769953, followed by removing the unwanted whitespace, we can get exactly the required output:

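    import re
    from functools import partial
    
    string = "I have a cellphone with 4GB of ram and 64  GB of rom, My last computer had 4GB of ram and NASA only had 4KB when ... that's incredible"
    
    # Digits, then zero or more whitespace characters, then letters, all on word boundaries
    p = re.compile(r'\b[0-9]+\s*[A-Za-z]+\b')
    
    # pf(s) is re.sub(r'\s', '', s): it deletes every whitespace character in s
    pf = partial(re.sub, r'\s', '')
    
    print(list(map(pf, p.findall(string))))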
    

    Output:

    ['4GB', '64GB', '4GB', '4KB']
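    A possible variant of the same pattern (a sketch, not part of the original answer) skips the second re.sub pass: put the digits and the letters in two capturing groups and leave the whitespace outside both. When a pattern has more than one group, re.findall returns a list of tuples, so the pieces only need to be joined.

```python
import re

string = "I have a cellphone with 4GB of ram and 64  GB of rom, My last computer had 4GB of ram and NASA only had 4KB when ... that's incredible"

# Group 1 is the digits, group 2 is the letters; any whitespace between them is matched but not captured
pattern = re.compile(r'\b([0-9]+)\s*([A-Za-z]+)\b')

# With two groups, findall yields tuples such as ('64', 'GB'), which are joined back together
sizes = [number + unit for number, unit in pattern.findall(string)]
print(sizes)  # ['4GB', '64GB', '4GB', '4KB']
```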
    

    Note:

    The subtle change to the regular expression allows zero or more whitespace characters between a sequence of digits and the following sequence of letters.
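    To see what the \s* part does, the sketch below runs the pattern with and without it on the same string. It also shows a side effect worth knowing: \s* lets any word after a number match (for example "2 apples"). Restricting the letters to a fixed set of size units is one way around that; the unit list KB, MB, GB, TB is a hypothetical choice for illustration, not something the question specifies.

```python
import re

string = "I have a cellphone with 4GB of ram and 64  GB of rom, My last computer had 4GB of ram and NASA only had 4KB when ... that's incredible"

# Without \s*, the letters must follow the digits immediately, so "64  GB" is skipped
no_space = re.findall(r'\b[0-9]+[A-Za-z]+\b', string)
print(no_space)  # ['4GB', '4GB', '4KB']

# With \s*, whitespace between the digits and the letters is allowed (and kept in the match)
with_space = re.findall(r'\b[0-9]+\s*[A-Za-z]+\b', string)
print(with_space)  # ['4GB', '64  GB', '4GB', '4KB']

# Side effect: any word after a number is accepted, not only units
sample = "I bought 2 apples and 8 GB of ram"
loose = re.findall(r'\b[0-9]+\s*[A-Za-z]+\b', sample)
print(loose)  # ['2 apples', '8 GB']

# Hypothetical restriction to a fixed set of units (KB, MB, GB, TB)
units = re.compile(r'\b([0-9]+)\s*([KMGT]B)\b')
strict = [number + unit for number, unit in units.findall(sample)]
print(strict)  # ['8GB']
```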