python regex html text

Couldn't find the right Regex code to extract the exact numbers


Using web scraping, I extracted a string containing 64-bit Steam IDs and a friends list. I want to get the unique steamids so that I can store them in a separate file. I used a regex, but I think I made a mistake in the pattern.

This is the string.

{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}

I used this regex:

import re
re.findall("[^:[0-9]+[0-9]+", soup.text)

However, I got this result:

['"7656xxxxxxx80x76',
'"76561xxxxxxx4xx89',
'"765xxxxxxxxxxx3194']

How can I get rid of the quotation marks (") at the beginning of the numbers?


Solution

  • You have a JSON string, so parse it with the json module instead of a regex:

    import json
    
    text = '{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}'
    
    data = json.loads(text)
    
    for friend in data["friendslist"]['friends']:
        print(friend['steamid'])
    

    Result:

    7656xxxxxxx80x76
    76561xxxxxxx4xx89
    765xxxxxxxxxxx3194
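
As for why the original pattern kept the quotation mark: `[^:[0-9]` is a single negated character class that excludes only `:`, `[` and digits, so the `"` in front of each ID matches it and becomes part of the result. The sketch below is one possible way to finish the task from the question (unique IDs, saved to a file). It is not part of the original answer. The sample IDs are made-up placeholders, and the helper names `unique_steamids`, `steamids_by_regex` and `save_ids` are hypothetical.

```python
import json
import re

# Made-up sample in the same shape as the string from the question.
# The IDs are placeholders; one is repeated to show de-duplication.
text = (
    '{"friendslist":{"friends":['
    '{"steamid":"76561190000000001","relationship":"friend","friend_since":1552765824},'
    '{"steamid":"76561190000000002","relationship":"friend","friend_since":1508594830},'
    '{"steamid":"76561190000000001","relationship":"friend","friend_since":1543773569}'
    ']}}'
)


def unique_steamids(raw):
    """Parse the JSON and return the steamids without repeats, in first-seen order."""
    data = json.loads(raw)
    ids = [friend["steamid"] for friend in data["friendslist"]["friends"]]
    # dict keys preserve insertion order and silently drop duplicates
    return list(dict.fromkeys(ids))


def steamids_by_regex(raw):
    """Regex alternative: only the capture group (the digits) is returned,
    so the surrounding quotation marks are left out."""
    return re.findall(r'"steamid"\s*:\s*"(\d+)"', raw)


def save_ids(ids, path):
    """Write one ID per line to the given file path."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(ids) + "\n")


print(unique_steamids(text))
print(steamids_by_regex(text))
```

With the scraped page, something like `save_ids(unique_steamids(soup.text), "steamids.txt")` would then store the IDs, where `steamids.txt` is only an example file name. The JSON route is probably the safer of the two, since it does not depend on how the response happens to be spaced or ordered.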