python pandas string dataframe substring

How to extract text from every row in a dictionary-like column in a DataFrame?


I have been trying this for way too long and can't figure out a concise way to extract the browser name from the string. It is a column in a DataFrame, so the extraction needs to be applied to every row.

The column looks like this:

0        [{'name': 'Chrome', 'version': '36.0.1985.143'}]
1        [{'name': 'Chrome', 'version': '34.0.1847.137'}]
2         [{'name': 'Chrome', 'version': '29.0.1547.76'}]
3        [{'name': 'Chrome', 'version': '33.0.1750.154'}]
4        [{'name': 'Chrome', 'version': '36.0.1985.143'}]

The column is called browser.

I have tried the following.

df_agent_info['browser'].str.split("\[\{\'[a\-z]\'")

and other, worse attempts. I appreciate the help.


Solution

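    import re

    # Match the run of word characters/spaces that directly follows 'name': '
    pattern = re.compile(r"(?<='name': ')[\w ]+")

    def match(x):
        # Return the first browser name found, or None when nothing matches
        m = pattern.search(x)
        return m.group(0) if m else None

    df_agent_info['browser'].apply(match)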

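(?<='name': ') is a positive lookbehind: the pattern only matches text that is immediately preceded by 'name': ', and the lookbehind text itself is not included in the match. The [\w ]+ part then matches the run of word characters and spaces that follows, which is the browser name.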
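As a rough sketch of two alternatives to the lookbehind approach: if the column really holds strings, pandas' `Series.str.extract` can apply a capture-group regex to every row without a helper function, and `ast.literal_eval` can turn each string into a real list of dicts. The sample data below is made up to mirror the question, and `first_name` is a hypothetical helper name; whether the column holds strings or actual lists is an assumption, so the helper accepts both.

```python
import ast
import pandas as pd

# Hypothetical sample data mirroring the question's column (stored as strings)
df_agent_info = pd.DataFrame({
    "browser": [
        "[{'name': 'Chrome', 'version': '36.0.1985.143'}]",
        "[{'name': 'Chrome', 'version': '34.0.1847.137'}]",
    ]
})

# Option 1: vectorized regex with one capture group.
# expand=False returns a Series instead of a one-column DataFrame.
names = df_agent_info["browser"].str.extract(r"'name': '([\w ]+)'", expand=False)

# Option 2: parse each cell into a list of dicts, then index into it.
def first_name(cell):
    # Strings are parsed; cells that are already lists are used as-is.
    parsed = ast.literal_eval(cell) if isinstance(cell, str) else cell
    # An empty list yields None rather than raising IndexError.
    return parsed[0]["name"] if parsed else None

names_parsed = df_agent_info["browser"].apply(first_name)
```

Option 1 stays close to the regex answer above, while Option 2 avoids regular expressions and makes it easy to pull out 'version' as well by changing the key.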