python, pandas, string, colors, python-re

How to extract specific text and some extra characters from a string in Python?


Let's say I have a text:

"background-color:  #ffffe5;\n            color:  #000000;\n        }{\n            background-color:  #ed7215;\n            color: 
#000000; background-color:  #662506;\n            color:  #f1f1f1;"

I would like to extract some dictionaries that contain:

string1 = {background-color:  #ffffe5}
string2 = {background-color:  #ed7215}
string3 = {background-color:  #662506}

The real text is usually longer and contains several more background colors. Is there a way to extract all of them? I know this should be done with re, but I am not sure how. I know I could use " ".join(string.split()) to remove the unnecessary whitespace, which would simplify the problem, but I still don't have any ideas. Any help would be great. A list of the colors in order would also be fine, as long as it contains only the background colors.


Solution

  • Use re.findall:

    import re
    
    text = '''"background-color:  #ffffe5;\n            color:  #000000;\n        }{\n            background-color:  #ed7215;\n            color: #000000; background-color:  #662506;\n            color:  #f1f1f1;"'''
    
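    # Raw strings (r'...') keep \s from being treated as a string escape
    out1 = re.findall(r'background-color:\s*[^;]+', text)
    
    # OR, with a capture group to return only the color value
    
    out2 = re.findall(r'background-color:\s*([^;]+)', text)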
    

    Output:

    >>> out1
    ['background-color:  #ffffe5',
     'background-color:  #ed7215',
     'background-color:  #662506']
    
    >>> out2
    ['#ffffe5', '#ed7215', '#662506']