
Convert text separated by HTML line breaks into a Python dictionary


I am scraping incident data from the RFS (Rural Fire Service) website to do some data analysis. However, part of the data I get back needs to be broken up to be more granular.

I have collected the following straight from the JSON and would appreciate some suggestions on how I could turn this: 'description': 'ALERT LEVEL: Not Applicable <br />LOCATION: Turnbulls Lane, Moruya, NSW 2537 <br />COUNCIL AREA: Eurobodalla <br />STATUS: Out of control <br />TYPE: Structure Fire <br />FIRE: Yes <br />SIZE: 0 ha <br />RESPONSIBLE AGENCY: Fire and Rescue NSW <br />UPDATED: 3 Apr 2020 21:14'

into this

{'ALERT LEVEL': 'Not Applicable', 'LOCATION': 'Turnbulls Lane, Moruya, NSW 2537', 'COUNCIL AREA': 'Eurobodalla', 'STATUS': 'Out of control', 'TYPE': 'Structure Fire', 'FIRE': 'Yes', 'SIZE': '0 ha', 'RESPONSIBLE AGENCY': 'Fire and Rescue NSW', 'UPDATED': '3 Apr 2020 21:14'}

Thanks


Solution

  • If none of the keys contain a :, the following code should work:

    # The value of the 'description' field (without the 'description': prefix,
    # which would otherwise be picked up as the first key)
    text = "ALERT LEVEL: Not Applicable <br />LOCATION: Turnbulls Lane, Moruya, NSW 2537 <br />COUNCIL AREA: Eurobodalla <br />STATUS: Out of control <br />TYPE: Structure Fire <br />FIRE: Yes <br />SIZE: 0 ha <br />RESPONSIBLE AGENCY: Fire and Rescue NSW <br />UPDATED: 3 Apr 2020 21:14"
    lines = text.split("<br />")
    data_dump = {}
    for line in lines:
        # Split on the first ":" only, so values such as "21:14" stay intact
        key, value = line.split(":", maxsplit=1)
        # Strip the whitespace left around each key and value
        data_dump[key.strip()] = value.strip()
    print(data_dump)
    

    Note that this approach breaks if a key itself contains a :, because each line is split at its first colon.
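If the feed is not perfectly regular, a slightly more defensive variant may help. The sketch below is only one possible approach: the function name parse_description is made up for illustration, and it assumes the separators may appear as <br>, <br/> or <br /> and that any fragment without a colon can be ignored. It still assumes that keys themselves contain no colon.

```python
import re


def parse_description(description):
    """Turn 'KEY: value <br />KEY: value' text into a dict."""
    fields = {}
    # Split on <br>, <br/> or <br /> in any letter case
    for part in re.split(r"<br\s*/?>", description, flags=re.IGNORECASE):
        # partition() never raises; sep is "" when the part has no colon
        key, sep, value = part.partition(":")
        if not sep:
            continue  # skip fragments that are not "KEY: value" pairs
        fields[key.strip()] = value.strip()
    return fields


description = (
    "ALERT LEVEL: Not Applicable <br />LOCATION: Turnbulls Lane, Moruya, NSW 2537 "
    "<br />STATUS: Out of control <br />UPDATED: 3 Apr 2020 21:14"
)
print(parse_description(description))
```

Using partition instead of split avoids the unpacking error that split raises on a fragment with no colon, such as an empty string left by a trailing <br />.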