
How to resolve Python KeyError elegantly (Python csv library)


I have written a basic web scraper in Python using the lxml and JSON libraries. The code snippet below shows how I currently write to CSV:

import csv

with open(filepath, "ab") as f:
    write = csv.writer(f)
    try:
        write.writerow(["allhomes",
                        statenum,
                        statesubnum,
                        suburbnum,
                        listingnum,
                        listingsurlstr,
                        '',  # fill this in! should be 'description'
                        node["state"],
                        node["suburb"],
                        node["postcode"],
                        node["propertyType"],
                        node["bathrooms"],
                        node["bedrooms"],
                        node["parking"],
                        pricenode,
                        node["photoCount"],
                        node2["pricemin"],
                        node2["pricemax"],
                        node2["pricerange"]])
    except KeyError:
        # retry with a blank in place of the bathrooms value
        try:
            write.writerow(["allhomes",
                            statenum,
                            statesubnum,
                            suburbnum,
                            listingnum,
                            listingsurlstr,
                            '',  # fill this in! should be 'description'
                            node["state"],
                            node["suburb"],
                            node["postcode"],
                            node["propertyType"],
                            '',  # bathrooms missing
                            node["bedrooms"],
                            node["parking"],
                            pricenode,
                            node["photoCount"],
                            node2["pricemin"],
                            node2["pricemax"],
                            node2["pricerange"]])
        except KeyError as e:
            # some other key is missing too; log an error row instead
            errorcount += 1
            write.writerow(["Error: invalid dictionary field key: %s" % e.args,
                            statenum,
                            statesubnum,
                            suburbnum,
                            listingnum,
                            listingsurlstr])

The problem is that if a certain node does not exist (most commonly the bathrooms node) I have to try again, substituting a blank value for it, or else give up the entire row of data. My current approach is to retry the write with the bathrooms value blanked out, but this is messy (and does not handle KeyErrors on other nodes).

How can I skip a single node in this situation, when it does not exist or contains no data, without sacrificing the whole entry?

Many thanks.


Solution

  • If you have to use keys like this, one approach I've used in the past for web scraping is to create a wrapper that handles the error and returns the value.

    def get_node(name, node):
        try:
            val = node[name]
        except KeyError:
            val = 'na'
        return val
    
    write.writerow(['allhomes',
                    get_node('bathrooms', node),
                    ...
                   ])
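As a side note, Python's built-in `dict.get` already does what the wrapper does, with the default supplied inline. A minimal sketch of the same idea (the `node` data here is made up for illustration):

```python
# A hypothetical scraped record with the "bathrooms" key absent
node = {"state": "ACT", "suburb": "Dickson"}

# dict.get returns the default instead of raising KeyError,
# so the whole row survives a missing field
row = ["allhomes",
       node.get("state", "na"),
       node.get("suburb", "na"),
       node.get("bathrooms", "na")]

print(row)  # ['allhomes', 'ACT', 'Dickson', 'na']
```

Either way, every field lookup becomes non-raising, so a single missing node no longer discards the row.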