Tags: python, json, allennlp

Read JSON file correctly


I am trying to read a JSON file (BioRelEx dataset: https://github.com/YerevaNN/BioRelEx/releases/tag/1.0alpha7) in Python. The JSON file is a list of objects, one per sentence. This is how I am trying to do it:

    import json

    from allennlp.common.file_utils import cached_path

    def _read(self, file_path):
        with open(cached_path(file_path), "r") as data_file:
            for line in data_file.readlines():
                if not line:
                    continue
                items = json.loads(line)
                text = items["text"]
                label = items.get("label")

My code fails on items = json.loads(line). It seems the data is not formatted the way my code expects, so how should I change the code to read it?
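The failure can be reproduced without the dataset. The sketch below uses a small, made-up document (the "id" and "text" values are hypothetical, not taken from BioRelEx) laid out the way a pretty-printed JSON array usually is, spread over several lines. The helper first_bad_line is also hypothetical; it just shows that a single line such as the opening "[" is not a complete JSON value, so json.loads raises json.JSONDecodeError on it, while the same text parses fine as a whole.

```python
import io
import json

# Hypothetical sample: a JSON array of sentence objects, pretty-printed
# over several lines the way a .json file typically is.
SAMPLE = """[
  {
    "id": "s1",
    "text": "First sentence."
  }
]
"""


def first_bad_line(fp):
    """Return the first non-blank line that json.loads cannot parse, or None."""
    for line in fp:
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError:
            return line
    return None


bad = first_bad_line(io.StringIO(SAMPLE))
print(repr(bad))  # the opening "[" line fails on its own

# Parsed as one document, the same text is a list of sentence objects.
print(json.loads(SAMPLE)[0]["text"])
```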

Thanks in advance for your time!

Best,

Julia


Solution

  • Your code reads one line at a time and parses each line individually as JSON. That only works if the file was deliberately written with one complete JSON object per line (the JSON Lines layout), which is unlikely given its .json extension. Standard JSON does not use line breaks to mark the end of an object, so a single line of a multi-line document is usually not valid JSON by itself.

    Load the whole file content as JSON instead, then process the resulting items in the array.

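    import json

    from allennlp.common.file_utils import cached_path

    def _read(self, file_path):
        # Parse the whole file at once: it is one JSON array, not one object per line.
        with open(cached_path(file_path), "r") as data_file:
            data = json.load(data_file)
        # Each element of the array is one sentence object.
        for item in data:
            text = item["text"]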

    The label appears to be nested inside item["interaction"].
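If you are not sure which of the two layouts a file uses, a small standard-library helper can try both. This is a sketch, not part of AllenNLP: the function name load_records is hypothetical, and it assumes the file is either a single JSON document (such as one array) or one JSON object per line (JSON Lines). cached_path is left out so the example runs without AllenNLP; inside the reader you would pass an open handle on cached_path(file_path), as in the answer's code.

```python
import io
import json
from typing import IO, Any, List


def load_records(fp: IO[str]) -> List[Any]:
    """Parse fp as one JSON document, falling back to JSON Lines.

    Hypothetical helper: assumes the file uses one of those two layouts.
    """
    content = fp.read()
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        # Not a single JSON document: treat each non-blank line as one object.
        return [json.loads(line) for line in content.splitlines() if line.strip()]
    # A top-level array is already a list of records; wrap anything else.
    return data if isinstance(data, list) else [data]


# Usage with two in-memory files holding the same (made-up) records.
array_file = io.StringIO('[\n  {"text": "A."},\n  {"text": "B."}\n]\n')
lines_file = io.StringIO('{"text": "A."}\n\n{"text": "B."}\n')
print([r["text"] for r in load_records(array_file)])  # ['A.', 'B.']
print([r["text"] for r in load_records(lines_file)])  # ['A.', 'B.']
```

Reading the whole file into memory keeps the fallback simple; for very large JSON Lines files, iterating over the file line by line would avoid holding everything at once.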