Tags: javascript, python, json, jsonpath, json-path-expression

data co-occurrence in JSON dataset


I'm having trouble extracting information from JSON. My JSON file covers the 100 chapters of a novel. Each chapter name maps to a list of the characters found in that chapter, and a character can be listed more than once.

For instance:

{"ONE": ["PERSON A", "PERSON B", "PERSON C", "PERSON D", "PERSON A"],
"TWO": ["PERSON A", "PERSON D", "PERSON F", "PERSON G", "PERSON H"],
"THREE": ["PERSON F", "PERSON D", "PERSON A", "PERSON A", "PERSON A"]
... "ONE HUNDRED": ["PERSON B", "PERSON A"]
}

My goal is to design a method that counts how many times two characters co-occur in the whole book, where two characters can co-occur at most once per chapter. E.g., across the 100 chapters, I want to know how many times PERSON A and PERSON B co-occurred.

I have two methods in mind:

A. Use JSONPath to filter the dataset down to the chapters where PERSON A and PERSON B co-occur, then count those chapters. (I don't know what to query, either :P)

B. Although I'm not really good with JavaScript, my idea is to define an integer counter and then run a for loop over every chapter of the JSON file.
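Roughly what I picture for the loop-and-counter idea, as a minimal sketch only. It assumes the file has already been parsed into a plain object; `book` and `countCooccurrences` are names I made up for illustration, and the data is just the sample from above.

```javascript
// Hypothetical sample data standing in for the parsed JSON file.
const book = {
  "ONE": ["PERSON A", "PERSON B", "PERSON C", "PERSON D", "PERSON A"],
  "TWO": ["PERSON A", "PERSON D", "PERSON F", "PERSON G", "PERSON H"],
  "THREE": ["PERSON F", "PERSON D", "PERSON A", "PERSON A", "PERSON A"],
  "ONE HUNDRED": ["PERSON B", "PERSON A"]
};

// Count the chapters in which both names appear at least once.
function countCooccurrences(chaptersByName, first, second) {
  let count = 0;
  for (const chapterName of Object.keys(chaptersByName)) {
    const cast = chaptersByName[chapterName];
    if (cast.includes(first) && cast.includes(second)) {
      count += 1; // repeated names within one chapter still count once
    }
  }
  return count;
}

console.log(countCooccurrences(book, "PERSON A", "PERSON B")); // 2
```

With the four sample chapters, only ONE and ONE HUNDRED list both PERSON A and PERSON B, so the counter ends at 2.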

I wonder if you guys could share your knowledge with me on this! Thanks!


Solution

  • Here's a function that lets you choose whether to get back the count or an array of chapter names.

    Here it is broken down with comments, followed by a runnable snippet:

    const cooccur = (people, rettype) => {
      // the final result is an array of chapter names (the object's keys)
      let result = Object.keys(
         // Object.entries turns the chapters object into an array of [name, characters] pairs so it can be filtered;
         // Object.fromEntries turns the filtered pairs back into an object
         Object.fromEntries(Object.entries(chapters)
            // outer filter: runs through each chapter and keeps it based on the inner filter's result
            // inner filter: keeps each searched person who appears in the chapter (c[1] is its character list)
            // if the number of people found equals the number of people searched for, it's a match;
            // repeated names within a chapter don't matter, so a chapter counts at most once
            .filter(c => people.filter(r => c[1].indexOf(r) > -1).length === people.length)));
      return rettype === 'count' ? result.length : result;
    }
    

    let chapters = {
      "ONE": ["PERSON A", "PERSON B", "PERSON C", "PERSON D", "PERSON A"],
      "TWO": ["PERSON A", "PERSON D", "PERSON F", "PERSON G", "PERSON H"],
      "THREE": ["PERSON F", "PERSON D", "PERSON A", "PERSON A", "PERSON A"],
      "ONE HUNDRED": ["PERSON B", "PERSON A"]
    }
    
    const cooccur = (people, rettype) => {
      let result = Object.keys(Object.fromEntries(Object.entries(chapters).filter(c => people.filter(r => c[1].indexOf(r) > -1).length === people.length)));
      return rettype === 'count' ? result.length : result;
    }
    
    console.log('number of occurrences:', cooccur(["PERSON A", "PERSON B"], 'count'));
    console.log('occurrence chapters:', cooccur(["PERSON A", "PERSON B"], 'chapters'));
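
As a variation on the snippet above, here is a minimal sketch that takes the chapters object as a parameter instead of reading an outer variable, and uses `Array.prototype.every` with `includes` in place of the nested filter. The names `chaptersWithAll` and `sampleChapters` are made up for illustration, and the data is only the four sample chapters from the question.

```javascript
// Sample data copied from the question (four of the 100 chapters).
const sampleChapters = {
  "ONE": ["PERSON A", "PERSON B", "PERSON C", "PERSON D", "PERSON A"],
  "TWO": ["PERSON A", "PERSON D", "PERSON F", "PERSON G", "PERSON H"],
  "THREE": ["PERSON F", "PERSON D", "PERSON A", "PERSON A", "PERSON A"],
  "ONE HUNDRED": ["PERSON B", "PERSON A"]
};

// Return the names of the chapters in which every listed person appears.
const chaptersWithAll = (chaptersByName, people) =>
  Object.entries(chaptersByName)
    // keep a chapter only if each searched person is in its character list
    .filter(([, cast]) => people.every(person => cast.includes(person)))
    // drop the character lists and keep just the chapter names
    .map(([name]) => name);

const shared = chaptersWithAll(sampleChapters, ["PERSON A", "PERSON B"]);
console.log(shared);        // [ 'ONE', 'ONE HUNDRED' ]
console.log(shared.length); // 2
```

Returning the array and reading `.length` where a count is needed avoids the string flag, and `every` stops checking a chapter as soon as one person is missing.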