python json pandas nested normalize

Using JSON with pandas, problem with nested info (python)


I am trying to use a JSON file to create a table with pandas.

import seaborn as sns
import pandas as pd
from pandas.io.json import json_normalize


releves = pd.read_json('DataTP2.json')
releves

My file is structured the following way:

[
  {
    "trimestre":"H2012",
    "cours":[
      {
        "sigle":"TECH 20701",
        "titre":"La cybersécurité et le gestionnaire",
        "etudiants":[
          {
            "matricule":"22003545",
            "nom":"Lahaie,Olivier",
            "note":"A+",
            "valeur": 4.3
          },

and so on.

With read_json, the table does not expand the nested data. Instead, it has one column per top-level key and packs everything nested under "cours" into a single cell, like this:

|Cours|Trimestre|

My desired output would be:

|etudiant|nom|matricule|note|valeur|sigle|titre|trimestre|

I have tried using json_normalize, but I get the following error:

AttributeError: 'str' object has no attribute 'itervalues'

I have tried to convert to a dictionary before using normalize, but another error pops up. Can anyone help me get out of this roadblock?

Thank you


Solution

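  • json_normalize can flatten this, but it needs the parsed JSON (a list of dicts), not a file name or a DataFrame. The AttributeError in the question is most likely what happens when it is handed a string. Load the file with the json module first, then point record_path at the nested students and list the parent fields to keep in meta:

    import json

    with open('DataTP2.json') as f:
        arr = json.load(f)  # list of dicts, one per trimestre

    json_normalize(arr, record_path=['cours', 'etudiants'],
                   meta=['trimestre', ['cours', 'sigle'], ['cours', 'titre']],
                   record_prefix='etudiant_')

    This gives one row per student, with columns etudiant_matricule, etudiant_nom, etudiant_note, etudiant_valeur, trimestre, cours.sigle and cours.titre. In pandas 1.0 and later the same function is also available as pd.json_normalize.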
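
    For comparison, here is a plain-Python sketch of the same flattening done with explicit loops instead of json_normalize. It may help to see what record_path and meta are doing, and it yields the exact column names asked for in the question. The inline sample only mirrors the structure shown in the question: the second student and the helper name flatten_releves are made up for illustration.

```python
import json

# Hypothetical sample mirroring the structure in the question;
# the second student is invented for illustration only.
raw = """
[
  {
    "trimestre": "H2012",
    "cours": [
      {
        "sigle": "TECH 20701",
        "titre": "La cybersécurité et le gestionnaire",
        "etudiants": [
          {"matricule": "22003545", "nom": "Lahaie,Olivier", "note": "A+", "valeur": 4.3},
          {"matricule": "00000001", "nom": "Doe,Jane", "note": "B", "valeur": 3.0}
        ]
      }
    ]
  }
]
"""


def flatten_releves(data):
    """Return one flat dict per student, carrying the parent course and term fields."""
    rows = []
    for term in data:                       # top level: one entry per trimestre
        for course in term["cours"]:        # second level: courses in that trimestre
            for student in course["etudiants"]:  # third level: the records we want
                row = dict(student)         # matricule, nom, note, valeur
                row["sigle"] = course["sigle"]
                row["titre"] = course["titre"]
                row["trimestre"] = term["trimestre"]
                rows.append(row)
    return rows


rows = flatten_releves(json.loads(raw))
for row in rows:
    print(row)
```

    Passing the result to pd.DataFrame(rows) should give the same table as the json_normalize call, just with unprefixed column names. This sketch assumes every course has an "etudiants" list; a missing key would raise a KeyError.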