Search code examples
pythonpandasjson-normalize

Convert Nested DateTime json to Pandas DataFrame


The problem here is different than those defined in the questions here, here, and here. Specifically, the transformation and form of the output I want is different than any specified in those questions, and I also want a DateTime index. These differences cause the answers on those pages fail.

I have data formatted as a json like this:

{
    "Meta Data": {
        "1. Information": "Daily Prices (open, high, low, close) and Volumes",
        "2. Symbol": "ABC",
    },
    "Time Series (Daily)": {
        "2001-06-31": {
            "1. open": "113.2000",
            "4. close": "113.8000",
        },
        "2001-07-01": {
            "1. open": "114.2000",
            "4. close": "114.2000",
        }
    }
}

I want the output to look be a pandas dataframe like this:

"Time Series (Daily)" | "1. open" | "4. close"
"2001-06-31"          | 113.2000  | 113.8000
"2001-07-01"          | 114.2000  | 114.2000

I wrote a function that works, but the for loop leaves performance wanting and I find it hard to read.

def convert_json_to_dataframe(all_json_data):
    json_data = all_json_data["Time Series (Daily)"]
    dates = []
    open = []
    close = []
    for key in json_data.keys():
        dates.append(key)
        open.append(json_data[key]["1. open"])
        close.append(json_data[key]["4. close"])
    df = pd.DataFrame(
        list(zip(open, close)),
        columns=["1. open", "4. close"],
        index=dates,
    )
    df = df.apply(pd.to_numeric, errors="ignore")
    return df

There's got to be a simpler, easier to read, higher-performing way to do this, maybe with json_normalize in pandas, but I can't figure it out.


UPDATE AFTER ANSWERS & RESOLUTION. All I had to do was:

df = pd.DataFrame(json_data["Time Series (Daily)"]).T

Pandas discovered the index and column names automatically, so I didn't need the reset_index portion of the answers.

The orient approach also worked:

df = pd.DataFrame.from_dict(json_data["Time Series (Daily)", orient="index")

To get all the numbers as floats instead of strings, I did need the apply line:

df = df.apply(pd.to_numeric, errors="coerce")

Thank you everyone.


Solution

  • Why don't you just do this?

    pd.DataFrame(data["Time Series (Daily)"]).T.reset_index().rename(columns = {"index":"Time Series (Daily)"})
    

    Output -

    Time Series (Daily) 1. open 4. close
    0 2001-06-31 113.2000 113.8000
    1 2001-07-01 114.2000 114.2000