
Python - Problem extracting data from JSON


To extract data about Instagram posts, I need to download the media list and then download the insights separately for each post. I'm doing something wrong, though: my code only returns data for one post instead of all of them.

This is the code at the moment:

import urllib.request as o
import json
import csv
from pandas.io.json import json_normalize
import pandas as pd

url = 'https://graph.facebook.com/v3.2/1234567891011/media?fields=media_type,like_count,comments_count,timestamp&limit=500&access_token=xxx'
link1 = 'https://graph.facebook.com/v3.2/'
link2 = '/insights?metric=engagement%2Cimpressions%2Creach%2Csaved&access_token=xxx'
with o.urlopen(url) as jfile :
    data1 = json.load(jfile)
    df = json_normalize(data1["data"])
    linki = []
    for dane3 in df:
        linki = link1 + df['id'] + link2
        dx = []
        with o.urlopen(linki[0]) as file2 :
            data2 = json.load(file2)
            dx = json_normalize(data2["data"],
                              record_path ='values',
                              meta =['id', 'name', 'title'])
            dx['ident'] =dx['id'][0].split("/")[0]
dn7 = dx.pivot(index='ident', columns='name', values='value')
dn7

The data I want to extract is:

ident|engagement|impressions|reach|saved
987654321|65|2142|1943|2
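The pivot call at the end of the code above is what turns the per-metric rows into this one-row-per-post layout. A quick offline check, assuming each post's insights have already been flattened into one row per metric with the columns ident, name and value (as in the code), looks like this. The numbers are copied from the table.

```python
import pandas as pd

# Long format: one row per (post, metric), like the json_normalize output in the code above
rows = pd.DataFrame({
    "ident": ["987654321"] * 4,
    "name": ["engagement", "impressions", "reach", "saved"],
    "value": [65, 2142, 1943, 2],
})

# Wide format: one row per post, one column per metric
wide = rows.pivot(index="ident", columns="name", values="value")
print(wide)
```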

What do I need to change in my code? I'm using Python 3.


Solution

  • On every iteration of for dane3 in df, you re-assign dx to the DataFrame built from the current JSON response, so you only keep the data for the last post processed.

    Instead, you could keep a list of the normalized JSON DataFrames and concatenate them once all of the posts have been processed.

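    You are also requesting the same post on every iteration. linki = link1 + df['id'] + link2 builds a whole Series of URLs, one per post, and linki[0] always picks the first of them, so you only ever get the data for the first post. On top of that, iterating directly over a DataFrame (for dane3 in df) yields its column labels, not its rows, so the loop runs once per column rather than once per post. Instead, your loop should iterate over the values of the 'id' column of your DataFrame, i.e. for post_id in df['id'].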

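    # Reuses the imports and the url, link1 and link2 variables from the question
    post_data = []  # one DataFrame of insights per post
    with o.urlopen(url) as jfile:
        data1 = json.load(jfile)
    df = json_normalize(data1["data"])
    # Loop over the post IDs themselves (looping over df would yield column labels)
    for post_id in df['id']:
        linki = link1 + post_id + link2
        with o.urlopen(linki) as file2:
            data2 = json.load(file2)
        dx = json_normalize(data2["data"],
                            record_path='values',
                            meta=['id', 'name', 'title'])
        # The part of the insight ID before the first "/" is the media ID
        dx['ident'] = dx['id'][0].split("/")[0]
        post_data.append(dx)
    # Combine all posts, then pivot to one row per post and one column per metric
    dn7 = pd.concat(post_data).pivot(index='ident', columns='name', values='value')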
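To see both problems in isolation without calling the API, here is a small offline sketch. The DataFrame stands in for the media list, and the post IDs and the shortened link2 string are hypothetical placeholders, not real API values. It shows that the original loop runs once per column and always builds the first post's URL, while looping over df['id'] produces one URL per post.

```python
import pandas as pd

# Hypothetical stand-in for json_normalize(data1["data"]); IDs are placeholders
df = pd.DataFrame({
    "id": ["111", "222", "333"],
    "media_type": ["IMAGE", "VIDEO", "IMAGE"],
    "like_count": [10, 20, 30],
})
link1 = "https://graph.facebook.com/v3.2/"
link2 = "/insights?metric=engagement"  # shortened placeholder, no access token

# Iterating a DataFrame yields its column labels, not its rows
labels = [label for label in df]
print(labels)

# Original loop: one iteration per column, always picking the first post's URL
buggy_urls = []
for column_label in df:
    linki = link1 + df["id"] + link2  # Series with one URL per post
    buggy_urls.append(linki[0])       # [0] is always the first post
print(buggy_urls)

# Fixed loop: one URL per post ID
fixed_urls = [link1 + post_id + link2 for post_id in df["id"]]
print(fixed_urls)
```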