python pandas github-api

Pandas iterate over rows and append API response as a new column


Thanks for taking the time to view this,

I want to iterate over a dataframe row by row, use the cell values to pull data from an API, and assign each response to a new column. This is what I tried:

for index, row in df.iterrows():
    response = requests.get('https://api.github.com/repos/{}/{}/issues/{}'.format(row['ownr'], row['repo'], row['issues']), headers=headers)
    row['issueData'] = response.json()

But after running this code, df does not have an issueData column. Where am I going wrong?

I am using GitHub's endpoint to pull the data and have validated that there is a response.

Here is a sample of the df:

df = pd.DataFrame({'ownr':['vuejs', 'vuejs', 'vuejs'],
        'repo':['vue', 'vue', 'vue-cli'],
        'issues': ['12040','11794','6448']})

Solution

  • Using apply() would be a better option.

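    def fetch(row):
        # Build the issue URL from this row's values and return the parsed JSON.
        url = f"https://api.github.com/repos/{row['ownr']}/{row['repo']}/issues/{row['issues']}"
        response = requests.get(url, headers=headers)
        return response.json()

    # axis=1 calls fetch once per row; the returned dicts become the new column.
    df['issueData'] = df.apply(fetch, axis=1)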
    

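    Your original loop has no effect because iterrows() hands you a separate Series for each row, so assigning to row['issueData'] never reaches the dataframe. If you want to keep the loop, collect the responses in a list and assign the list as the new column afterwards.

    issue_data = []

    for index, row in df.iterrows():
        response = requests.get('https://api.github.com/repos/{}/{}/issues/{}'.format(row['ownr'], row['repo'], row['issues']), headers=headers)
        issue_data.append(response.json())

    # One entry per row, in row order, so the list lines up with the index.
    df['issueData'] = issue_data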
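
As a rough offline check, the sketch below first shows that writing to row inside iterrows() leaves df untouched, then wraps the apply() approach in a small helper. The names build_issue_url and add_issue_data, the get parameter, and the 10-second timeout are assumptions made for illustration and are not part of the original question or answer; get exists only so that a stub can stand in for requests.get when you want to try this without calling GitHub.

```python
import pandas as pd
import requests

df = pd.DataFrame({'ownr': ['vuejs', 'vuejs', 'vuejs'],
                   'repo': ['vue', 'vue', 'vue-cli'],
                   'issues': ['12040', '11794', '6448']})

# Part 1: each row from iterrows() is its own Series, so this assignment
# only changes that temporary object and df keeps its original columns.
for index, row in df.iterrows():
    row['issueData'] = 'placeholder'
print('issueData' in df.columns)  # False


def build_issue_url(row):
    """Build the GitHub issue URL from one row (hypothetical helper)."""
    return f"https://api.github.com/repos/{row['ownr']}/{row['repo']}/issues/{row['issues']}"


def add_issue_data(frame, get=requests.get, headers=None):
    """Return a copy of frame with an 'issueData' column of parsed JSON.

    `get` defaults to requests.get; passing a stub is an assumption added
    here so the function can be exercised without network access.
    """
    def fetch(row):
        response = get(build_issue_url(row), headers=headers, timeout=10)
        response.raise_for_status()  # fail loudly on 4xx/5xx responses
        return response.json()

    out = frame.copy()
    # Part 2: apply with axis=1 calls fetch once per row.
    out['issueData'] = out.apply(fetch, axis=1)
    return out
```

With a real headers dict this would be called as df = add_issue_data(df, headers=headers). Raising on a bad status is a design choice: it stops at the first failed request instead of silently storing an error payload in the column.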