I've been attempting to stream data from the API below and am having very little success.
https://dev.socrata.com/foundry/data.cityofchicago.org/8v9j-bter
main.py script
#install main packages
!pip install sodapy
import pandas as pd
from sodapy import Socrata
from google.datalab import Context
#put into dataframe
client = Socrata("data.cityofchicago.org", None)
results = client.get("8v9j-bter", limit=2000)
results_df = pd.DataFrame.from_records(results)
#flow into BigQuery
results_df.to_gbq('chicago_traffic.demo_data', Context.default().project_id,
                  chunksize=2000, verbose=True, if_exists='append')
app.yaml script
runtime: python27
api_version: 1
threadsafe: true
handlers:
- url: /.*
  script: main.app
cron.yaml script
cron:
- description: "append traffic data"
  url: /.*
  target: main
  schedule: every 1 mins
  retry_parameters:
    min_backoff_seconds: 2.5
    max_doublings: 5
requirements.txt
pandas==0.22.0
sodapy==1.4.6
datalab==1.1.2
google-api-python-client
With the app.yaml you are using, you would be deploying to the App Engine Standard Environment. In the Standard Environment you can use any of these built-in third-party libraries by adding them to your app.yaml file, or you can use any other third-party library by following the steps in this link.
The problem here is that, as stated in that link:
You can use third-party libraries that are pure Python code with no C extensions,
The pandas library has parts of its code written in C (see https://pandas.pydata.org/#library-highlights and "How to solve import error for pandas?"), so in order to use pandas you would need to use the App Engine Flexible Environment.
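If you want to check for yourself whether a given module is a compiled extension rather than pure Python, something like the following may help. It is only a rough sketch meant to be run locally with Python 3 (not on the python27 Standard runtime), and is_c_extension is a hypothetical helper name, not part of any library.

```python
import importlib.util
from importlib.machinery import EXTENSION_SUFFIXES


def is_c_extension(module_name):
    """Return True if the module is loaded from a compiled extension file.

    Hypothetical helper for illustration. It only inspects the file the
    module would be loaded from; built-in and frozen modules report False.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        # Module not installed, or a namespace package with no single file.
        return False
    # EXTENSION_SUFFIXES holds the platform's suffixes, e.g. ".so" or ".pyd".
    return spec.origin.endswith(tuple(EXTENSION_SUFFIXES))


if __name__ == "__main__":
    # json is a pure-Python package; with pandas installed you could also
    # try one of its compiled submodules, e.g. "pandas._libs.lib".
    print(is_c_extension("json"))
```

A package counts as "not pure Python" as soon as any module it imports is a compiled extension, so a check like this on the top-level package alone is not conclusive.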
You’d need to make some modifications to your files in order for your deployment to run. Follow these links to adapt your files to the flexible environment:
!pip install sodapy uses the magic command !, which is specific to some notebook environments. You cannot put that line in your main.py file. The installation (pip install sodapy) will be run during deployment of the app anyway, because sodapy==1.4.6 is listed in your requirements.txt file.
Instead of passing Context.default().project_id, you should be able to pass your project ID directly as a str. When running inside App Engine Flex, authorization shouldn't be an issue. If you want to run it locally, remember to use a service account with the right permissions.
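Putting the two points about main.py together, the data-loading part might look roughly like this. This is a minimal sketch, not a complete App Engine app: it does not define the main.app handler your app.yaml refers to, "my-project-id" is a placeholder you would replace with your own project ID, the function names are made up for illustration, and it assumes a pandas version that still provides DataFrame.to_gbq (such as the 0.22.0 in your requirements.txt) with pandas-gbq installed.

```python
import pandas as pd

# Assumption: placeholder value, replace with your own project ID string.
PROJECT_ID = "my-project-id"
DATASET_ID = "8v9j-bter"
DESTINATION_TABLE = "chicago_traffic.demo_data"


def fetch_records(limit=2000):
    """Fetch rows from the Socrata endpoint as a list of dicts."""
    # Imported here so the module loads even where sodapy is not installed.
    from sodapy import Socrata

    client = Socrata("data.cityofchicago.org", None)
    return client.get(DATASET_ID, limit=limit)


def to_frame(records):
    """Convert the list of dicts returned by the API into a DataFrame."""
    return pd.DataFrame.from_records(records)


def load(df, project_id=PROJECT_ID):
    """Append the DataFrame to the BigQuery table (needs pandas-gbq)."""
    df.to_gbq(DESTINATION_TABLE, project_id=project_id,
              chunksize=2000, if_exists="append")


if __name__ == "__main__":
    # No "!pip install" line: dependencies come from requirements.txt.
    load(to_frame(fetch_records()))
```

Keeping the fetch, convert and load steps in separate functions also lets you try the DataFrame conversion locally without touching BigQuery.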