
How to create a job to launch a Python script on Google BigQuery


I have a problem with Google BigQuery:

  1. I wrote a query.
  2. I opened the shell and created a Python script that calls the query and saves the data to a CSV file in a subdirectory of my home on BigQuery, and so far so good (a minimal sketch of such a script follows the listing below).

Now all the files (.py and .csv) are in that directory on BigQuery:
    stefano_giostra@[MY_PROJECT]:~/bb_stkovf_data$ ls -l
    total 24
    -rw-r--r-- 1 stefano_giostra stefano_giostra 2724 Aug 7 01:51 best_answer_on_google-bigquery_90gg_20180807_0151.log
    -rw-r--r-- 1 stefano_giostra stefano_giostra 669 Aug 7 01:51 domande_risposte_2014_2015.log
    -rw-r--r-- 1 stefano_giostra stefano_giostra 155 Aug 7 01:51 python_developer_by_nation_20180807_0151.log
    -rw-r--r-- 1 stefano_giostra stefano_giostra 770 Aug 7 01:51 query.sql
    -rw-r--r-- 1 stefano_giostra stefano_giostra 7744 Aug 7 01:50 sg_bb_stkorw_info.py
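
For context, a minimal sketch of such a script, assuming the google-cloud-bigquery client library (query.sql comes from the listing above; the output file name is hypothetical):

    # Sketch: run the query stored in query.sql and write the result rows
    # to a CSV file. The output file name is hypothetical.
    import csv
    from google.cloud import bigquery

    client = bigquery.Client()

    with open("query.sql") as f:
        sql = f.read()

    rows = client.query(sql).result()  # start the query job and wait for it

    with open("results.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow([field.name for field in rows.schema])  # header row
        for row in rows:
            writer.writerow(list(row))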

  1. I would like to view the CSV in the BigQuery or GCP UI. How can I do that?
  2. I would also like to create a job that simply calls the script; I don't want to build an HTML application just to schedule a query call.

In the documentation I haven't seen any YAML file that calls a .py, only ones that call an app, and in all the documentation I've read over the past few days I haven't found how to do this.

Can someone help me?

Thanks


Solution

  • Find the answers below:

    1. To visualize the CSV in BigQuery you first have to load it into a table, as described here. Then you can view it like a regular table (using the preview or querying particular fields).
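
    For reference, a minimal sketch of the load step with the google-cloud-bigquery client library (the table ID and the CSV file name are hypothetical):

        # Sketch: load a local CSV file into a BigQuery table so it can be
        # previewed and queried in the UI. Table ID and file name are hypothetical.
        from google.cloud import bigquery

        client = bigquery.Client()
        table_id = "my-project.my_dataset.stackoverflow_results"

        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,  # skip the header row
            autodetect=True,      # infer the schema from the file
        )

        with open("results.csv", "rb") as f:
            load_job = client.load_table_from_file(f, table_id, job_config=job_config)
        load_job.result()  # wait for the load job to finish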

    2. I found an approach from JMD, who is a Technical Solutions Representative at Google (i.e. a trustworthy source), in this other post. The solution uses a cron job on a Compute Engine f1-micro instance, which lets you run the script without modifying it and without building an 'HTML' application, which you don't want:

    To schedule a script you first have to create a project if you don't have one. I assume you already have a project, so if that's the case, just create the instance that you want to use to schedule this script.

    To create the new instance:

    1. At the Google Cloud Platform Console, click on Products & Services, which is the icon with the four bars in the top left-hand corner.

    2. On the menu, go to the Compute section, hover over Compute Engine, and then click on VM Instances.

    3. Go to the menu bar above the instance section and there you will see a Create Instance button. Click it and fill in the configuration values that you want your new instance to have. The values that you select will determine your VM instance features. You can choose, among other values, the name, zone and machine type for your new instance.

    4. In the Machine type section click the drop-down menu tab to select an “f1-micro instance”.

    5. In the Identity and API access section, give the instance an access scope for the Storage API so that you can read and write to your bucket in case you need to do so; the default access scope only allows you to read. Also enable the BigQuery API (a gcloud equivalent for this setup is sketched at the end of this answer).

    6. Once you have created the instance and have access to the bucket, just create your cron job inside the new instance: in the user account under which the cron job will run, execute crontab -e and edit the file so that it runs your baby.sh script (see the example entry below). The following documentation link should help you with this.
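
    For example, a crontab entry along these lines (the interpreter path, schedule, and log file are hypothetical; the script name comes from the listing in the question) would run the Python script every day at 01:00:

        # m h dom mon dow  command
        0 1 * * * /usr/bin/python /home/stefano_giostra/bb_stkovf_data/sg_bb_stkorw_info.py >> /home/stefano_giostra/bb_stkovf_data/cron.log 2>&1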

    Please note that if you want to view the output from your script you may need to redirect it to your current terminal.

    You might need to add more permissions since you are also using Google BigQuery.
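
    As a point of reference for steps 3 to 5 above, the instance can also be created from the shell with gcloud, setting the Storage and BigQuery access scopes at creation time (the instance name is hypothetical):

        # Create an f1-micro instance with read/write access to Cloud Storage
        # and access to the BigQuery API (the instance name is hypothetical).
        gcloud compute instances create scheduler-instance \
            --machine-type=f1-micro \
            --scopes=storage-rw,bigquery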