Search code examples
pythongoogle-bigquerygoogle-cloud-datalab

WHERE statement in google cloud datalab failing when using a list.


I'm using google cloud datalab and I'm trying to select data from a table based on the match in a list.

First I use a python cell to define my list

import gcp.bigquery as bq
samples = ['TCGA-CH-5751-01A', 'TCGA-EJ-5496-01A']

Then I create a cell with my sql query

%%sql --module test
SELECT 
    ParticipantBarcode, 
    SampleBarcode, 
FROM 
    [isb-cgc:tcga_201510_alpha.mRNA_UNC_HiSeq_RSEM]
WHERE SampleBarcode IN $samples
LIMIT 100

Then I would call this using another python cell

results = bq.Query(test, samples=samples).results().to_dataframe()

This fails because the WHERE statement is incorrect.

invalidQuery: Encountered " "IN" "IN ""

If I hardcode the names I want to match again in to the sql statment it works.

%%sql --module test2
SELECT 
    ParticipantBarcode, 
    SampleBarcode, 
FROM 
    [isb-cgc:tcga_201510_alpha.mRNA_UNC_HiSeq_RSEM]
WHERE SampleBarcode IN  ('TCGA-CH-5751-01A', 'TCGA-EJ-5496-01A')
LIMIT 100

I think this is because of how I'm passing in my list to sql, but I'm not sure how to properly do this on cloud data lab. Most of the python results I found when searching use python to craft the whole sql command, and I just want to add in the list.

Thanks.


Solution

  • This feature was added in the release on Feb 8 2016.