python sql google-bigquery blockchain bitcoin

Running parameterized queries


I'm quite new to Google BigQuery SQL, so please bear with me. I'm trying to build a parameterized standard SQL query. The following sample ran successfully in the Google BigQuery web UI.

#standardSQL
    WITH time AS 
        (
            SELECT TIMESTAMP_MILLIS(timestamp) AS trans_time,
                inputs.input_pubkey_base58 AS input_key,
                outputs.output_pubkey_base58 AS output_key,
                outputs.output_satoshis AS satoshis,
                transaction_id AS trans_id
            FROM `bigquery-public-data.bitcoin_blockchain.transactions`
                JOIN UNNEST (inputs) AS inputs
                JOIN UNNEST (outputs) AS outputs
                WHERE inputs.input_pubkey_base58 = '1XPTgDRhN8RFnzniWCddobD9iKZatrvH4'
                OR outputs.output_pubkey_base58 = '1XPTgDRhN8RFnzniWCddobD9iKZatrvH4'
        )
    SELECT input_key, output_key, satoshis, trans_id,
        EXTRACT(DATE FROM trans_time) AS date
    FROM time
      WHERE trans_time >= '2010-05-21' AND trans_time <= '2010-05-23' AND satoshis >= 1000000000000
    --ORDER BY date

Sample extracted from here as a side note.

This gives 131 rows:

[Image: table sample]

What I would like to do is use ScalarQueryParameter, so that I can set some variables programmatically along the way. Like this:

myquery = """
#standardSQL
    WITH time AS 
        (
            SELECT TIMESTAMP_MILLIS(timestamp) AS trans_time,
                inputs.input_pubkey_base58 AS input_key,
                outputs.output_pubkey_base58 AS output_key,
                outputs.output_satoshis AS satoshis,
                transaction_id AS trans_id
            FROM `bigquery-public-data.bitcoin_blockchain.transactions`
                JOIN UNNEST (inputs) AS inputs
                JOIN UNNEST (outputs) AS outputs
                WHERE inputs.input_pubkey_base58 = @pubkey
                OR outputs.output_pubkey_base58 = @pubkey
        )
    SELECT input_key, output_key, satoshis, trans_id,
        EXTRACT(DATE FROM trans_time) AS date
    FROM time
      WHERE trans_time >= @mdate AND trans_time <= @tdate AND satoshis >= 1000000000000
    --ORDER BY date
"""

varInitDate = '2010-05-21'
varEndDate = '2010-05-23'
pubkey = '1XPTgDRhN8RFnzniWCddobD9iKZatrvH4'

query_params = [
    bigquery.ScalarQueryParameter('mdate', 'STRING', varInitDate),
    bigquery.ScalarQueryParameter('tdate', 'STRING', varEndDate),
    bigquery.ScalarQueryParameter('pubkey', 'STRING', pubkey)
]

job_config = bigquery.QueryJobConfig()
job_config.query_parameters = query_params
query_job = client.query(myquery,job_config=job_config)

Nevertheless, I'm facing the following error:

<google.cloud.bigquery.table.RowIterator object at 0x7fa098be85f8>
Traceback...
TypeError: 'RowIterator' object is not callable

Can someone please enlighten me on how I can achieve this?

P.S. '1XPTgDRhN8RFnzniWCddobD9iKZatrvH4' is the address from Laszlo’s Pizza 10,000 bitcoin exchange (1000000000000 satoshis).


Solution

  • So ... the problem was with these lines of code, which didn't work as expected. query_job.result() already returns a RowIterator object, so writing results() tries to call that object like a function and raises the TypeError shown in the question.

    results = query_job.result()
    df = results().to_dataframe()
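
A minimal sketch of the corrected conversion, for comparison with the two lines above. It assumes the installed google-cloud-bigquery version provides RowIterator.to_dataframe() and that pandas is available; the helper name job_to_dataframe is hypothetical and not part of the library.

```python
def job_to_dataframe(query_job):
    """Wait for a BigQuery query job and return its rows as a DataFrame.

    `query_job` is assumed to be the object returned by client.query(...).
    """
    # result() blocks until the job finishes and returns a RowIterator.
    results = query_job.result()
    # The RowIterator is an object, not a function, so call the method on it
    # directly: results.to_dataframe(), not results().to_dataframe().
    return results.to_dataframe()
```

With the job from the question, this would be used as df = job_to_dataframe(query_job).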
    

    And here is the actual working code. Remember to substitute your own service account file and project ID for this to work.

    import datetime, time
    from google.cloud import bigquery
    from google.oauth2 import service_account
    import pandas as pd
    
    #login
    credentials = service_account.Credentials.from_service_account_file('your.json')
    project_id = 'your-named-project'
    client = bigquery.Client(credentials=credentials, project=project_id)
    
    #The query
    q_input = """
    #standardSQL
            WITH time AS 
                (
                    SELECT TIMESTAMP_MILLIS(timestamp) AS trans_time,
                        inputs.input_pubkey_base58 AS input_key,
                        outputs.output_pubkey_base58 AS output_key,
                        outputs.output_satoshis AS satoshis,
                        transaction_id AS trans_id
                    FROM `bigquery-public-data.bitcoin_blockchain.transactions`
                        JOIN UNNEST (inputs) AS inputs
                        JOIN UNNEST (outputs) AS outputs
                        WHERE inputs.input_pubkey_base58 = @pubkey
                        OR outputs.output_pubkey_base58 = @pubkey
                )
            SELECT input_key, output_key, satoshis, trans_id,
                EXTRACT(DATE FROM trans_time) AS date
            FROM time
              WHERE trans_time >= @mdate AND trans_time <= @tdate AND satoshis >= @satoshis
            --ORDER BY date
    """
    
    #The desired purpose
    def runQueryTransaction(varInitDate,varEndDate,pubkey,satoshis):
        global df
        query_params = [
            bigquery.ScalarQueryParameter('mdate', 'STRING', varInitDate),
            bigquery.ScalarQueryParameter('tdate', 'STRING', varEndDate),
            bigquery.ScalarQueryParameter('pubkey', 'STRING', pubkey),
            bigquery.ScalarQueryParameter('satoshis', 'INT64', satoshis),
        ]
        job_config = bigquery.QueryJobConfig()
        job_config.query_parameters = query_params
        query_job = client.query(q_input,job_config=job_config)  # API request - starts the query
        results = query_job.result()  # Waits for job to complete.
        df=pd.DataFrame(columns=['input_key', 'output_key', 'satoshis', 'trans_id', 'date'])
        for row in results:
            df.loc[len(df)] = [row.input_key, row.output_key, row.satoshis, row.trans_id, row.date]
            #print("{} : {} : {} : {} : {}".format(row.input_key, row.output_key, row.satoshis, row.trans_id, row.date))
        return df
    
    #runQueryTransaction(InitialDate,EndDate,WalletPublicKey,Satoshis)
    runQueryTransaction('2010-05-21','2010-05-23','1XPTgDRhN8RFnzniWCddobD9iKZatrvH4',1000000000000)
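
The code above passes both dates as STRING parameters and compares them with a TIMESTAMP column. As a possible alternative, the dates could be sent as TIMESTAMP-typed parameters. The sketch below only covers the standard-library conversion; the helper name to_utc_timestamp is hypothetical, and the commented usage assumes ScalarQueryParameter accepts a timezone-aware datetime for the 'TIMESTAMP' type.

```python
import datetime


def to_utc_timestamp(date_str):
    """Parse a 'YYYY-MM-DD' string into a timezone-aware UTC datetime at midnight."""
    parsed = datetime.datetime.strptime(date_str, "%Y-%m-%d")
    return parsed.replace(tzinfo=datetime.timezone.utc)


# Assumed usage with google-cloud-bigquery (not executed here):
# query_params = [
#     bigquery.ScalarQueryParameter('mdate', 'TIMESTAMP', to_utc_timestamp('2010-05-21')),
#     bigquery.ScalarQueryParameter('tdate', 'TIMESTAMP', to_utc_timestamp('2010-05-23')),
# ]
```

Either way, an upper bound of '2010-05-23' means midnight at the start of that day, so transactions later on May 23 fall outside the range.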
    

    Cheers