Search code examples
javascriptgoogle-cloud-platformdataformsqlx

Using javascript to transform/manipulate data in google cloud dataform (sqlx)


im trying to use a javascript function to test if i can manipulate data in dataform from a query, however compilator says that "color" is not defined (one of the fields im trying to manipulate), I have not been able to find any example on how to do use the functions, only seen examples with js constants but not data manipulation, here is how I am trying (notice when im calling the function "addcolor" on the query SELECT):

config {
    type: "view",
    name: "available_bicycles_dataform"
}

SELECT
  ab.bicycle_id,
  ab.manufacturer,
  ab.color as color,
  ${addcolor(color)} AS added_color
FROM
  `learn-project.bicycles.available_bicycles` AS ab
LEFT JOIN
  `learn-project.bicycles_scheduled.scheduled_job_used_bicycles` AS abdf
ON
  ab.bicycle_id = abdf.bike_id
WHERE
  abdf.bike_id IS NULL


js {
    function addcolor(color) {
        return color + "hi";
    }
}

EDIT:

Adding an example of what I would expect to happen or trying to achieve:

Suppose I have the following data in the available_bicycles table:

bicycle_id manufacturer color
1 ACME Blue
2 ACME Red
3 ACME Green

what I would expect dataform to do when running the query for creating the view available_bicycles_dataformis to add a new column which is the one "manipulated" with the javascript function:

bicycle_id manufacturer color added_color
1 ACME Blue Bluehi
2 ACME Red Redhi
3 ACME Green Greenhi

notice the new added_color column which appended "hi" to the color


Solution

  • The way I think of Dataform and JavaScript is that the JavaScript is executed during compile time when the SQLX is compiled to pure SQL. Do not think that the JS is executed during BigQuery execution of the SQL statements. With that in mind, we look at your SELECT statement and see:

    SELECT
      ab.bicycle_id,
      ab.manufacturer,
      ab.color as color,
      ${addcolor(color)} AS added_color
    FROM ...
    

    Focusing on the line which reads:

    ${addcolor(color)} AS added_color
    

    I interpret this as ....

    Our goal is to generate SQL ... we are going to have SQL that reads:

    SELECT
      ab.bicycle_id,
      ab.manufacturer,
      ab.color as color,
    

    and now we come across ${addcolor(color)}. What we will do here is invoke a JavaScript function called addcolor and what it returns will be subsituted in place for the SQL component. So if that function returns "HelloWorld", it would be as though we coded:

    SELECT
      ab.bicycle_id,
      ab.manufacturer,
      ab.color as color,
      HelloWorld
    FROM ...
    

    Since your JS functionc called addcolor() took a parameter and the parameter value you provided was color, during compilation, the compiler tried to resolve the parameter as a variable called color but there was no such variable in scope. This explains your error message of:

    "color" is not defined

    Later ... after your edit ...

    We must now ask ourselves, what do we want our resulting SQL to look like ... the answer would then be:

    SELECT
      ab.bicycle_id,
      ab.manufacturer,
      ab.color as color,
      CONCAT(ab.color, "hi") AS added_color
    FROM ...
    

    In this case, there is no need for a JS function call.

    However, if what you need is to perform logic within a JavaScript function, you would created a named User Defined Function (UDF). The documentation for this can be found here. The documentation contains samples/examples and feels quite straightforward to digest ... however, in summary:

    CREATE TEMP FUNCTION myJsFunc(input)
    RETURNS STRING
    LANGUAGE js
    AS r"""
      return input + "hi!";
    """
    

    Your invocation logic in your SQL statement would then become:

    SELECT
      ab.bicycle_id,
      ab.manufacturer,
      ab.color as color,
      myJsFunc(ab.color) AS added_color
    FROM ...
    

    What I want you to notice is that nothing in this story requires Dataform. Rather, the definition of a UDF is itself a SQL statement and the invocation of the UDF is merely a reference to the previously defined UDF. Where Dataform provides you capability is in managing all of these SQL statements but does not provide any function during the execution of those statements when run.