Search code examples
jsonregexgoogle-cloud-platformgoogle-cloud-dataflow

Dataflow Template Metadata regex definition with escaped chars


I am creating a Dataflow Flex template and I would like to define the input parameters as documented here https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#metadata

I have configured my template with the following JSON

{
  "name": "My name",
  "description": "my long description",
  "parameters": [
    {
      "name": "outputTopic",
      "label": "Pub/Sub ingest topic",
      "helpText": "Pub/Sub topic to publish result"
    },
    {
      "name": "bigQueryProject",
      "label": "BigQuery project",
      "helpText": "BigQuery google project"
    },
    {
      "name": "tempLocation",
      "label": "GCS Temp Location",
      "helpText": "GCS Location for storing temporary files",
      "regexes": [
        "gs://.*"
      ]
    },
    {
      "name": "startDate",
      "label": "Start Date",
      "isOptional": true,
      "helpText": "Start date in the format YYYY-MM-DD or use 'YESTERDAY' as default",
      "regexes": [
        "^TODAY$|^YESTERDAY$|^\\d{4}-\\d{2}-\\d{2}$"
      ]
    }
    }

It works properly as I can see all parameters properly defined if I use the Google Console

screenshot console

However, when I execute the Job with all validated parameters, the request fails for an invalid REGEX defined

"(fb57dedae5c9fead): Template metadata contains invalid POSIX regex '^TODAY$|^YESTERDAY$|^\\d{4}-\\d{2}-\\d{2}$': invalid escape sequence: \\d in \\d.

it Looks Like that the double escaping in the regex needed to create a valid JSON file, is not well interpreted by Dataflow. Thanks in advance


Solution

  • The error you are facing is due to Invalid RegEx. As I mentioned in the comment you can solve this by using \\\\, (\\\\d) .

    Based on my understanding, you need to use \\\\, this will produce \\ in the string and \ in the RegEx.You can understand this more by using the RegEx validator.