Search code examples
google-cloud-platformgoogle-cloud-data-fusion

Cloud Data Fusion - trim quotes for all columns


I have a csv file in GCS with fields with hundreds of columns enclosed in quotes, like below :

"John","Doe","5/15/2021 7:18:26 PM"

I need to load this to BigQuery using Data fusion, created a pipeline. My question is

  1. How do I trim quotes from these the columns in the Wrangler? I don't find much documentation for this, rather than the basic things
  2. How do I apply this rule for all the columns in one shot.

Please guide me, any good reading on these kind of operations will also be helpful


Solution

  • For testing purposes I used your sample data and add a few more entries.

    enter image description here

    Remove quotes

    If your data looks like this and your objective is to just remove the quotes from your data, what you can do is:

    • Click the drop down arrow beside body
    • Select Find and replace
    • At find put " and leave replace as blank

    enter image description here

    Your output will look like this:

    enter image description here


    Parse CSV to split into columns

    You can then convert your CSV to columns:

    • Click the drop down beside body
    • Select Parse -> CSV
    • A pop up will appear and select "Comma"

    enter image description here

    This will tell your wrangler to read it as a CSV and split the comma to columns. But the original data will remain at column body.

    enter image description here

    To delete body:

    • Select body by ticking the check box at the right
    • Click the drop down beside body
    • Select Delete column

    enter image description here

    Your data should now look like this:

    enter image description here