Tags: csv, google-bigquery, character, ultraedit

Replace character in large txt file


I have a series of .txt files that I need to upload to Google Cloud in order to create some tables out of them in BigQuery. The files are tables whose separator is a ';'.

For some reason, BigQuery seems to have trouble identifying the columns (whether automatically or manually) when the files are in this shape, even when ';' is specified as the separator.

I generated a sample table, find-and-replaced the ';' with ',', and saved the file as .csv. Now BigQuery creates the expected table without any issue.

Question: should I find-and-replace all the separators in all the tables, or am I missing something?

If yes, how can I do the substitution from the command line on macOS (OS X 10)? (The files are large, and substituting the character takes too long even with UltraEdit.)

Best!


Solution

  • To address this issue, you can upload such file(s) as having just one column, and then, once the data is in BigQuery, parse it with SQL into whatever schema you need. I think this is the most generic and easiest way to deal with this. Usually SPLIT(col, ';') would be used, as in the quite simplified example below; in more complex cases, REGEXP_EXTRACT_ALL() would be used instead.

    #standardSQL
    WITH `project.dataset.table` AS (
      SELECT 'aa,a;bbb;ccc' col
    )
    SELECT 
      SPLIT(col, ';')[SAFE_OFFSET(0)] col1, 
      SPLIT(col, ';')[SAFE_OFFSET(1)] col2, 
      SPLIT(col, ';')[SAFE_OFFSET(2)] col3
    FROM `project.dataset.table`   
    

    with result

    Row col1    col2    col3     
    1   aa,a    bbb     ccc
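  • If you do decide to replace the separators in the files themselves, the macOS command line handles large files much faster than a GUI editor. A minimal sketch (the file names sample.txt and sample.csv are placeholders for your own files); note the caveat in the comments: if any field already contains a comma (like aa,a in the example above), a blind substitution will corrupt the CSV, which is another reason to prefer the one-column + SPLIT() approach:

    ```shell
    # Create a small sample file standing in for one of the large .txt exports.
    printf 'col1;col2;col3\naaa;bbb;ccc\n' > sample.txt

    # tr streams the file byte by byte with constant memory, so it copes
    # well with very large files. Caveat: if a field already contains a
    # comma, this blind replacement produces a broken CSV.
    tr ';' ',' < sample.txt > sample.csv

    # Alternatively, BSD sed (the one shipped with macOS) can edit in place;
    # note the empty '' backup-suffix argument that BSD sed requires:
    # sed -i '' 's/;/,/g' sample.txt

    cat sample.csv
    ```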