I am using Airflow with the BigQueryToCloudStorageOperator operator to save a copy of a BQ table as a CSV file in a GCP storage bucket. The source table has just one column, whose content is a few comma-separated fields: a1,b1,c1,d1. The output file is saved fine, but each row has double quotes at the beginning and at the end: "a1,b1,c1,d1". In the docs I don't see any parameter to control this behavior and exclude those double quotes from the output file. Is there a way to fix this? The desired output would be just the string a1,b1,c1,d1, without double quotes at the beginning and end of each row.
There is a quote_character="" parameter which I thought could help here, but when I include it in the BigQueryToCloudStorageOperator initialization in my DAG, it doesn't change anything in my case.
I suspect you used the field_delimiter param with a comma (,) as its value (it's the default value in the operator if not specified):
def __init__(
    self,
    *,
    source_project_dataset_table: str,
    destination_cloud_storage_uris: List[str],
    project_id: Optional[str] = None,
    compression: str = 'NONE',
    export_format: str = 'CSV',
    field_delimiter: str = ',',
    .....
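This quoting follows standard CSV rules: a field that contains the delimiter character must be wrapped in quote characters so the row stays parseable. A minimal sketch with Python's csv module (not BigQuery itself, just the same CSV convention) reproduces the effect:

```python
import csv
import io

field = 'a1,b1,c1,d1'  # single column whose value contains commas

# With ',' as delimiter, the field contains the delimiter -> it gets quoted.
out = io.StringIO()
csv.writer(out, delimiter=',').writerow([field])
quoted = out.getvalue().strip()
print(quoted)    # "a1,b1,c1,d1"

# With '|' as delimiter, the field has no '|' -> no quoting is needed.
out = io.StringIO()
csv.writer(out, delimiter='|').writerow([field])
unquoted = out.getvalue().strip()
print(unquoted)  # a1,b1,c1,d1
```

The writer quotes only when it has to (minimal quoting), which is why switching the delimiter makes the quotes disappear.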
Your single column also contains , characters; that's why the BigQuery API adds a quote character around the column value. I tried a separator other than a comma, for example the | character, and the CSV file was generated without quote characters around the column value:
bq extract --location=EU \
    --destination_format CSV \
    --compression NONE \
    --field_delimiter "|" \
    --print_header=true \
    "project:mazlum_test.test_list_column" \
    gs://mazlum_dev/test_column_list.csv
The resulting CSV file with | as separator is:
my_list
a1,b1,c1,d1
The resulting CSV file with , as separator is:
my_list
"a1,b1,c1,d1"
With Airflow, you only have to set the field_delimiter parameter to | (for example):
BigQueryToGCSOperator(
    task_id='task_id',
    source_project_dataset_table='table',
    destination_cloud_storage_uris=[
        'dest_bucket'
    ],
    compression='NONE',
    export_format='CSV',
    field_delimiter="|",
    print_header=True
)
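Downstream, a file exported with field_delimiter="|" reads back as the raw comma-separated strings with no surrounding quotes. A quick sketch (the file content below is hypothetical, mirroring the example output above):

```python
import csv
import io

# Hypothetical content of the exported file (header + one row),
# as produced with field_delimiter="|".
exported = "my_list\na1,b1,c1,d1\n"

reader = csv.reader(io.StringIO(exported), delimiter='|')
next(reader)                       # skip the my_list header row
rows = [row[0] for row in reader]  # each row is a single '|'-separated field
print(rows[0])  # a1,b1,c1,d1
```

Since no field contains the | separator, each line comes back exactly as stored, double quotes and all the escaping gone.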