I keep multiple files with similar names in Google Cloud Storage. New data arrives there daily via an API, and I want to add it to my table in BigQuery, which is refreshed daily; I do this via Python. When I use WRITE_TRUNCATE, it deletes the table and recreates it from all the files in the bucket, but I need to protect my table because it contains historical data that is no longer in the bucket. When I use WRITE_APPEND, I get duplicates because it loads all the files in the bucket again. Does anyone have a suggested solution?
Here is a piece of my code:

```python
from google.cloud import bigquery

job_config = bigquery.LoadJobConfig()
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
job_config.skip_leading_rows = 1
# The source format defaults to CSV, so the line below is optional.
job_config.source_format = bigquery.SourceFormat.CSV
job_config.autodetect = True
job_config.max_bad_records = 5

uri = "gs://" + gcsbucket + "/" + tableprefix
```
@AhmetBuğraBUĞA, as you mentioned in the comment, the date is written at the end of each file name, so the problem can be solved by adding the date to the BigQuery load URI so that each daily run picks up only a single day's file:

(dt.datetime.today() - dt.timedelta(days=1)).strftime("%Y-%m-%d")
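For reference, here is a minimal sketch of how this could be wired into the load job. It assumes the `gcsbucket` and `tableprefix` variables from your snippet, a hypothetical destination `table_id`, and that the daily file is named `<tableprefix><YYYY-MM-DD>.csv` (the `.csv` suffix is an assumption). Using WRITE_APPEND with a single-day URI keeps the historical rows while avoiding reloading older files:

```python
import datetime as dt

from google.cloud import bigquery

# gcsbucket, tableprefix, and table_id are placeholders; fill in your own values.
gcsbucket = "my-bucket"
tableprefix = "daily_export_"
table_id = "my-project.my_dataset.my_table"

# Yesterday's date, matching the date suffix at the end of the daily file name.
yesterday = (dt.datetime.today() - dt.timedelta(days=1)).strftime("%Y-%m-%d")

client = bigquery.Client()

job_config = bigquery.LoadJobConfig()
# Append instead of truncate so historical rows already in the table are kept.
job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
job_config.skip_leading_rows = 1
job_config.source_format = bigquery.SourceFormat.CSV
job_config.autodetect = True
job_config.max_bad_records = 5

# Load only yesterday's file instead of every file under the prefix.
uri = "gs://" + gcsbucket + "/" + tableprefix + yesterday + ".csv"

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # Wait for the load to complete.
```

Since each run only touches the file whose name carries yesterday's date, rerunning the job on the same day would still insert that file twice; if that is a concern, a staging table plus a MERGE, or deleting that day's partition before loading, would make the job idempotent.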