I am trying to upload a CSV file from Google Cloud Storage (GCS) to BigQuery (BQ) with schema auto-detection.
What I tried: I enabled schema auto-detection and entered the number of rows to skip in the "Header rows to skip" option. My file has 6 rows of descriptive information about the data that I need to skip; the 7th row is my actual header row.
According to Google's documentation at https://cloud.google.com/bigquery/docs/schema-detect#auto-detect:
"The field types are based on the rows having the most fields. Therefore, auto-detection should work as expected as long as there is at least one row of data that has values in every column/field."
The problem is that my CSV does not meet this condition: the rows contain nulls, so no single row has values in every column.
Also, my CSV contains many rows with no numerical values at all, which I think adds extra complexity for the schema auto-detection.
Auto-detect is picking up neither the correct column names nor the correct field types. All field types are detected as strings, and the columns are named string_field_0, string_field_1, string_field_2, and so on. It is also treating my CSV's header row as a row of data.
How can I correctly upload this CSV to BQ, skipping the unwanted leading rows and getting the correct schema (field names and field types)?
Solved this by including the header row itself in the number of rows to skip.
I had 6 rows I actually needed to skip, and the 7th row was my header (column names), so I was entering 6 in "Header rows to skip". When I entered 7 instead, the schema was auto-detected correctly.
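In other words, the value for "Header rows to skip" is the number of descriptive preamble rows *plus one* for the header row. A minimal sketch of that counting logic, using only the standard library (the `rows_to_skip` helper, the sample file contents, and the column names are all hypothetical illustrations, not from the original file):

```python
import csv
import io

def rows_to_skip(csv_text, expected_header):
    """Return the value for BigQuery's "Header rows to skip" option:
    the number of descriptive preamble rows *plus* the header row itself,
    so that schema auto-detection sees only data rows."""
    reader = csv.reader(io.StringIO(csv_text))
    for i, row in enumerate(reader):
        if row == expected_header:
            return i + 1  # i preamble rows, +1 to skip the header too
    raise ValueError("header row not found")

# A file shaped like the one in the question: 6 descriptive rows,
# then the header on row 7, then data (some with nulls/empty fields).
sample = "\n".join(
    ["Report metadata line"] * 6
    + ["id,name,score", "1,alice,9.5", "2,bob,"]
)
print(rows_to_skip(sample, ["id", "name", "score"]))  # → 7

# The computed value would then go into the load job, e.g. (sketch,
# assuming the google-cloud-bigquery client library):
#   job_config = bigquery.LoadJobConfig(
#       source_format=bigquery.SourceFormat.CSV,
#       autodetect=True,
#       skip_leading_rows=7,
#   )
```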
I also realized that, in the sentence quoted above from Google's documentation ("auto-detection should work as expected as long as there is at least one row of data that has values in every column/field"), nulls count as values, so the nulls were not actually causing a problem with the upload to BQ.
Hope this helps someone facing the same issue!