Search code examples
google-bigquerygoogle-cloud-storageexternal-tables

How to create an external table for a google cloud storage and query the externally partitioned data?


I am trying to query the externally partitioned data with the reference of this BigQuery doc

Google Cloud Storage: (csv data contains string value only)

gs://project/myfolder/count=1000/file_1k.csv
gs://project/myfolder/count=10000/file_10k.csv
gs://project/myfolder/count=100000/file_100k.csv

Source URI prefix: gs://project/myfolder

But I am getting the following error while querying the table,

Error while reading table: project.dataset.partition_table, 
error message: Cannot query hive partitioned data for table project.dataset.partition_table without any associated files.

Query:

SELECT * FROM `project.dataset.partition_table` where count=1000 order by rand() LIMIT 100;

Refer the screenshot, enter image description here

Any inputs here really appreciated.


Solution

  • The problem is that the engine can't find the files related to the partitions. In your case, its because when you created the table you referenced a folder in GCS but not it's files.

    To solve your problem, you should use a wildcard and your path would be gs://project/myfolder/* instead of gs://project/myfolder

    I hope it helps