
Data Catalog will not read technical metadata automatically from files in a Cloud Storage (GCS) bucket


In Google Data Catalog, I created a new entry group and then a fileset entry with the bucket and file pattern configured. I deliberately did not define a schema on the fileset, because I want Data Catalog to find the technical metadata in the files automatically. Everything was set up via the Google Cloud console UI.

Data Catalog does not find any metadata for the files in the bucket. However, if I create a BigQuery table or a Pub/Sub topic, the metadata from these resources shows up immediately.

My hope was that Data Catalog would be able to scan the files in our buckets and show their metadata automatically, so that it becomes searchable. The files in the buckets are .avro, .json, .parquet, or .csv. As mentioned, this works for BigQuery and Pub/Sub. My understanding from the docs is that it should also work for objects in Cloud Storage.

Has anybody tried this and could please shed some light on this matter?

Thanks.


Solution

  • Unfortunately, Data Catalog does not currently detect internal metadata (such as the schema) from the contents of the files in a GCS fileset.
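
Since the fileset schema is not filled in automatically, one possible workaround is to derive the column list yourself and define it on the fileset entry by hand. The sketch below is only an illustration for the .csv case, using the Python standard library: it reads a header row plus a few sample rows and guesses a type per column. The function names `infer_type` and `infer_columns` are hypothetical, and the type labels (`INT64`, `FLOAT64`, `STRING`) are assumed here for readability rather than taken from Data Catalog. It assumes the file has a header row and does not call Data Catalog at all.

```python
import csv
import io


def _all_parse(values, cast):
    """Return True if every value can be converted with `cast`."""
    for value in values:
        try:
            cast(value)
        except ValueError:
            return False
    return True


def infer_type(values):
    """Guess a simple type label from sample string values (hypothetical helper)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "STRING"
    if _all_parse(non_empty, int):
        return "INT64"
    if _all_parse(non_empty, float):
        return "FLOAT64"
    return "STRING"


def infer_columns(csv_text, sample_rows=100):
    """Build a list of {"column", "type"} dicts from CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # assumption: the first row holds the column names
    samples = [[] for _ in header]
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        # Ignore any extra cells beyond the header width.
        for idx, value in enumerate(row[:len(header)]):
            samples[idx].append(value)
    return [
        {"column": name, "type": infer_type(vals)}
        for name, vals in zip(header, samples)
    ]


if __name__ == "__main__":
    sample = "id,name,price\n1,apple,0.5\n2,pear,\n"
    for col in infer_columns(sample):
        print(col)
```

The resulting list can then be used as the basis for the schema you define on the fileset entry, either in the console or through the API. Avro and Parquet files carry their own schema, so for those formats reading the embedded schema would be a better starting point than sampling values.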