Tags: python, google-cloud-platform, google-cloud-vertex-ai

Read Vertex AI datasets in a Jupyter notebook


I am trying to create a Python utility that takes a dataset from Vertex AI Datasets and generates statistics for it. However, I am unable to access the dataset from a Jupyter notebook. Is there any way to do this?


Solution

  • If I understand correctly, you want to use a Vertex AI dataset directly inside a Jupyter notebook. I don't think this is currently possible. However, you can export Vertex AI datasets to Google Cloud Storage (GCS) in JSONL format:

    Your dataset will be exported as a list of text items in JSONL format. Each row contains a Cloud Storage path, any label(s) assigned to that item, and a flag that indicates whether that item is in the training, validation, or test set.

    For now, you can query BigQuery data inside a notebook with the %%bigquery magic, as described in Visualizing BigQuery data in a Jupyter notebook, or read a CSV file from the machine's local directory or from GCS with pandas' read_csv(), as shown in the How to read csv file in Google Cloud Platform jupyter notebook thread.

    You can also file a feature request in the Google Issue Tracker asking for Vertex AI datasets to be usable directly in Jupyter notebooks; the Google Vertex AI team will then consider it.
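Once a dataset has been exported to JSONL, the file can be parsed line by line in the notebook to produce basic statistics such as items per label and items per split. The following is a minimal sketch, assuming a single-label classification export in which each line stores the label under `classificationAnnotation.displayName` and the split under `dataItemResourceLabels["aiplatform.googleapis.com/ml_use"]`. These key names are an assumption based on the Vertex AI import schema, so check them against an actual exported file. The function name `summarize_jsonl` is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable

# ASSUMPTION: key names follow the Vertex AI single-label classification schema;
# adjust them to match the fields in your exported JSONL file.
LABEL_KEY = "classificationAnnotation"
SPLIT_KEY = "dataItemResourceLabels"
ML_USE = "aiplatform.googleapis.com/ml_use"


def summarize_jsonl(lines: Iterable[str]) -> dict:
    """Count items per label and per split (training/validation/test)."""
    label_counts: Counter = Counter()
    split_counts: Counter = Counter()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)
        total += 1
        annotation = item.get(LABEL_KEY, {})
        label_counts[annotation.get("displayName", "<unlabeled>")] += 1
        resource_labels = item.get(SPLIT_KEY, {})
        split_counts[resource_labels.get(ML_USE, "<unassigned>")] += 1
    return {"total": total, "labels": dict(label_counts), "splits": dict(split_counts)}


if __name__ == "__main__":
    # Hypothetical sample lines standing in for an exported file.
    sample = [
        '{"imageGcsUri": "gs://my-bucket/1.jpg", "classificationAnnotation": {"displayName": "cat"}, '
        '"dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "training"}}',
        '{"imageGcsUri": "gs://my-bucket/2.jpg", "classificationAnnotation": {"displayName": "dog"}}',
    ]
    print(summarize_jsonl(sample))
```

To read the exported file from GCS in the notebook, one option is the google-cloud-storage client, for example `storage.Client().bucket("my-bucket").blob("path/to/export.jsonl").download_as_text().splitlines()`, where the bucket and path are placeholders.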
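For data that is already available as a CSV file (for example, one downloaded from GCS), summary statistics can be computed directly in the notebook. The usual route is pandas' read_csv() followed by DataFrame.describe(); the sketch below instead uses only the standard library (`csv` and `statistics`) so it needs no extra dependencies. The function name `numeric_column_stats` is hypothetical, and it simply skips any column containing a value that does not parse as a number.

```python
import csv
import io
import statistics
from typing import Dict, List, Optional, TextIO


def numeric_column_stats(f: TextIO) -> Dict[str, Dict[str, float]]:
    """Compute count/mean/min/max/stdev for each fully numeric CSV column."""
    reader = csv.DictReader(f)
    columns: Dict[str, List[Optional[str]]] = {name: [] for name in reader.fieldnames or []}
    for row in reader:
        for name in columns:
            columns[name].append(row.get(name))  # None if the row is short

    stats: Dict[str, Dict[str, float]] = {}
    for name, raw in columns.items():
        try:
            # Ignore empty cells; any non-numeric value disqualifies the column.
            values = [float(v) for v in raw if v not in ("", None)]
        except ValueError:
            continue
        if not values:
            continue
        stats[name] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return stats


if __name__ == "__main__":
    # In-memory CSV standing in for a local or downloaded file.
    data = io.StringIO("a,b,name\n1,2,x\n3,4,y\n")
    print(numeric_column_stats(data))
```

With pandas installed, `pd.read_csv(path).describe()` gives a similar summary in one line; reading a `gs://` path that way typically also requires the gcsfs package.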