Tags: sql, google-bigquery, data-analysis

Having Issues Uploading Table to BigQuery


I am attempting to upload a .csv file to BigQuery to use for a project in the Google Data Analysis Certificate program, but I am continually running into this error:

Unexpected error
Tracking number: c1642301846583819

Here is the table creation menu with the fields I have filled out (linked as I'm a new user).

I am aware of a current bug that produces a similar error message but still creates the table after a page refresh; however, that is not happening with this file. After refreshing the page, the table is still not available, so it must not have been created. This file is also larger than the previous files that did work that way. I have also tried uploading the file to Google Cloud Storage and loading it into BigQuery from there, which returns this error message:

Failed to create table: Error while reading data, error message: Could not parse '4/12/2016 7:21:00 AM' as TIMESTAMP for field Time (position 1) starting at location 15 with message 'Invalid time zone: AM'

When I go into Google Cloud Logging, the following entry is present for this error:

{
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {
      "code": 3,
      "message": "Error while reading data, error message: Could not parse '4/12/2016 7:21:00 AM' as TIMESTAMP for field Time (position 1) starting at location 15  with message 'Invalid time zone: AM'"
    },
    "authenticationInfo": {
      "principalEmail": "REDACTED"
    },
    "requestMetadata": {
      "callerIp": "2600:8801:21c:3f00:306d:ff1a:ed67:fb92",
      "callerSuppliedUserAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36,gzip(gfe),gzip(gfe)"
    },
    "serviceName": "bigquery.googleapis.com",
    "methodName": "google.cloud.bigquery.v2.JobService.InsertJob",
    "authorizationInfo": [
      {
        "resource": "projects/analytics-capstone-project",
        "permission": "bigquery.jobs.create",
        "granted": true
      }
    ],
    "resourceName": "projects/analytics-capstone-project/jobs/bquxjob_42de9715_180df3c3ba2",
    "metadata": {
      "jobChange": {
        "after": "DONE",
        "job": {
          "jobName": "projects/analytics-capstone-project/jobs/bquxjob_42de9715_180df3c3ba2",
          "jobConfig": {
            "type": "IMPORT",
            "loadConfig": {
              "sourceUris": [
                "gs://exampleforml/heartrate_seconds_merged.csv"
              ],
              "schemaJson": "{\n}",
              "destinationTable": "projects/analytics-capstone-project/datasets/FitBit_Fitness_Tracker_Data/tables/hr_seconds",
              "createDisposition": "CREATE_IF_NEEDED",
              "writeDisposition": "WRITE_EMPTY"
            }
          },
          "jobStatus": {
            "jobState": "DONE",
            "errorResult": {
              "code": 3,
              "message": "Error while reading data, error message: Could not parse '4/12/2016 7:21:00 AM' as TIMESTAMP for field Time (position 1) starting at location 15  with message 'Invalid time zone: AM'"
            },
            "errors": [
              {
                "code": 3,
                "message": "Error while reading data, error message: Could not parse '4/12/2016 7:21:00 AM' as TIMESTAMP for field Time (position 1) starting at location 15  with message 'Invalid time zone: AM'"
              },
              {
                "code": 3,
                "message": "Error while reading data, error message: CSV processing encountered too many errors, giving up. Rows: 1; errors: 1; max bad: 0; error percent: 0"
              },
              {
                "code": 3,
                "message": "Could not parse '4/12/2016 7:21:00 AM' as TIMESTAMP for field Time (position 1) starting at location 15  with message 'Invalid time zone: AM'"
              }
            ]
          },
          "jobStats": {
            "createTime": "2022-05-20T02:11:47.618Z",
            "startTime": "2022-05-20T02:11:47.759Z",
            "endTime": "2022-05-20T02:11:48.821Z",
            "loadStats": {},
            "totalSlotMs": "138"
          }
        }
      },
      "@type": "type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"
    }
  },
  "insertId": "-fsqzz0e2lplj",
  "resource": {
    "type": "bigquery_project",
    "labels": {
      "project_id": "analytics-capstone-project",
      "location": "US"
    }
  },
  "timestamp": "2022-05-20T02:11:48.832533Z",
  "severity": "ERROR",
  "logName": "projects/analytics-capstone-project/logs/cloudaudit.googleapis.com%2Fdata_access",
  "operation": {
    "id": "1653012707618-analytics-capstone-project:bquxjob_42de9715_180df3c3ba2",
    "producer": "bigquery.googleapis.com",
    "last": true
  },
  "receiveTimestamp": "2022-05-20T02:11:49.112933352Z"
}

The original file I am trying to upload is found in this Kaggle dataset and is called heartrate_seconds_merged.

I am unsure how to resolve this and would appreciate any help! Thank you!

EDIT: I have continued trying to upload more of the data for this project and am now unable to create any more tables. Is there an issue with how I am attempting to create them?


Solution

  • Google is quite picky about how its various date and time fields are formatted during ingestion. They mention this all too briefly in their documentation here.

    The error, per your logs, is

    "Could not parse '4/12/2016 7:21:00 AM' as TIMESTAMP".

    Per Google's documentation:

    • The date portion can be formatted as YYYY-MM-DD or YYYY/MM/DD
    • The time portion must be formatted as HH:MM[:SS[.SSSSSS]] (seconds and fractions of seconds are optional).
    • The date and time must be separated by a space or 'T'.
    • Optionally, the date and time can be followed by a UTC offset or the UTC zone designator (Z); you can learn more about time zones here.

    Where does that leave you, then? Unfortunately, the DATETIME type requires a similar YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.F]] format. Assuming you want to stick with the TIMESTAMP type in your schema and that your timestamps are in UTC, you will need to preprocess each of your files to convert values like 4/12/2016 7:21:00 AM to 2016/04/12 07:21:00 (note that the leading zero in each segment is not optional, though you could drop the seconds in this case).
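
    If you'd rather script that conversion than edit the CSVs by hand, here is a minimal pandas sketch of one way to do it (assuming the file sits locally, the problem column is named Time as in your logs, and the _clean output filename is just an example):

    import pandas as pd

    # Read the raw export and parse the 12-hour "4/12/2016 7:21:00 AM" strings.
    df = pd.read_csv("heartrate_seconds_merged.csv")
    df["Time"] = pd.to_datetime(df["Time"], format="%m/%d/%Y %I:%M:%S %p")

    # Re-serialize into a form BigQuery accepts for TIMESTAMP columns (UTC assumed).
    df["Time"] = df["Time"].dt.strftime("%Y-%m-%d %H:%M:%S")

    df.to_csv("heartrate_seconds_merged_clean.csv", index=False)

    You can then load the rewritten file in place of the original. Alternatively, you could load the Time column as STRING and convert it inside BigQuery with PARSE_TIMESTAMP('%m/%d/%Y %I:%M:%S %p', Time), but preprocessing the files keeps your existing TIMESTAMP schema intact.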