Tags: google-cloud-platform, google-cloud-dataprep

I am facing an issue in Google Cloud's Dataprep (Perform Foundational Data, ML, and AI Tasks in Google Cloud: Challenge Lab)


I am doing the challenge lab titled "Perform Foundational Data, ML, and AI Tasks in Google Cloud: Challenge Lab". However, in Task 3 ("Run a simple Dataprep job"), I am encountering an issue that is preventing me from completing the lab.

I have searched online for an existing solution but have not found anything. I watched YouTube videos, recorded about a month ago, of people completing this lab, and I followed the same steps they did. I therefore believe that something in Dataprep, Google Cloud, or the lab settings has changed in that month and is now causing this error.

I am attaching the screenshots and error log below. The error occurs when the job is run with the "Dataflow - Run job on Dataflow" option. When I run the same job with the "Trifacta Photon - Run job on Trifacta Photon (best for small and medium-sized jobs, up to approximately 1 GB of data)" option instead, it completes without any errors, but the Qwiklabs check does not validate that configuration.

Screenshot 1 showing the problem.

Screenshot 2 showing the problem.

Screenshot 3 showing the problem.

Dataflow job execution failed: S01:PTableStoreTransformGCS3/TextIO.Write/WriteFiles/GatherTempFileResults/Reify.ReifyViewInGlobalWindow/Create.Values/Read(CreateSource)/Impulse+PTableStoreTransformGCS3/TextIO.Write/WriteFiles/GatherTempFileResults/Reify.ReifyViewInGlobalWindow/Create.Values/Read(CreateSource)/ParDo(OutputSingleSource)/ParMultiDo(OutputSingleSource)+PTableStoreTransformGCS3-TextIO-Write-WriteFiles-GatherTempFileResults-Reify-ReifyViewInGlobalWindow3/PairWithRestriction+PTableStoreTransformGCS3-TextIO-Write-WriteFiles-GatherTempFileResults-Reify-ReifyViewInGlobalWindow3/SplitWithSizing failed., The job failed because a work item has failed 4 times. Look in previous log entries for the cause of each one of the 4 failures. If the logs only contain generic timeout errors related to accessing external resources, such as MongoDB, verify that the worker service account has permission to access the resource's subnetwork. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors. The work item was attempted on these workers: Open logs
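
For anyone hitting the same message: the error itself suggests checking the detailed worker logs and the worker service account's permissions. Below is a rough sketch of gcloud commands for those checks (JOB_ID, PROJECT_ID, PROJECT_NUMBER, and the region are placeholders; the lab uses the default Compute Engine service account unless you changed it):

    # List recent Dataflow jobs to find the ID of the failed job
    gcloud dataflow jobs list --region=us-central1 --limit=5

    # Pull the error-level worker logs for that job from Cloud Logging
    gcloud logging read \
      'resource.type="dataflow_step" AND resource.labels.job_id="JOB_ID" AND severity>=ERROR' \
      --limit=50

    # Check which roles the default worker service account has on the project
    gcloud projects get-iam-policy PROJECT_ID \
      --flatten="bindings[].members" \
      --filter="bindings.members:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
      --format="table(bindings.role)"

These are generic checks prompted by the error message rather than anything specific to this lab.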

The issue is only with "Task 3"; I am able to complete the other tasks without any problems.

As mentioned above, I have searched in many places and have not found a solution, and the issue does not appear in the month-old YouTube videos, which is why I believe a recent change or update to this Google Cloud lab is responsible.


Solution

  • Update 04/08/2023:

    The lab is back online as of now. It looks like they have addressed the issue for Task 3.


    As of 04/03/2023, there is a bug causing this issue. Here is the response I got from Qwiklabs:

    ********* ******** (Qwiklabs Support) Apr 3, 2023, 07:14 GMT+5:30

    Hi *****, Greetings from Qwiklabs! Sorry for the inconvenience caused. There is an ongoing issue and we have taken note of this. It's our top priority to fix this, our team is already working on this and we will fix it at the earliest. We will keep you updated with the progress. Once again I sincerely apologize for the inconvenience caused. We will put our best efforts to fix this and get you up and running at the earliest.

    See you in the Cloud, ******* from Qwiklabs

    Just to confirm, this issue is specific to Task 3 of the "Perform Foundational Data, ML, and AI Tasks in Google Cloud: Challenge Lab". As soon as I get a response that it is fixed, I will update my answer here.