
Google Dataflow jar file packaging issue with FileSystemRegistrar


I was testing my Dataflow Java application in IntelliJ and it worked perfectly fine. But when I ran the Dataflow jar file on a Linux system, I hit this problem: dataflow error log

These are the options that I used for Dataflow: --project=myproject --stagingLocation=gs://mybucket/staging2 --tempLocation=gs://mybucket/gcp-temp2 --gcpTempLocation=gs://mybucket/gcp-temp2 --bigtableProjectId=myinstance --bigtableInstanceId=user-test --bigtableTableId=test_table1

So the problem is that the gs:// directory is not recognized properly. Instead, it is treated as a local directory on the server where I ran my jar file.

Here is the reason why this directory problem occurs:

jar file using assembly

jar file using shade

I looked for the difference between [maven assembly jar] and [maven shade jar] and found out that the FileSystemRegistrar service file was pointing at the wrong class.

But using the shade plugin is not the remedy for the problem; I was just lucky that the GcsFileSystemRegistrar entry was not overwritten. The same problem occurs again when I change the dependency order.
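One way to see which registrar entries a given jar actually exposes is to read the service file that `java.util.ServiceLoader` consults. The following is only a minimal diagnostic sketch: the class name `RegistrarCheck` is hypothetical, and it assumes the registrars are declared in `META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar`, which is the standard location for that Beam interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class; the name RegistrarCheck is not from the original post.
class RegistrarCheck {
    // Service file that ServiceLoader reads to discover FileSystemRegistrar implementations.
    static final String SERVICE_FILE =
        "META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar";

    // Extracts provider class names from service-file content,
    // skipping blank lines and '#' comments.
    static List<String> parseProviders(String content) {
        List<String> providers = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty()) {
                providers.add(name);
            }
        }
        return providers;
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = RegistrarCheck.class.getClassLoader();
        Enumeration<URL> urls = loader.getResources(SERVICE_FILE);
        if (!urls.hasMoreElements()) {
            System.out.println("No " + SERVICE_FILE + " found on the classpath");
        }
        // Print every copy of the service file and the providers it declares.
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            try (InputStream in = url.openStream();
                 BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String content = reader.lines().collect(Collectors.joining("\n"));
                System.out.println(url + " -> " + parseProviders(content));
            }
        }
    }
}
```

Run with the packaged jar on the classpath, this prints one line per copy of the service file. In a fat jar there is only one copy, so if GcsFileSystemRegistrar is missing from its list, another dependency's version of the file has probably replaced it during packaging, which would be consistent with the gs:// paths not being handled as expected.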

To make this work, I have to have both of these libraries in this order:

beam-runners-google-cloud-dataflow-java

beam-sdks-java-core

'beam-sdks-java-core' is already included in 'beam-runners-google-cloud-dataflow-java', but I need to declare it again after 'beam-runners-google-cloud-dataflow-java'. So the dependency hierarchy looks funny, but this is the only way I can get this to work. Here is how it looks:

pom dependency

If I exclude 'beam-sdks-java-core' or change the order, the problem occurs again. I tried excluding it using maven plugins but it didn't work.

So my question is how can I set the FileSystemRegistrar properly? I don't know why it works this way.

Also, I hope anyone who is having this problem can get a hint from this post. I struggled a lot with this :'(


Solution

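  • As OGCheeze commented, it was solved by using the Maven Shade plugin with ServicesResourceTransformer, which merges the META-INF/services files from all dependencies instead of letting one overwrite another. This post has a more detailed explanation.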
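As a rough illustration of that fix, a pom.xml plugin section along these lines enables the transformer. This is a sketch, not the original poster's configuration: the plugin version and the `com.example.MyPipeline` main class are placeholders to adapt to your project.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <!-- Assumed version; use whichever release your build standardizes on. -->
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Concatenates META-INF/services entries from all jars,
               so every FileSystemRegistrar implementation stays listed. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          <!-- Optional: sets Main-Class in the manifest (placeholder class name). -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>com.example.MyPipeline</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With the service files merged, the result should no longer depend on the order in which the Beam dependencies are declared.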