powershell, google-cloud-platform, dataflow

Dataflow command doesn't execute


I'm trying to create a Dataflow pipeline, but I get this error in my PowerShell console:

Traceback (most recent call last):
  File "load_pole_emploi.py", line 107, in <module>
    run()
  File "load_pole_emploi.py", line 92, in run
    gcs_bucket_name + file_pattern)
  File "E:\Utilisateurs\SI084342\venv\gcp\lib\site-packages\apache_beam\io\textio.py", line 675, in __init__
    escapechar=escapechar)
  File "E:\Utilisateurs\SI084342\venv\gcp\lib\site-packages\apache_beam\io\textio.py", line 132, in __init__
    validate=validate)
  File "E:\Utilisateurs\SI084342\venv\gcp\lib\site-packages\apache_beam\io\filebasedsource.py", line 124, in __init__
    self._validate()
  File "E:\Utilisateurs\SI084342\venv\gcp\lib\site-packages\apache_beam\options\value_provider.py", line 193, in _f
    return fnc(self, *args, **kwargs)
  File "E:\Utilisateurs\SI084342\venv\gcp\lib\site-packages\apache_beam\io\filebasedsource.py", line 185, in _validate
    match_result = FileSystems.match([pattern], limits=[1])[0]
  File "E:\Utilisateurs\SI084342\venv\gcp\lib\site-packages\apache_beam\io\filesystems.py", line 203, in match
    filesystem = FileSystems.get_filesystem(patterns[0])
  File "E:\Utilisateurs\SI084342\venv\gcp\lib\site-packages\apache_beam\io\filesystems.py", line 106, in get_filesystem
    'e.g., pip install apache-beam[gcp]. Path specified: %s' % path)
ValueError: Unable to get filesystem from specified path, please use the correct path or ensure the required dependency is installed, e.g., pip install apache-beam[gcp]. Path specified: gs://bck-fr-fichiers-manuel-dev/de_par_categorie_et_code_rome/file.csv

I have reinstalled apache-beam[gcp], but the problem remains.

Any help would be appreciated, thanks.


Solution

  • Thank you, it was related to the missing Apache Beam test and docs components.
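
For anyone hitting the same ValueError: Beam raises it when no filesystem is registered for the gs:// scheme, which usually means the GCS filesystem module could not be imported in the interpreter that actually runs the script. Below is a minimal diagnostic sketch that uses only the standard library. The helper names check_import and diagnose are hypothetical, and the module path apache_beam.io.gcp.gcsfilesystem is an assumption based on where the GCS filesystem lives in the Beam Python SDK; adjust it if your version differs.

```python
import importlib
import sys


def check_import(module_name):
    """Try to import a module and report whether it worked.

    Returns (True, "ok") on success, or (False, "<ErrorType>: <message>")
    so the underlying cause (often a missing dependency) is visible.
    """
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # show any failure, not only ImportError
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


def diagnose(modules=("apache_beam", "apache_beam.io.gcp.gcsfilesystem")):
    """Print the active interpreter and the import status of each module."""
    # Confirms which Python (and therefore which virtual environment) is in use.
    print("Interpreter:", sys.executable)
    results = {}
    for name in modules:
        results[name] = check_import(name)
        ok, detail = results[name]
        print(f"{name}: {'OK' if ok else 'FAILED'} ({detail})")
    return results


if __name__ == "__main__":
    diagnose()
```

Run it from the same PowerShell session and virtual environment as load_pole_emploi.py. If the interpreter path is not the one inside the venv, the package was probably installed somewhere else. If the second import fails, the printed message should name the dependency that is missing.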