python, google-cloud-automl, automl, google-cloud-automl-nl

Python script for creating JSONL training files for AutoML Natural Language


I am trying to create the JSONL training files for AutoML Natural Language, and the docs say:

To help you create JSONL training files, AutoML Natural Language offers a Python script that converts plain text files into appropriately formatted JSONL files. See the comments in the script for details.

I tried to follow the comments but did not understand them. I tried running the script like this:

python jason.py C:\..dic.csv C:\..text.txt gs://mybucket

but it gives me:

(with 5 blank lines skipped)
Traceback (most recent call last):
  File "jason.py", line 688, in <module>
    main()
  File "jason.py", line 680, in main
    UploadFiles(annotated_files, FLAGS.target_gcs_directory)
  File "jason.py", line 636, in UploadFiles
    f.write(csv_line)
TypeError: write() argument must be str, not bytes

Can anyone help me with an example of how to run the script, please?


Solution

  • The tool provided was written for Python 2. You can run python2 jsonl_converter.py -s sample_1.txt gs://your-bucket so that you do not have to edit the provided code. Alternatively, follow @Justin Ezequiel's suggestion if you need to run it under Python 3. I used the -s option only to automatically split long files.

    Test using Python 2: [screenshot not included]

    JSONL in designated GCS bucket: [screenshot not included]
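
    For anyone who wants to see why the traceback appears under Python 3: the error means a bytes object was passed to a file opened in text mode. The converter's own code around line 636 is not shown here, so the snippet below is only a minimal sketch of that class of error and two common ways around it, not the script's actual code or the fix suggested in the comments. The helper to_text, the file name upload.csv, and the sample csv_line value are all hypothetical.

```python
import os
import tempfile


def to_text(line, encoding="utf-8"):
    """Return line as str, decoding it first if it is bytes (hypothetical helper)."""
    return line.decode(encoding) if isinstance(line, bytes) else line


# Hypothetical value standing in for whatever the script builds as csv_line.
csv_line = b"gs://your-bucket/sample_1.jsonl\n"

# Scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "upload.csv")

# Under Python 3, a file opened in text mode rejects bytes with a TypeError.
with open(path, "w") as f:
    try:
        f.write(csv_line)
        raised = False
    except TypeError:
        raised = True

# Option 1: decode the bytes before writing to a text-mode file.
with open(path, "w", encoding="utf-8") as f:
    f.write(to_text(csv_line))

# Option 2: keep the bytes and open the file in binary mode instead.
with open(path, "wb") as f:
    f.write(csv_line)

# Read the file back as text to confirm what was written.
with open(path, encoding="utf-8") as f:
    written = f.read()
```

    Which option fits depends on how the file handle is opened in the script, so check that before changing anything. Running the unmodified script with python2 avoids the question entirely.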