
Why do I get "ValueError: need more than 1 value to unpack" when using spaCy convert on my conllu data?


I am trying to convert my training data for spaCy train using spaCy convert. After some preprocessing with pandas, my data looks like this:

1   Hii hii PRON    _   NounClass=9|Num=Sing    _   _   _   _
2   si  si  VERB    _   _   _   _   _   _
3   mara    mara    NOUN    _   NounClass=10|Num=Plur   _   _   _   _
4   ya_kwanza   ya_kwanza   NUM _   _   _   _   _   _
5   kwa kwa ADP _   _   _   _   _   _
6   uongozi uongozi NOUN    _   NounClass=11|Num=Sing   _   _   _   _
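The sample above is rendered with spaces, but the CoNLL-U format defines ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), so it can be worth confirming that the file pandas wrote really has that layout. This is a minimal sketch, assuming a UTF-8 file. parse_conllu_token is a hypothetical helper written for illustration, not part of spaCy.

```python
# Sketch: split one CoNLL-U token line into the ten fields defined by the
# CoNLL-U format. parse_conllu_token is a hypothetical helper, not spaCy API.
CONLLU_FIELDS = (
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
)


def parse_conllu_token(line: str) -> dict:
    """Return a field-name -> value dict for one token line.

    Raises ValueError if the line does not have exactly ten
    tab-separated columns (for example, if it was written with spaces).
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(CONLLU_FIELDS):
        raise ValueError(
            f"expected {len(CONLLU_FIELDS)} tab-separated columns, "
            f"got {len(parts)}: {line!r}"
        )
    return dict(zip(CONLLU_FIELDS, parts))


if __name__ == "__main__":
    # First row of the sample above, written with real tabs.
    row = "1\tHii\thii\tPRON\t_\tNounClass=9|Num=Sing\t_\t_\t_\t_"
    token = parse_conllu_token(row)
    print(token["UPOS"], token["FEATS"])  # PRON NounClass=9|Num=Sing
```

Running it over every non-blank, non-comment line of the file would quickly show whether any row has a different number of columns.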

I used the following command in the Terminal:

PS C:\Users\...\pythonProject1> python -m spacy convert C:\Users\...\pythonProject1\my_dataframe_ready.conllu C:\Users\...\pythonProject1\train

and get the following output:

ℹ Grouping every 1 sentences into a document.
⚠ To generate better training data, you may want to group sentences into
documents with `-n 10`.
Traceback (most recent call last):
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\__main__.py", line 4, in <module>
    setup_cli()
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\cli\_util.py", line 71, in setup_cli
    command(prog_name=COMMAND)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\typer\main.py", line 500, in wrapper
    return callback(**use_params)  # type: ignore
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\cli\convert.py", line 89, in convert_cli
    msg=msg,
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\cli\convert.py", line 140, in convert
    db = DocBin(docs=docs, store_user_data=True)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\tokens\_serialize.py", line 86, in __init__
    for doc in docs:
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\training\converters\conllu_to_docs.py", line 38, in conllu_to_docs
    for sent_doc in sent_docs:
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\training\converters\conllu_to_docs.py", line 85, in read_conllx
    ner_map=ner_map,
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\training\converters\conllu_to_docs.py", line 209, in conllu_sentence_to_doc
    heads=heads,
  File "spacy\tokens\doc.pyx", line 366, in spacy.tokens.doc.Doc.__init__
  File "spacy\morphology.pyx", line 49, in spacy.morphology.Morphology.add
  File "spacy\morphology.pyx", line 153, in spacy.morphology.Morphology.feats_to_dict
ValueError: need more than 1 value to unpack

Is there still something wrong with my data? I have no idea what this error is trying to tell me.


Solution

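  • Based on where the error is raised (Morphology.feats_to_dict), it looks like you have a malformed feature list somewhere in the FEATS column. A feature list is a |-separated list of Name=Value pairs, such as alpha=yes|beta=no. You might have something like alpha=yes|beta, which is invalid: beta has no =value, so splitting it on = yields only one value where two are expected, which matches "need more than 1 value to unpack".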

    I think a lone underscore (_) is a special case meaning "no features" and should be valid, but maybe you have some other kind of placeholder in that column?

    You can debug this by modifying the conllu_sentence_to_doc function in conllu_to_docs.py to print the morphs value before calling doc = Doc(...).
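As an alternative to editing spaCy's installed files, the suspected problem can be checked directly in the data. This is a minimal sketch, assuming a UTF-8, tab-separated file. It applies a | then = split to the FEATS value (column 6) of every token line and reports features that do not split into exactly one Name and one Value. find_bad_features and scan_conllu_feats are hypothetical helpers. The split mirrors the unpacking that the traceback points at in Morphology.feats_to_dict, but it is not spaCy's own code, so spaCy may treat some edge cases differently. It would, for example, flag a literal nan, which is what pandas produces when a column with missing values is converted with astype(str).

```python
# Sketch: find FEATS values that would not split cleanly into Name=Value pairs.
# find_bad_features and scan_conllu_feats are hypothetical, not spaCy API.
import sys
from typing import Iterator, List, Tuple


def find_bad_features(feats: str) -> List[str]:
    """Return the features in a FEATS string that are not exactly Name=Value."""
    if feats in ("", "_"):  # a lone underscore means "no features"
        return []
    bad = []
    for feature in feats.split("|"):
        # "beta" (no "=") gives one part and "a=b=c" gives three;
        # only exactly two parts can be unpacked into (name, value).
        if len(feature.split("=")) != 2:
            bad.append(feature)
    return bad


def scan_conllu_feats(path: str) -> Iterator[Tuple[int, str, List[str]]]:
    """Yield (line_number, FEATS value, problems) for each suspicious token line."""
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # blank lines separate sentences; "#" lines are comments
            parts = line.split("\t")
            if len(parts) < 6:
                yield lineno, "", [f"only {len(parts)} tab-separated columns"]
                continue
            feats = parts[5]  # FEATS is the sixth CoNLL-U column
            bad = find_bad_features(feats)
            if bad:
                yield lineno, feats, bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_feats.py my_dataframe_ready.conllu
    for lineno, feats, bad in scan_conllu_feats(sys.argv[1]):
        print(f"line {lineno}: FEATS={feats!r} -> bad: {bad}")
```

Every reported line number points at a row worth inspecting in the original DataFrame before running spaCy convert again.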