I am trying to convert my training data for spaCy train using spaCy convert. My data looks (after some preprocessing with pandas) like this:
1 Hii hii PRON _ NounClass=9|Num=Sing _ _ _ _
2 si si VERB _ _ _ _ _ _
3 mara mara NOUN _ NounClass=10|Num=Plur _ _ _ _
4 ya_kwanza ya_kwanza NUM _ _ _ _ _ _
5 kwa kwa ADP _ _ _ _ _ _
6 uongozi uongozi NOUN _ NounClass=11|Num=Sing _ _ _ _
I used the following command in the Terminal:
PS C:\Users\...\pythonProject1> python -m spacy convert C:\Users\...\pythonProject1\my_dataframe_ready.conllu C:\Users\...\pythonProject1\train
and get the following output:
ℹ Grouping every 1 sentences into a document.
⚠ To generate better training data, you may want to group sentences into
documents with `-n 10`.
Traceback (most recent call last):
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\__main__.py", line 4, in <module>
setup_cli()
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\cli\_util.py", line 71, in setup_cli
command(prog_name=COMMAND)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 782, in main
rv = self.invoke(ctx)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 610, in invoke
return callback(*args, **kwargs)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\typer\main.py", line 500, in wrapper
return callback(**use_params) # type: ignore
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\cli\convert.py", line 89, in convert_cli
msg=msg,
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\cli\convert.py", line 140, in convert
db = DocBin(docs=docs, store_user_data=True)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\tokens\_serialize.py", line 86, in __init__
for doc in docs:
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\training\converters\conllu_to_docs.py", line 38, in conllu_to_docs
for sent_doc in sent_docs:
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\training\converters\conllu_to_docs.py", line 85, in read_conllx
ner_map=ner_map,
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\training\converters\conllu_to_docs.py", line 209, in conllu_sentence_to_doc
heads=heads,
File "spacy\tokens\doc.pyx", line 366, in spacy.tokens.doc.Doc.__init__
File "spacy\morphology.pyx", line 49, in spacy.morphology.Morphology.add
File "spacy\morphology.pyx", line 153, in spacy.morphology.Morphology.feats_to_dict
ValueError: need more than 1 value to unpack
Is there still something wrong with my data? I have no idea what this error is trying to tell me.
Based on the line your error is occurring on, it looks like you have a malformed feature list somewhere. A feature list looks like alpha=yes|beta=no. It seems like you might have something that looks like alpha=yes|beta, which is invalid.
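To see why an entry without "=" produces an unpacking error, here is a rough pure-Python analogue. The function parse_feats is a hypothetical name made up for this illustration, not spaCy's actual code (which is compiled Cython, so the wording of its error message differs), but the failure mode should be the same: splitting "beta" on "=" yields one piece where two are expected.

```python
def parse_feats(feats):
    """Hypothetical stand-in for a key=value feature parser (not spaCy's code)."""
    if feats == "_":  # a lone underscore means "no features"
        return {}
    result = {}
    for feat in feats.split("|"):
        # Fails with ValueError when feat contains no "=" (only one piece to unpack)
        key, value = feat.split("=")
        result[key] = value
    return result


print(parse_feats("alpha=yes|beta=no"))

try:
    parse_feats("alpha=yes|beta")
except ValueError as err:
    print("ValueError:", err)
```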
I think the underscore by itself is a special case and should be valid, but maybe you have some other kind of filler?
You can debug this by modifying the conllu_sentence_to_doc function in conllu_to_docs.py to print the morphs value before calling doc = Doc(...).
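If you would rather not edit the installed spaCy files, a standalone check over the input file may find the offending line as well. The sketch below is only an assumption about what is wrong: it assumes the file is tab-separated with the standard 10 CoNLL-U columns (features in the sixth), and the helper name find_bad_feats is made up for this example. It reports every feature entry that is not of the form key=value, plus any token line with an unexpected number of columns.

```python
def find_bad_feats(lines):
    """Return (line_number, problem) pairs for suspicious token lines.

    Hypothetical helper: assumes tab-separated CoNLL-U with FEATS in column 6.
    """
    problems = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip sentence separators (blank lines) and comment lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            problems.append((line_no, f"expected 10 columns, found {len(cols)}"))
            continue
        feats = cols[5]
        if feats == "_":  # a lone underscore is the valid "no features" filler
            continue
        for feat in feats.split("|"):
            key, sep, value = feat.partition("=")
            if not (key and sep and value):
                problems.append((line_no, f"malformed feature {feat!r} in {feats!r}"))
    return problems


# Example use (the path is the file name from the question; adjust as needed):
# with open("my_dataframe_ready.conllu", encoding="utf-8") as f:
#     for line_no, problem in find_bad_feats(f):
#         print(line_no, problem)
```

Any line it reports is a candidate for the filler value mentioned above, for example an empty cell or some placeholder other than "_" left over from the pandas preprocessing.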