I'm formatting my data like this:
Jersei N
atinge V
média N
. PU
Programe V
...
The first string on each line is the lexical item and the second is its POS tag. But the empty line (which I'm using to indicate the end of a sentence) gives me the error AttributeError: 'Example' object has no attribute 'text'
when running the following code:
src = data.Field()
trg = data.Field(sequential=False)
mt_train = datasets.TabularDataset(
path='/path/to/file.tsv',
fields=(src, trg))
src.build_vocab(mt_train)
What is the proper way to indicate the end of a sentence (EOS) to torchtext?
The following code reads the TSV as I formatted it:
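from torchtext import data, datasets  # legacy torchtext API, as used in the question
text, labels = data.Field(), data.Field()  # one Field per column (tokens, POS tags)
mt_train = datasets.SequenceTaggingDataset(path='/path/to/file.tsv',
                                           fields=(('text', text),
                                                   ('labels', labels)))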
It turns out that SequenceTaggingDataset properly identifies an empty line as the sentence separator.
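
To make the blank-line convention concrete, here is a minimal sketch in plain Python of how such a file can be grouped into sentences. It is only an illustration, not torchtext's actual implementation: the function name read_tagged_sentences is hypothetical, the sample string is a shortened version of the data above, and the code assumes exactly two tab-separated columns per line.

```python
import io


def read_tagged_sentences(lines, separator="\t"):
    """Group token/tag lines into sentences, splitting on blank lines."""
    sentences = []
    tokens, tags = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # Blank line: close the current sentence, if there is one.
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        # Assumption: every non-blank line is "<token><separator><tag>".
        token, tag = line.split(separator)
        tokens.append(token)
        tags.append(tag)
    # The last sentence may not be followed by a blank line.
    if tokens:
        sentences.append((tokens, tags))
    return sentences


# Illustrative sample only: two sentences separated by an empty line.
sample = "Jersei\tN\natinge\tV\nmédia\tN\n.\tPU\n\nPrograme\tV\n"
sentences = read_tagged_sentences(io.StringIO(sample))
```

Each element of sentences is a (tokens, tags) pair, which corresponds to one example with a 'text' and a 'labels' field in the SequenceTaggingDataset call above.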