from top2vec import Top2Vec

docs = ['Consumer discretionary, healthcare and technology are preferred China equity sectors.',
        'Consumer discretionary remains attractive, supported by China’s policy to revitalize domestic consumption. Prospects of further monetary and fiscal stimulus should reinforce the Chinese consumption theme.',
        'The healthcare sector should be a key beneficiary of the coronavirus outbreak, on the back of increased demand for healthcare services and drugs.',
        'The technology sector should benefit from increased demand for cloud services and hardware demand as China continues to recover from the coronavirus outbreak.',
        'China consumer discretionary sector is preferred. In our assessment, the sector is likely to outperform the MSCI China Index in the coming 6-12 months.']

model = Top2Vec(docs, embedding_model='universal-sentence-encoder')
When I run the command above, I get an error whose traceback doesn't make the root cause clear. What is causing it?
Error:

ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 model = Top2Vec(docs, embedding_model='universal-sentence-encoder')

2 frames
<__array_function__ internals> in vstack(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/numpy/core/shape_base.py in vstack(tup)
    281     if not isinstance(arrs, list):
    282         arrs = [arrs]
--> 283     return _nx.concatenate(arrs, 0)
    284
    285

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: need at least one array to concatenate
You need to use more documents, with more unique words, so that Top2Vec can find at least two topics; with only five short documents, its clustering step finds no clusters and the internal concatenation fails. As an example, simply repeating your list 10 times makes it work:
from top2vec import Top2Vec
docs = ['Consumer discretionary, healthcare and technology are preferred China equity sectors.',
'Consumer discretionary remains attractive, supported by China’s policy to revitalize domestic consumption. Prospects of further monetary and fiscal stimulus should reinforce the Chinese consumption theme.',
'The healthcare sector should be a key beneficiary of the coronavirus outbreak, on the back of increased demand for healthcare services and drugs.',
'The technology sector should benefit from increased demand for cloud services and hardware demand as China continues to recover from the coronavirus outbreak.',
'China consumer discretionary sector is preferred. In our assessment, the sector is likely to outperform the MSCI China Index in the coming 6-12 months.']
docs = docs * 10  # repeat the corpus so there are enough documents to cluster
model = Top2Vec(docs, embedding_model='universal-sentence-encoder')
print(model)
<top2vec.Top2Vec.Top2Vec object at 0x13eef6210>
I had a few (30) long documents of up to 130,000 characters each, so I split each one into smaller docs of 5,000 characters:
docs_split = []
skip_n = 5000  # chunk size in characters
for doc in docs:
    # iterate over the actual document length to avoid appending empty chunks
    for i in range(0, len(doc), skip_n):
        docs_split.append(doc[i:i + skip_n])
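One caveat with fixed-width slicing is that it can cut words in half at chunk boundaries, which slightly degrades the embeddings. A minimal variation (plain Python; `chunk_text` is a hypothetical helper name, not part of Top2Vec) backs up to the nearest whitespace before each cut:

```python
def chunk_text(doc, size=5000):
    """Split doc into chunks of at most `size` characters,
    preferring to break on whitespace so words stay intact."""
    chunks = []
    start = 0
    while start < len(doc):
        end = min(start + size, len(doc))
        if end < len(doc):
            # back up to the last space inside the window, if there is one
            cut = doc.rfind(' ', start, end)
            if cut > start:
                end = cut
        chunks.append(doc[start:end].strip())
        start = end
    return [c for c in chunks if c]  # drop any empty chunks

# usage on the corpus from the answer above:
# docs_split = [c for doc in docs for c in chunk_text(doc)]
```

If a single token is longer than `size` (no whitespace in the window), the helper falls back to a hard cut so it always makes progress.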