I have a set of document IDs (keys.csv) that I am using to get a set of text documents from a document source. I would like to collect all these text documents into a corpus for further analysis (like cosine similarity).
I am using the below code to append each text document into the corpus, but I'm not sure if this is going to work. Is there a better way to create a corpus with these text documents?
keys = pandas.read_csv(keys.csv)
for i in keys:
    ID = i
    doc = function_to_get_document(ID)
    corpus = corpus.append(doc)
If the CSV has a column IDcol with unique IDs, use a list comprehension; the output is a list:
corpus = [function_to_get_document(ID) for ID in pd.read_csv('keys.csv')['IDcol']]
Sample:
print (pd.read_csv('keys.csv'))
   IDcol
0      1
1      2
2      3
def function_to_get_document(x):
    return x + 1
corpus = [function_to_get_document(ID) for ID in pd.read_csv('keys.csv')['IDcol']]
print (corpus)
[2, 3, 4]
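For comparison, here is a minimal sketch of the explicit loop that gives the same result as the list comprehension. It assumes the IDs are already available as a plain list (`ids` is a stand-in for the values of `IDcol`) and reuses the placeholder `function_to_get_document` from the sample. Two things differ from the loop in the question: the list is created before the loop, and the result of `append` is not assigned back, because `list.append` modifies the list in place and returns `None`. Note also that iterating directly over a DataFrame yields its column labels, not its rows, so the ID column has to be selected first.

```python
# Stand-in for the values of the IDcol column (an assumption for this sketch;
# in the answer they come from pd.read_csv('keys.csv')['IDcol']).
ids = [1, 2, 3]


def function_to_get_document(x):
    # Placeholder from the sample; a real version would fetch the text.
    return x + 1


corpus = []  # create the list before the loop
for ID in ids:
    doc = function_to_get_document(ID)
    corpus.append(doc)  # mutates corpus in place; do not reassign

print(corpus)  # [2, 3, 4]

# Reassigning the result of append, as in the question, loses the list:
broken = []
broken = broken.append(0)
print(broken)  # None
```

The list comprehension is shorter and avoids both pitfalls, so the loop is mainly useful when each step needs extra handling, such as skipping IDs whose document cannot be fetched.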