I want to create .txt files from a dictionary, writing each entry's text on its own line of a txt file. The dictionary structure looks like:
{'id': 0,
'text': 'Mtendere Village was inspired by the vision'}
I am using this code:
from tqdm.auto import tqdm  # loading bar
text_data = []
file_count = 0
for sample in tqdm(new_dict):
    # remove newline characters from each sample, as we need to use them exclusively as separators
    sample = sample['text'].replace('\n', '\s')
    text_data.append(sample)
    if len(text_data) == 5_000:
        # once we hit the 5K mark, save to file
        with open('file_path\oscar_data\oscar_%s.txt' % file_count, 'w', encoding='utf-8') as fp:
            fp.write('\n'.join(text_data))
        text_data = []
        file_count += 1
However, this gives me an error:
---> 12 sample = sample['text'].replace('\n', '\s')
TypeError: 'int' object is not subscriptable
Although I understand what the error is telling me, I'm not sure how to correct it...
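I think you meant to pass a list of dictionaries to the loop, but actually passed a dictionary. Iterating over a dictionary yields its keys, not its values, so sample ends up being a key (an int, according to the error) rather than a record with a 'text' field.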
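To illustrate, here is a minimal sketch of how the error can arise. It assumes the data is a dictionary keyed by integers; the question does not show the full structure, so the `records` variable and its contents are hypothetical.

```python
# Hypothetical data: a dict keyed by integer ids (an assumption, not taken from the question)
records = {
    0: {'id': 0, 'text': 'Mtendere Village was inspired by the vision'},
}

# Iterating over a dict directly yields its keys
keys = [sample for sample in records]

# Subscripting one of those integer keys raises the TypeError from the question
try:
    keys[0]['text']
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Iterating over .values() yields the inner dictionaries instead
texts = [sample['text'] for sample in records.values()]
```

If the data really is shaped like this, looping over `records.values()` would probably work too. Converting the data to a list of dictionaries, as in the code that follows, has the same effect.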
from tqdm.auto import tqdm  # loading bar
new_dict = [
    {
        'id': 0,
        'text': 'Mtendere Village was inspired by the vision'
    }
]
text_data = []
file_count = 0
for sample in tqdm(new_dict):
    # replace newline characters with a space, as newlines are used exclusively as separators
    # ('\s' is not a Python escape sequence; it would insert a literal backslash followed by 's')
    sample = sample['text'].replace('\n', ' ')
    text_data.append(sample)
    if len(text_data) == 5000:
        # once we hit the 5K mark, save to file (raw string keeps the backslashes in the path literal)
        with open(r'file_path\oscar_data\oscar_%s.txt' % file_count, 'w', encoding='utf-8') as fp:
            fp.write('\n'.join(text_data))
        text_data = []
        file_count += 1
I have updated new_dict to a list of dictionaries and it fixed the issue.
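One thing to watch for: the loop only writes a file when exactly 5,000 samples have accumulated, so any remainder at the end is never saved. Below is a rough sketch of one way to handle that. The function name `write_chunks`, the `chunk_size` parameter, and the use of a temporary directory for the demo are my own assumptions, not part of the original code.

```python
import tempfile
from pathlib import Path


def write_chunks(samples, out_dir, chunk_size=5_000):
    """Write each sample's 'text' on its own line, chunk_size lines per file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    text_data = []
    paths = []

    def flush():
        # file index is the number of files written so far
        path = out_dir / f'oscar_{len(paths)}.txt'
        path.write_text('\n'.join(text_data), encoding='utf-8')
        paths.append(path)
        text_data.clear()

    for sample in samples:
        # newlines separate samples in the output, so replace them inside a sample
        text_data.append(sample['text'].replace('\n', ' '))
        if len(text_data) == chunk_size:
            flush()
    if text_data:
        # save whatever is left over after the last full chunk
        flush()
    return paths


# Hypothetical demo data: five samples, two lines per file
demo = [{'id': i, 'text': f'line {i}\ncontinued'} for i in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    written = write_chunks(demo, tmp, chunk_size=2)
    file_names = [p.name for p in written]
    last_file = written[-1].read_text(encoding='utf-8')
```

With five samples and a chunk size of two, this produces three files, the last one holding the single leftover line.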