Below is the for loop that loops over all the Word document files. As you can see, I have already printed the filename to check its output.
import os

# root_dir, output, mynotes_extractor and cleanString are defined elsewhere in my script.
for filename in os.listdir(root_dir):
    source_directory = os.path.join(root_dir, filename)
    # The output of filename is shown in the next section.
    print(filename)
    arr = mynotes_extractor.get_mynotes(source_directory)
    list2str = str(arr)
    c = cleanString(newstring=list2str)
    new_arr = [c]
    # Append the cleaned text; "with" closes the file at the end of each iteration.
    with open(output, 'a', encoding='utf-8') as text_file:
        for item in new_arr:
            text_file.write("%s\n" % item)
Below is the output of printing filename:
12345_Cat_A_My Notes.docx
6789_Cat_B_My Notes.docx
54321_Cat_A_My Notes.docx
12234_Cat_C_My Notes.docx
86075_Cat_D_My Notes.docx
34324_Cat_E_My Notes.docx
I would like to extract only the specific name, "My Notes", from each Word document filename inside the for loop shown above.
For instance:
Filename before extraction: 34324_Cat_E_My Notes.docx
After extraction: My Notes
Written as a single line, this is tidy but can be confusing when you are starting out:
filename.split('.')[0].split('_')[-1]
output: 'My Notes'
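One caveat with splitting on '.': if a filename ever contains another period before the extension, [0] keeps only the text before the first period. A slightly more defensive sketch, assuming the name you want always follows the last underscore, strips just the extension with os.path.splitext and splits once from the right with str.rsplit. The helper name note_name and the second sample filename are hypothetical, for illustration only.

```python
import os


def note_name(filename):
    """Return the text after the last underscore, ignoring the file extension."""
    # splitext removes only the final extension, e.g. '.docx'
    stem, _ext = os.path.splitext(filename)
    # rsplit with maxsplit=1 splits only at the last underscore
    return stem.rsplit('_', 1)[-1]


print(note_name('34324_Cat_E_My Notes.docx'))  # My Notes
# Hypothetical filename with an extra period before the extension:
print(note_name('v1.0_Cat_A_My Notes.docx'))   # My Notes
# The one-liner stops at the first period here:
print('v1.0_Cat_A_My Notes.docx'.split('.')[0].split('_')[-1])  # v1
```

For the filenames in the question both approaches give the same result; the difference only shows up when a name contains extra periods.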
Detailed explanation below:
filename = '12345_Cat_A_My Notes.docx'
.split('.')
splits the string at every period
>>>['12345_Cat_A_My Notes', 'docx']
[0]
takes the first element of the list
>>>'12345_Cat_A_My Notes'
.split('_')
splits this string at each underscore, returning
>>>['12345', 'Cat', 'A', 'My Notes']
[-1]
finally takes the last item in the list, returning
>>>'My Notes'
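To use this inside the loop from the question, the same expression can go right after print(filename). The sketch below assumes you only want .docx files from the folder; the function name collect_note_names and the throwaway demo folder (built with tempfile, including a non-Word file readme.txt) are hypothetical and only there so the example runs on its own.

```python
import os
import tempfile


def collect_note_names(root_dir):
    """Map each .docx filename in root_dir to the text after its last underscore."""
    names = {}
    # os.listdir order is arbitrary, so sort for a predictable result
    for filename in sorted(os.listdir(root_dir)):
        # Skip anything that is not a Word document
        if not filename.lower().endswith('.docx'):
            continue
        # Same one-liner: drop the extension, keep the last '_' field
        names[filename] = filename.split('.')[0].split('_')[-1]
    return names


# Demo: a temporary folder holding empty files with sample names
with tempfile.TemporaryDirectory() as root_dir:
    for name in ['12345_Cat_A_My Notes.docx', '6789_Cat_B_My Notes.docx', 'readme.txt']:
        open(os.path.join(root_dir, name), 'w').close()
    result = collect_note_names(root_dir)

for filename, extracted in result.items():
    print(filename, '->', extracted)
```

In your own loop you would keep the existing body and simply compute the name from filename before calling mynotes_extractor.get_mynotes.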