I am trying to read a docx file and add its text to a list, so that each line of the docx file becomes one item in the list.
example:
docx file:
"Hello, my name is blabla,
I am 30 years old.
I have two kids."
result:
['Hello, my name is blabla', 'I am 30 years old', 'I have two kids']
I can't get it to work.
I am using the docx2txt module from here: github link
It has only one function, process, and it returns all the text from the docx file as a single string.
Also I would like it to keep the special characters like ":\-\.\,"
The docx2txt module reads a docx file and converts it to plain text.
You need to split that output into lines with splitlines() and store the lines in a list.
Code (comments inline):
import docx2txt

# Convert the whole docx file to a single string
text = docx2txt.process("a.docx")

# Print the output after converting
print("After converting text is ", text)

content = []
for line in text.splitlines():
    # Ignore empty lines
    if line != '':
        # Append the line to the list
        content.append(line)

print("List is", content)
Output:
C:\Users\dinesh_pundkar\Desktop>python c.py
After converting text is
Hello, my name is blabla.
I am 30 years old.
I have two kids.
List is ['Hello, my name is blabla.', 'I am 30 years old. ', 'I have two kids.']
C:\Users\dinesh_pundkar\Desktop>
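As the output above shows, a line can keep trailing whitespace ('I am 30 years old. '), and a line containing only spaces would not be skipped by the `line != ''` check. A possible variant, offered only as a sketch, strips each line before filtering. The helper names `non_empty_lines` and `docx_lines` and the sample string are made up for illustration; the only docx2txt call used is `process`, the same one as in the code above. Punctuation such as `:`, `-`, `.` and `,` is left untouched, because only surrounding whitespace is removed.

```python
def non_empty_lines(text):
    """Split text into lines, trim surrounding whitespace, drop blank lines."""
    stripped = (line.strip() for line in text.splitlines())
    return [line for line in stripped if line]


def docx_lines(path):
    """Read a .docx file with docx2txt and return its non-empty lines."""
    # Imported here so non_empty_lines can be tried without the package installed.
    import docx2txt
    return non_empty_lines(docx2txt.process(path))


# Hypothetical sample standing in for the text returned by docx2txt.process()
sample = "Hello, my name is blabla.\n\nI am 30 years old. \n\nI have two kids.\n"
print(non_empty_lines(sample))
```

With a real file, `docx_lines("a.docx")` would be called instead of passing the sample string.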