
Bold, underlining, and Iterations with python-docx


I am writing a program that takes data from an ASCII file, places it in the appropriate places in a Word document, and makes only particular words bold and underlined. I am new to Python, but I have extensive experience in MATLAB programming. My code is:

#IMPORT ASCII DATA AND MAKE IT USEABLE
#Alternatively Pandas - gives better table display results
import pandas as pd
data = pd.read_csv('203792_M-51_Niles_control_SD_ACSF.txt', sep=",",
                   header=None)
#print data
#data[1][3]  gives value at particular data points within matrix
i=len(data[1])
print 'Number of Points imported =', i
#IMPORT WORD DOCUMENT
import docx  #Opens Python Word document tool
from docx import Document  #Invokes Document command from docx
document = Document('test_iteration.docx')  #Imports Word Document to Modify
t = len(document.paragraphs)  #gives the number of lines in document
print 'Total Number of lines =', t
#for paragraph in document.paragraphs:
#    print(paragraph.text)  #Prints the text in the entire document
font = document.styles['Normal'].font
font.name = 'Arial'
from docx.shared import Pt
font.size = Pt(8)
#font.bold = True
#font.underline = True
for paragraph in document.paragraphs:
    if 'NORTHING:' in paragraph.text:
        #print paragraph.text
        paragraph.text = 'NORTHING: \t',  str(data[1][0])
        print paragraph.text   
    elif 'EASTING:' in paragraph.text:
        #print paragraph.text
        paragraph.text = 'EASTING: \t', str(data[2][0])
        print paragraph.text
    elif 'ELEV:' in paragraph.text:
        #print paragraph.text
        paragraph.text = 'ELEV: \t', str(data[3][0])
        print paragraph.text
    elif 'CSF:' in paragraph.text:
        #print paragraph.text
        paragraph.text = 'CSF: \t', str(data[8][0])
        print paragraph.text
    elif 'STD. DEV.:' in paragraph.text:
        #print paragraph.text
        paragraph.text = ('STD. DEV.: ', 'N: ', str(data[5][0]), '\t E: ',
                          str(data[6][0]), '\t EL: ', str(data[7][0]))
        print paragraph.text
#for paragraph in document.paragraphs:
   #print(paragraph.text)  #Prints the text in the entire document
#document.save('test1_save.docx') #Saves as Word Document after Modification

My question is how to make only the "NORTHING:" bold and underlined in:

    paragraph.text = 'NORTHING: \t',  str(data[1][0])
    print paragraph.text 

So I wrote a pseudo "find and replace" routine that works well if all the values being replaced are exactly the same. However, I need to replace the values in the second paragraph with the values from the second array of the ASCII file, the values in the third paragraph with those from the third array, and so on. (I have to use find and replace because the formatting of the document is too advanced for me to replicate in a program, unless there is a program that can read the Word file and write it back out as a Python script, i.e. reverse engineer it.)

I am still learning, so the code may seem crude to you. I am just trying to automate this tedious copy-and-paste process.


Solution

  • This is untested, but it assumes python-docx works like python-pptx (it should: both are maintained by the same developer, and a cursory review of the documentation suggests they interface with .docx/.pptx files the same way, using the same methods).

    To format only part of a paragraph (for example a single word), you need to work with Run objects:

    https://python-docx.readthedocs.io/en/latest/api/text.html#run-objects

    In practice, this looks something like:

    for paragraph in document.paragraphs:
        if 'NORTHING:' in paragraph.text:
            paragraph.clear()
            run = paragraph.add_run()
            run.text = 'NORTHING: \t'
            run.font.bold = True
            run.font.underline = True
            run = paragraph.add_run()
            run.text = str(data[1][0])    
    

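    Conceptually, you create one run for each part of the paragraph text that needs its own formatting. So first we create a run with a bold, underlined font, then we add a second run for the value. A run created with add_run() has no direct formatting of its own, so it is only bold or underlined if the paragraph style is; in that case, set run.font.bold and run.font.underline to False on that run.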

    Note: it's preferable to put all of your import statements at the top of a module.

    This can be streamlined by using a mapping such as a dictionary, with the labels to match (e.g. "NORTHING:") as keys and the values to write after them as values. Also untested:

    import pandas as pd
    from docx import Document
    from docx.shared import Pt

    data = pd.read_csv('203792_M-51_Niles_control_SD_ACSF.txt', sep=",",
                       header=None)
    i = len(data[1])
    print('Number of Points imported = {}'.format(i))
    document = Document('test_iteration.docx')  # Imports Word Document to Modify
    t = len(document.paragraphs)  # number of paragraphs in the document
    print('Total number of paragraphs = {}'.format(t))
    font = document.styles['Normal'].font
    font.name = 'Arial'
    font.size = Pt(8)

    # This maps the matching strings to the data array values
    data_dict = {
        'NORTHING:': data[1][0],
        'EASTING:': data[2][0],
        'ELEV:': data[3][0],
        'CSF:': data[8][0],
        'STD. DEV.:': 'N: {0}\t E: {1}\t EL: {2}'.format(data[5][0], data[6][0], data[7][0])
        }

    for paragraph in document.paragraphs:
        for k, v in data_dict.items():
            if k in paragraph.text:
                paragraph.clear()  # remove the old runs, keep the paragraph style
                run = paragraph.add_run(k + '\t')  # label: bold and underlined
                run.font.bold = True
                run.font.underline = True
                paragraph.add_run('{0}'.format(v))  # value: default formatting
                break  # paragraph done, skip the remaining keys

    document.save('test1_save.docx')  # Saves as Word Document after Modification
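The code above always reads row 0, so every matching paragraph gets the same values. For the other half of the question (the second NORTHING: paragraph gets the second row, and so on), one option is to count how many times each label has already been filled and use that count as the row index. This is again an untested sketch: FIELDS and fill_document are hypothetical names, it assumes the labelled paragraphs appear in the same order as the rows of the ASCII file, and it relies only on document.paragraphs, Paragraph.clear() and Paragraph.add_run(), so it does not import docx itself.

```python
from collections import defaultdict

# Hypothetical field table: label -> function building the value text for one row.
# Column numbers are the ones used in the question (data[1] = northing, etc.).
FIELDS = {
    'NORTHING:': lambda data, row: str(data[1][row]),
    'EASTING:': lambda data, row: str(data[2][row]),
    'ELEV:': lambda data, row: str(data[3][row]),
    'CSF:': lambda data, row: str(data[8][row]),
    'STD. DEV.:': lambda data, row: 'N: {0}\t E: {1}\t EL: {2}'.format(
        data[5][row], data[6][row], data[7][row]),
}


def fill_document(document, data, fields=FIELDS):
    """Rewrite every labelled paragraph, using row n of `data` for the
    n-th paragraph (counting from 0) that contains a given label.

    Returns a dict with the number of paragraphs filled per label.
    """
    seen = defaultdict(int)  # label -> paragraphs filled so far
    for paragraph in document.paragraphs:
        for label, value_for_row in fields.items():
            if label in paragraph.text:
                row = seen[label]  # n-th occurrence of this label -> row n
                seen[label] += 1
                paragraph.clear()  # drop old runs, keep the paragraph style
                run = paragraph.add_run(label + '\t')  # bold + underlined label
                run.font.bold = True
                run.font.underline = True
                paragraph.add_run(value_for_row(data, row))  # plain value
                break  # at most one label per paragraph
    return dict(seen)
```

With the question's variables it would be called as fill_document(document, data) followed by document.save('test1_save.docx'). If a label occurs more often than there are rows in the file, the lookup data[col][row] raises an error (a KeyError for a pandas DataFrame), which is a useful sign that the template and the data do not line up. Also note that document.paragraphs only covers body paragraphs, not paragraphs inside tables.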