R and Microsoft Word: Updating text in one Word document based on text in another Word document


An updated version of my question. Below is code to produce two Word documents. The first document contains a series of table titles, each with an accompanying bookmark. The second document contains an actual table.

What I'd like to be able to do is to determine what the table title in the second document should be based on what is specified in the first document. I believe the mechanics of this might involve finding the relevant bookmark in the first document, moving up a line to where the actual title is, and then copying the title, so that it can be used in the second document.

library(officer)
library(magrittr)
library(flextable)

read_docx() %>%
  body_add_par(value = "Fred Table", style = "table title") %>%
  body_add_par("") %>%
  body_bookmark(id = "FredBMK") %>%
  body_add_par("") %>%

  body_add_par(value = "Sally Table", style = "table title") %>%
  body_add_par("") %>%
  body_bookmark(id = "SallyBMK") %>%
  body_add_par("") %>%

  body_add_par(value = "George Table", style = "table title") %>%
  body_add_par("") %>%
  body_bookmark(id = "GeorgeBMK") %>%
  body_add_par("") %>%

  body_add_par(value = "Sample Data from the mtcars Dataset", style = "table title") %>%
  body_add_par("") %>%
  body_bookmark(id = "mtcarsBMK") %>%
  body_add_par("") %>%

  body_add_par(value = "Susan Table", style = "table title") %>%
  body_add_par("") %>%
  body_bookmark(id = "SusanBMK") %>%
  body_add_par("") %>%

  print(target = "Test Report Skeleton.docx")


read_docx() %>%
  body_add_par(value = "Table Title (Corresponding to mtcarsBMK) from Other Document Goes Here", style = "table title") %>%
  body_add_par("") %>%
  body_add_flextable(flextable(mtcars[1:12, 1:3])) %>%
  print(target = "Test Target Table.docx")

Original Question:

I'm using the R officer package to generate Word documents. Imagine a scenario where text is initially synchronized in two Word documents. One is a larger report and the other is a table that is generated and then automatically inserted into the report. The title of the table starts out the same in both documents. Now suppose a medical writer manually alters the title of the table in the report. I'd like to be able to detect that and then automatically update the title in the table document so it matches what is in the report.

The officer package documentation shows how to replace text within a single document with a user-specified text string. It's not clear to me, though, whether that could be used to do what I'm trying to accomplish. Nor is it clear to me that it can't be done within officer.
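
For reference, the single-document replacement mentioned here can be sketched roughly as follows. This is only an illustration under assumptions: it builds a throwaway document from officer's default template, and the choices `only_at_cursor = FALSE` and `fixed = TRUE` are made for the example rather than taken from the question.

```r
library(officer)

# Throwaway document built from officer's default template (assumption).
doc <- read_docx()
doc <- body_add_par(doc, value = "Awesome Table", style = "table title")

# Replace a literal string everywhere in the body of this one document.
doc <- body_replace_all_text(
  doc,
  old_value = "Awesome",
  new_value = "AWESOME",
  only_at_cursor = FALSE,  # search the whole body, not only the paragraph at the cursor
  fixed = TRUE             # treat old_value as a literal string, not a regular expression
)

# Inspect the paragraph text after the replacement.
titles <- docx_summary(doc)$text
print(titles)
```

The replacement string still has to come from somewhere, which is the part the question is about: reading the current title out of the other document.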

Below is some code that makes two Word documents. One represents a report where changes have been made to a table title. The other represents the original table for which the title needs to be updated to match the report. The difference is minor: one word is in all caps in one title and not in the other.

My hope is that it will be clear to someone how to detect the change in the first document and then to update the title in the second document.

library(officer)
library(magrittr)

read_docx() %>%
  body_add_par(value = "AWESOME Table", style = "table title") %>%
  body_add_par("") %>%
  body_bookmark(id = "AwesomeBMK") %>%
  body_add_par("(Awesome table appears here immediately after AwesomeBMK bookmark)") %>%
  print(target = "Awesome Report.docx")

read_docx() %>%
  body_add_par(value = "Awesome Table", style = "table title") %>%
  body_add_par("") %>%
  body_bookmark(id = "AwesomeBMK") %>%
  body_add_par("(Awesome table appears here immediately after AwesomeBMK bookmark)") %>%
  print(target = "Awesome Table.docx")

Solution

  • Below is what I believe to be a solution. My knowledge of XML is in its infancy, but I think this works.

    The first part of the code makes a Word file. The second part makes the XML underlying that file accessible. The third part reads the relevant part of the XML. The fourth part captures table and figure titles that are immediately followed by a bookmark. The fifth part captures bookmarks that are immediately preceded by a table or figure title. The Word file also contains one table title and one bookmark that are unmatched: the title because no bookmark comes immediately after it, and the bookmark because no table or figure title comes immediately before it. The last part links each table or figure title to its corresponding bookmark.

    Had planned to provide the XML here as well. Decided against it though because the XML for any Word document is very verbose and it would have taken forever to format it.

    People who attempt to run the code will not have the Word document template I used containing the Table Title 1 and Figure Title 1 styles. I believe that a suitable Word template can easily be devised though with one's own version of a style for table titles and figure titles.
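
    One detail that matters when adapting the code to a different template: the XPath expressions match on the style id stored in the XML (here 'TableTitle1'), not on the style name shown in Word ("Table Title 1"). A minimal sketch for looking up the ids, assuming officer's default template is used in place of my own template (which is not available here):

    ```r
    library(officer)

    # List the paragraph styles in officer's default template whose name
    # contains "title", together with the ids used inside document.xml.
    styles <- styles_info(read_docx())
    title_styles <- styles[
      styles$style_type == "paragraph" &
        grepl("title", styles$style_name, ignore.case = TRUE),
      c("style_name", "style_id")
    ]
    print(title_styles)
    ```

    Whatever appears in the style_id column is what would go into the `@w:val='...'` tests in the XPath expressions.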

    Hopefully someday this will prove helpful to someone.

    #### Make Word file ####
    
    library(officer)
    library(magrittr)
    library(xml2)
    
    read_docx("Report Template Blank.docx") %>%
    
      body_remove() %>%
      body_add_par(value = "Fred Table", style = "Table Title 1") %>%
      body_add_par("") %>%
      body_bookmark(id = "FredtblBMK") %>%
      body_add_par("") %>%
    
      body_add_par(value = "Fred Figure", style = "Figure Title 1") %>%
      body_add_par("") %>%
      body_bookmark(id = "FredfigBMK") %>%
      body_add_par("") %>%
    
      body_add_par(value = "Sally Table", style = "Table Title 1") %>%
      body_add_par("") %>%
      body_bookmark(id = "SallytblBMK") %>%
      body_add_par("") %>%
    
      body_add_par(value = "Sally Figure", style = "Figure Title 1") %>%
      body_add_par("") %>%
      body_bookmark(id = "SallyfigBMK") %>%
      body_add_par("") %>%
    
      body_add_par(value = "Unmatched Table", style = "Table Title 1") %>%
      body_add_par("") %>%
    
      body_add_par("Some text separating the unmatched title and unmatched bookmark.") %>%
      body_add_par("") %>%
    
      body_bookmark(id = "UnmatchedBMK") %>%
      body_add_par("") %>%
    
      print(target = "Test Report Skeleton.docx")
    
    #### Make XML underlying Word document accessible ####
    
    # A .docx file is a zip archive, so unzip() can extract it directly.
    unzip("Test Report Skeleton.docx", exdir = "Test Report Skeleton XML")
    
    #### Read XML ####
    
    doc <- read_xml("./Test Report Skeleton XML/word/document.xml")
    
    #### Find qualifying table and figure titles ####
    
    # Title paragraphs whose next sibling paragraph holds a bookmark.
    xml_tbl <-
      xml_find_all(
        doc,
        "//w:p[w:pPr/w:pStyle[@w:val='TableTitle1' or @w:val='FigureTitle1'] and
        ./following-sibling::w:p[1][./w:bookmarkStart]]"
      ) %>%
      xml_text()
    
    #### Find qualifying bookmarks ####
    
    # Bookmarks whose previous sibling paragraph is a table or figure title.
    xml_bmk <-
      xml_find_all(
        doc,
        "//w:p[./w:bookmarkStart and
        ./preceding-sibling::w:p[1][./w:pPr/w:pStyle[@w:val='TableTitle1' or @w:val='FigureTitle1']]]
        /w:bookmarkStart"
      ) %>%
      xml_attr("name")
    
    #### Link titles to bookmarks ####
    
    # Both queries return matches in document order, so rows pair up by position.
    xml_tbl_bmk <- data.frame(title = xml_tbl, bookmark = xml_bmk)
    
    #### Show results ####
    
    xml_tbl_bmk
    
             title    bookmark
    1   Fred Table  FredtblBMK
    2  Fred Figure  FredfigBMK
    3  Sally Table SallytblBMK
    4 Sally Figure SallyfigBMK
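
    To get from this table back to the original goal (the title that belongs to one particular bookmark), the same XPath idea can be turned into a small lookup. The following is a minimal sketch, not a tested part of the solution: `title_for_bookmark()` is a hypothetical helper, and the XML is a hand-written, heavily trimmed stand-in for word/document.xml so that the example runs without my template.

    ```r
    library(xml2)

    # Hand-written stand-in for word/document.xml (hypothetical, heavily trimmed).
    ns <- "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    title_par <- function(text) {
      paste0('<w:p><w:pPr><w:pStyle w:val="TableTitle1"/></w:pPr>',
             '<w:r><w:t>', text, '</w:t></w:r></w:p>')
    }
    bookmark_par <- function(id, name) {
      paste0('<w:p><w:bookmarkStart w:id="', id, '" w:name="', name, '"/>',
             '<w:bookmarkEnd w:id="', id, '"/></w:p>')
    }
    xml_src <- paste0(
      '<w:document xmlns:w="', ns, '"><w:body>',
      title_par("Fred Table"), bookmark_par(1, "FredtblBMK"),
      title_par("Sally Table"), bookmark_par(2, "SallytblBMK"),
      '</w:body></w:document>'
    )
    doc <- read_xml(xml_src)

    # Hypothetical helper: return the text of the title paragraph that sits
    # immediately before the paragraph holding the named bookmark, or NA.
    title_for_bookmark <- function(doc, bookmark) {
      xpath <- paste0(
        "//w:p[w:bookmarkStart[@w:name='", bookmark, "']]",
        "/preceding-sibling::w:p[1]",
        "[w:pPr/w:pStyle[@w:val='TableTitle1' or @w:val='FigureTitle1']]"
      )
      node <- xml_find_first(doc, xpath)
      if (inherits(node, "xml_missing")) return(NA_character_)
      xml_text(node)
    }

    sally_title <- title_for_bookmark(doc, "SallytblBMK")
    missing_title <- title_for_bookmark(doc, "NoSuchBMK")
    print(sally_title)
    ```

    The string returned for a bookmark could then be used as the replacement text when updating the title in the second document, for example with officer's text replacement.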