
Import xml to R


I have election results data in xml files that I am trying to import into R. This is my first time working with xml files, but I can't make sense of the .xls version of the data that is also available for download, so I'm attempting to work with the xml.

There isn't a direct link to the xml file, but it can be downloaded from https://results.enr.clarityelections.com/IL/Bloomington/109017/web.276013/#/summary by scrolling down a bit to "Reports" on the right side and choosing "Detail XML".

I've been trying to use xml2 to get it into a data frame. I can call read_xml and then turn the result into a list, but after that my attempts have only produced a variety of errors or more lists with a lot of NULLs. It's possible the weirdness is caused by the xml file itself, but I don't know enough about xml files to tell whether that is the case.


Solution

  • Here's the solution I ended up with: use XSLT to restructure the xml before trying to construct a data frame. The basics of the solution came from "R: convert XML data to data frame" (coincidentally also about election data).

    XSLT - The stylesheet restructures the document into one long, flat list of every Precinct node, with the applicable info from its Choice, Contest, and VoteType ancestors (plus the election name) copied onto it as attributes.

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
      <xsl:strip-space elements="*"/>
    
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>
    
      <xsl:template match="/ElectionResult">
        <xsl:copy>
          <xsl:apply-templates select="descendant::Precinct"/>
        </xsl:copy>
      </xsl:template>
    
      <xsl:template match="Precinct">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:attribute name="election">
            <xsl:value-of select="ancestor::ElectionResult/ElectionName"/>
          </xsl:attribute>
          <xsl:attribute name="contest">
            <xsl:value-of select="ancestor::Contest/@text"/>
          </xsl:attribute>
          <xsl:attribute name="choice">
            <xsl:value-of select="ancestor::Choice/@text"/>
          </xsl:attribute>
          <xsl:attribute name="votetype">
            <xsl:value-of select="ancestor::VoteType[1]/@name"/>
          </xsl:attribute>
        </xsl:copy>
      </xsl:template>
    
    </xsl:stylesheet>
    

    R - The xslt package works as an extension for xml2 to apply the .xsl file.

    library(xml2)
    library(xslt)
    library(tidyverse)
    
    # Parse XML and XSL
    xml <- read_xml("electionresults.xml")
    style <- read_xml("style.xsl")  # the stylesheet above, saved as a local file
    
    # Transform XML
    new_xml <- xslt::xml_xslt(xml, style)
    
    # Build data frame
    elections <- new_xml %>% 
      xml_find_all("//Precinct") %>% 
      map_dfr(~list(election = xml_attr(., "election"),
                    contest = xml_attr(., "contest"),
                    choice = xml_attr(., "choice"),
                    votetype = xml_attr(., "votetype"),
                    precinct = xml_attr(., "name"),
                    votes = xml_attr(., "votes"))) %>% 
      type_convert()
    

    The mapping step for building the data frame came from "R XML - combining parent and child nodes into data frame".
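
    For anyone who wants to check the expected shape of the result without the real file, here is a minimal sketch on a made-up document. It is not the XSLT approach above: it pulls the same ancestor values directly with XPath in xml2, which is a handy cross-check. The toy xml, its nesting (Contest > Choice > VoteType > Precinct, inferred from the ancestor:: paths in the stylesheet), and the helper name ancestor_attr are all assumptions for illustration, not something taken from the actual Clarity file.

    ```r
    library(xml2)

    # Toy document. The nesting is an assumption inferred from the
    # ancestor:: paths used in the stylesheet, not the real file.
    toy <- read_xml('<ElectionResult>
      <ElectionName>Toy Election</ElectionName>
      <Contest text="Mayor">
        <Choice text="Candidate A">
          <VoteType name="Election Day">
            <Precinct name="Precinct 1" votes="10"/>
            <Precinct name="Precinct 2" votes="7"/>
          </VoteType>
          <VoteType name="Early">
            <Precinct name="Precinct 1" votes="3"/>
          </VoteType>
        </Choice>
      </Contest>
    </ElectionResult>')

    precincts <- xml_find_all(toy, "//Precinct")

    # Hypothetical helper: for each node, find its nearest ancestor with the
    # given element name and return one of that ancestor's attributes.
    # xml_find_first() on a nodeset returns one result per input node,
    # so the output lines up with `precincts`.
    ancestor_attr <- function(nodes, element, attr) {
      xpath <- paste0("ancestor::", element, "[1]")
      xml_attr(xml_find_first(nodes, xpath), attr)
    }

    flat <- data.frame(
      election = xml_text(
        xml_find_first(precincts, "ancestor::ElectionResult/ElectionName")
      ),
      contest  = ancestor_attr(precincts, "Contest", "text"),
      choice   = ancestor_attr(precincts, "Choice", "text"),
      votetype = ancestor_attr(precincts, "VoteType", "name"),
      precinct = xml_attr(precincts, "name"),
      votes    = as.integer(xml_attr(precincts, "votes")),
      stringsAsFactors = FALSE
    )

    print(flat)
    ```

    Each row corresponds to one Precinct node, which is the same one-row-per-precinct layout the XSLT plus map_dfr pipeline is meant to produce. If the real file nests its elements differently, the XPath expressions would need adjusting.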