Search code examples
pythonxmlpython-2.7csvminidom

Parsing XML and writing to CSV file


I am parsing a simple XML document with a simple script that I wrote (with a couple of tweaks). Here is the XML:

<?xml version="1.0" ?>
<library owner="John Franks">
 <book>
  <title>Sandman Volume 1: Preludes and Nocturnes</title>
  <author>Neil Gaiman</author>
 </book>
 <book>
  <title>Good Omens</title>
  <author>Neil Gamain</author>
  <author>Terry Pratchett</author>
 </book>
 <book>
  <title>The Man And The Goat</title>
  <author>Bubber Elderidge</author>
 </book>
 <book>
  <title>Once Upon A Time in LA</title>
  <author>Dr Dre</author>
 </book>
 <book>
  <title>There Will Never Be Justice</title>
  <author>IR Jury</author>
 </book>
 <book>
  <title>Beginning Python</title>
  <author>Peter Norton, et al</author>
 </book>
</library>

And here is my Python script:

from xml.dom.minidom import parse
import xml.dom.minidom
import csv

def writeToCSV(myLibrary):
  csvfile = open('output.csv', 'w')
  fieldnames = ['title', 'author', 'author']
  writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
  writer.writeheader()
  
  books = myLibrary.getElementsByTagName("book")
  for book in books:
    titleValue = book.getElementsByTagName("title")[0].childNodes[0].data
    for author in book.getElementsByTagName("author"):
      authorValue = author.childNodes[0].data
      writer.writerow({'title': titleValue, 'author': authorValue})


doc = parse('library.xml')
myLibrary = doc.getElementsByTagName("library")[0]

# Get book elements in Library
books = myLibrary.getElementsByTagName("book")

# Print each book's title
writeToCSV(myLibrary)

Here is my output:

title,author

Sandman Volume 1: Preludes and Nocturnes,Neil Gaiman

Good Omens,Neil Gamain

Good Omens,Terry Pratchett

The Man And The Goat,Bubber Elderidge

Once Upon A Time in LA,Dr Dre

There Will Never Be Justice,IR Jury

Beginning Python,"Peter Norton, et al"

Notice that the book "Good Omens" has 2 authors, and is displayed on two separate lines. What I would really like is for it to display as follows:

title,author,author

Sandman Volume 1: Preludes and Nocturnes,Neil Gaiman,,

Good Omens,Neil Gamain,Terry Pratchett

The Man And The Goat,Bubber Elderidge,,

Once Upon A Time in LA,Dr Dre,,

There Will Never Be Justice,IR Jury,,

Beginning Python,"Peter Norton, et al",,

As you can see there are 3 columns, so the two authors display on the same line. Those books that have only one author, there is simply a blank entry, so two commas next to each other.


Solution

  • Here is one more possible solution:

    CODE:

    #! /usr/bin/python
    
    from xml.dom.minidom import parse
    import xml.dom.minidom
    import csv
    
    def writeToCSV(myLibrary):
        with open('output.csv', 'wb') as csvfile:
            writer = csv.writer(csvfile, delimiter=',',quotechar='"', quoting=csv.QUOTE_MINIMAL)
            writer.writerow(['title', 'author', 'author'])
            books = myLibrary.getElementsByTagName("book")
            for book in books:
                titleValue = book.getElementsByTagName("title")[0].childNodes[0].data
                authors = [] # get all the authors in a vector
                for author in book.getElementsByTagName("author"):
                    authors.append(author.childNodes[0].data)
                writer.writerow([titleValue] + authors) # write to csv
    
    doc = parse('library.xml')
    myLibrary = doc.getElementsByTagName("library")[0]
    # Print each book's title
    writeToCSV(myLibrary)
    

    OUTPUT:

    title,author,author
    Sandman Volume 1: Preludes and Nocturnes,Neil Gaiman
    Good Omens,Neil Gamain,Terry Pratchett
    The Man And The Goat,Bubber Elderidge
    Once Upon A Time in LA,Dr Dre
    There Will Never Be Justice,IR Jury
    Beginning Python,"Peter Norton, et al"
    

    Kind Regards,