I am parsing a simple XML document with a simple script that I wrote (with a couple of tweaks). Here is the XML:
<?xml version="1.0" ?>
<library owner="John Franks">
<book>
<title>Sandman Volume 1: Preludes and Nocturnes</title>
<author>Neil Gaiman</author>
</book>
<book>
<title>Good Omens</title>
<author>Neil Gamain</author>
<author>Terry Pratchett</author>
</book>
<book>
<title>The Man And The Goat</title>
<author>Bubber Elderidge</author>
</book>
<book>
<title>Once Upon A Time in LA</title>
<author>Dr Dre</author>
</book>
<book>
<title>There Will Never Be Justice</title>
<author>IR Jury</author>
</book>
<book>
<title>Beginning Python</title>
<author>Peter Norton, et al</author>
</book>
</library>
And here is my Python script:
from xml.dom.minidom import parse
import xml.dom.minidom
import csv
def writeToCSV(myLibrary):
csvfile = open('output.csv', 'w')
fieldnames = ['title', 'author', 'author']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
books = myLibrary.getElementsByTagName("book")
for book in books:
titleValue = book.getElementsByTagName("title")[0].childNodes[0].data
for author in book.getElementsByTagName("author"):
authorValue = author.childNodes[0].data
writer.writerow({'title': titleValue, 'author': authorValue})
doc = parse('library.xml')
myLibrary = doc.getElementsByTagName("library")[0]
# Get book elements in Library
books = myLibrary.getElementsByTagName("book")
# Print each book's title
writeToCSV(myLibrary)
Here is my output:
title,author
Sandman Volume 1: Preludes and Nocturnes,Neil Gaiman
Good Omens,Neil Gamain
Good Omens,Terry Pratchett
The Man And The Goat,Bubber Elderidge
Once Upon A Time in LA,Dr Dre
There Will Never Be Justice,IR Jury
Beginning Python,"Peter Norton, et al"
Notice that the book "Good Omens" has 2 authors, and is displayed on two separate lines. What I would really like is for it to display as follows:
title,author,author
Sandman Volume 1: Preludes and Nocturnes,Neil Gaiman,,
Good Omens,Neil Gamain,Terry Pratchett
The Man And The Goat,Bubber Elderidge,,
Once Upon A Time in LA,Dr Dre,,
There Will Never Be Justice,IR Jury,,
Beginning Python,"Peter Norton, et al",,
As you can see there are 3 columns, so the two authors display on the same line. Those books that have only one author, there is simply a blank entry, so two commas next to each other.
Here is one more possible solution:
CODE:
#! /usr/bin/python
from xml.dom.minidom import parse
import xml.dom.minidom
import csv
def writeToCSV(myLibrary):
with open('output.csv', 'wb') as csvfile:
writer = csv.writer(csvfile, delimiter=',',quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerow(['title', 'author', 'author'])
books = myLibrary.getElementsByTagName("book")
for book in books:
titleValue = book.getElementsByTagName("title")[0].childNodes[0].data
authors = [] # get all the authors in a vector
for author in book.getElementsByTagName("author"):
authors.append(author.childNodes[0].data)
writer.writerow([titleValue] + authors) # write to csv
doc = parse('library.xml')
myLibrary = doc.getElementsByTagName("library")[0]
# Print each book's title
writeToCSV(myLibrary)
OUTPUT:
title,author,author
Sandman Volume 1: Preludes and Nocturnes,Neil Gaiman
Good Omens,Neil Gamain,Terry Pratchett
The Man And The Goat,Bubber Elderidge
Once Upon A Time in LA,Dr Dre
There Will Never Be Justice,IR Jury
Beginning Python,"Peter Norton, et al"
Kind Regards,