To import XML data into a Neo4j database, I first parse the XML into a Python dictionary and then run Cypher queries such as this one:
WITH $pubmed_dict as pubmed_article
UNWIND pubmed_article as particle
MERGE (p:Publication {pmid: particle.MedlineCitation.PMID.text})
ON CREATE SET p.title = COALESCE (particle.MedlineCitation.Article.Journal.Title, particle.MedlineCitation.Article.ArticleTitle)
ON MATCH SET p.title = COALESCE (particle.MedlineCitation.Article.Journal.Title, particle.MedlineCitation.Article.ArticleTitle)
FOREACH (author IN particle.MedlineCitation.Article.AuthorList.Author |
MERGE (a:Author {last_name: COALESCE(author.LastName, 'LAST NAME MISSING!'), first_name: COALESCE(author.ForeName, 'FIRST NAME MISSING!')})
MERGE (p)<-[:WROTE]-(a)
)
Setting a.affiliation = author.AffiliationInfo.Affiliation works fine, but only as long as an Author does not have multiple affiliations in the XML, as in this example:
...
<Author ValidYN="Y">
<LastName>Tatarsky</LastName>
<ForeName>Rose L</ForeName>
<Initials>RL</Initials>
<AffiliationInfo>
<Affiliation>Department of Zoology, University of Wisconsin, Madison, WI, 53706, USA.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>Department of Neuroscience, University of Wisconsin, Madison, WI, 53706, USA.</Affiliation>
</AffiliationInfo>
</Author>
...
This results in an error:
neo4j.exceptions.CypherTypeError: Type mismatch: expected a map but was List{Map{Affiliation -> String("Department of Zoology, University of Wisconsin, Madison, WI, 53706, USA.")}, Map{Affiliation -> String("Department of Neuroscience, University of Wisconsin, Madison, WI, 53706, USA.")}}
Is there a way to check, in the ON CREATE / ON MATCH SET clause, whether the value is a map or a list before assigning it?
If it is a list, I would like to iterate over it and set properties such as affiliation1, affiliation2, and so on, if that is possible.
A Neo4j property can hold a list of strings, but not a list of maps, so in your Python code you have to turn those lists of dicts into lists of strings before passing them to the query.
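A minimal sketch of that preprocessing step, assuming the parsed dictionary has the shape shown in the error message (AffiliationInfo is either a single map, a list of maps, or missing). The helper name flatten_affiliations is made up for illustration. It rewrites AffiliationInfo into one map whose Affiliation entry is always a list of strings, so author.AffiliationInfo.Affiliation can be stored directly:

```python
def flatten_affiliations(author):
    """Return a copy of one parsed Author dict whose AffiliationInfo is a
    single map holding a list of affiliation strings.

    Hypothetical helper: assumes AffiliationInfo is a dict, a list of dicts,
    or absent, as suggested by the error message in the question.
    """
    info = author.get("AffiliationInfo")
    if info is None:
        entries = []            # author has no affiliation at all
    elif isinstance(info, dict):
        entries = [info]        # exactly one <AffiliationInfo> element
    else:
        entries = list(info)    # several <AffiliationInfo> elements

    # Keep only entries that actually carry an affiliation string.
    names = [e["Affiliation"] for e in entries if e.get("Affiliation")]

    flattened = dict(author)    # shallow copy, the input stays untouched
    flattened["AffiliationInfo"] = {"Affiliation": names}
    return flattened


author = {
    "LastName": "Tatarsky",
    "ForeName": "Rose L",
    "AffiliationInfo": [
        {"Affiliation": "Department of Zoology, University of Wisconsin, Madison, WI, 53706, USA."},
        {"Affiliation": "Department of Neuroscience, University of Wisconsin, Madison, WI, 53706, USA."},
    ],
}
print(flatten_affiliations(author)["AffiliationInfo"]["Affiliation"])
```

You would apply this to every author in the dictionary before sending it as $pubmed_dict.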
Also, why do you use FOREACH here rather than another UNWIND? I don't see where you create the author?
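WITH $pubmed_dict AS pubmed_article
UNWIND pubmed_article AS particle
// Bind p first; with p unbound, the final MERGE would create a new, unlabeled node
MERGE (p:Publication {pmid: particle.MedlineCitation.PMID.text})
WITH p, particle
UNWIND particle.MedlineCitation.Article.AuthorList.Author AS author
MERGE (a:Author {last_name: COALESCE(author.LastName, 'LAST NAME MISSING!')})
// Assumes the Python side has already reduced AffiliationInfo to a single map
// whose Affiliation is a string or a list of strings
SET a.first_name = author.ForeName, a.affiliation = author.AffiliationInfo.Affiliation
MERGE (p)<-[:WROTE]-(a)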
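If you really want numbered properties (affiliation1, affiliation2, ...) instead of one list property, it is probably easiest to build the property map in Python as well. A rough sketch, where numbered_affiliations and the affiliation_props key are names invented for this example:

```python
def numbered_affiliations(names):
    """Map a list of affiliation strings to numbered property names.

    Hypothetical helper: numbering starts at 1 to match the
    affiliation1, affiliation2, ... naming from the question.
    """
    return {f"affiliation{i}": name for i, name in enumerate(names, start=1)}


props = numbered_affiliations([
    "Department of Zoology, University of Wisconsin, Madison, WI, 53706, USA.",
    "Department of Neuroscience, University of Wisconsin, Madison, WI, 53706, USA.",
])
print(sorted(props))  # ['affiliation1', 'affiliation2']
```

If you attach that map to each author dict (say under affiliation_props), the query could then merge it into the node with SET a += author.affiliation_props. A single list property is usually easier to query, though, because the number of affiliations varies per author.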