python neo4j cypher py2neo pubmed

neo4j CYPHER - Dynamically run a FOREACH during JSON import depending on whether JSON element has a certain property


To import XML data into a Neo4j database, I first parse the XML into a Python dictionary and then run Cypher queries like this one (pmid has a unique constraint):

WITH $pubmed_dict as pubmed_article
UNWIND pubmed_article as particle
    MERGE (p:Publication {pmid: particle.MedlineCitation.PMID.text})
    ON CREATE SET p.title = COALESCE (particle.MedlineCitation.Article.Journal.Title, particle.MedlineCitation.Article.ArticleTitle)
    ON MATCH SET p.title = COALESCE (particle.MedlineCitation.Article.Journal.Title, particle.MedlineCitation.Article.ArticleTitle)

FOREACH (author IN particle.MedlineCitation.Article.AuthorList.Author |
  MERGE (a:Author {last_name: COALESCE(author.LastName, 'LAST NAME MISSING!'), first_name: COALESCE(author.ForeName, 'FIRST NAME MISSING!')})
  MERGE (p)<-[:WROTE]-(a)      
)

FOREACH (ref IN particle.MedlineCitation.CommentsCorrectionsList.CommentsCorrections |
  MERGE (cited_p:Publication {pmid: COALESCE (ref.PMID.text, 'NO-PMID')}) 
  MERGE (cited_p)<-[:REFERENCES]-(p)   
)

My particle has the following dictionary structure:

[Image: particle dictionary structure]

In the second FOREACH loop I want the following: if there is a particle.MedlineCitation.CommentsCorrectionsList.CommentsCorrections list and an entry in it has a PMID.text, then nothing should happen to the publication if one with that PMID already exists, and a new node with that PMID should be created otherwise. In both cases, that node should get a REFERENCES relationship from the p:Publication created at the start of the query.

I have not found the syntax for this case yet. My only workaround so far, COALESCE(ref.PMID.text, 'NO-PMID'), always creates new :REFERENCES relationships whenever a CommentsCorrectionsList entry has no PMID.

Does anyone have a better solution?


Solution

  • You can use CASE WHEN ... THEN ... to build a collection that is either empty or non-empty and run FOREACH over it, which basically acts like a conditionally executed block.

    e.g.

    FOREACH (_ IN case when exists(p.foo) then [true] else [] end |
      ....
    )
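As a rough sketch of how this could be applied to the second FOREACH in the question: the inner FOREACH below runs once when a reference has a PMID and not at all otherwise, so no 'NO-PMID' placeholder is needed. A few things here are my assumptions and not part of the answer above: I wrote the check as `ref.PMID.text IS NOT NULL` in place of `exists(...)`, wrapped the list in `coalesce(..., [])` to guard against articles with no CommentsCorrectionsList, collapsed the two identical ON CREATE / ON MATCH clauses into one SET, and left out the author loop for brevity. `IMPORT_QUERY` and `import_articles` are hypothetical names, and `runner` stands for whatever object you already use to send queries, assuming it offers `run(query, **params)`.

```python
# Sketch only: the CASE/FOREACH trick applied to the query from the question.
IMPORT_QUERY = """
UNWIND $pubmed_dict AS particle
MERGE (p:Publication {pmid: particle.MedlineCitation.PMID.text})
SET p.title = coalesce(particle.MedlineCitation.Article.Journal.Title,
                       particle.MedlineCitation.Article.ArticleTitle)
// coalesce(..., []) is an added guard for articles without a CommentsCorrectionsList
FOREACH (ref IN coalesce(particle.MedlineCitation.CommentsCorrectionsList.CommentsCorrections, []) |
  // one-element list when the reference has a PMID, empty list otherwise
  FOREACH (_ IN CASE WHEN ref.PMID.text IS NOT NULL THEN [true] ELSE [] END |
    MERGE (cited_p:Publication {pmid: ref.PMID.text})
    MERGE (cited_p)<-[:REFERENCES]-(p)
  )
)
"""


def import_articles(runner, articles):
    """Send the parsed articles to the database as the $pubmed_dict parameter.

    `runner` is assumed to expose run(query, **params), for example a
    py2neo Graph or a neo4j driver session.
    """
    return runner.run(IMPORT_QUERY, pubmed_dict=articles)
```

Because MERGE on pmid matches an existing publication before creating one, both cases from the question end up with the REFERENCES relationship. This still assumes CommentsCorrections arrives as a list; if the parser yields a single map when there is only one entry, it would need wrapping in a list on the Python side first.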