Search code examples
neo4jtransfer

graphing dorm movements in neo4j


I'm really struggling getting my head around neo4j and was hoping someone might be able to help point me in the right direction with the below.

Basically, I have a list of what can be referred to as events; the event can be said to describe a patient entering and leaving a room.

Each event has a unique identifier; it also has an identifier for the student in question along with start and end times (e.g. the student entered the room at 12:00 and left at 12:05) and an identifier for the room.

The event and data might look along the lines of the below, columns separated by a pipe delimiter

ID|SID|ROOM|ENTERS|LEAVES
1|1|BLUE|1/01/2015 11:00|4/01/2015 10:19
2|2|GREEN|1/01/2015 12:11|1/01/2015 12:11
3|2|YELLOW|1/01/2015 12:11|1/01/2015 12:20
4|2|BLUE|1/01/2015 12:20|5/01/2015 10:48
5|3|GREEN|1/01/2015 18:41|1/01/2015 18:41
6|3|YELLOW|1/01/2015 18:41|1/01/2015 21:00
7|3|BLUE|1/01/2015 21:00|9/01/2015 9:30
8|4|BLUE|1/01/2015 19:30|3/01/2015 11:00
9|5|GREEN|2/01/2015 19:08|2/01/2015 19:08
10|5|ORANGE|2/01/2015 19:08|3/01/2015 2:43
11|5|PURPLE|3/01/2015 2:43|4/01/2015 16:44
12|6|GREEN|3/01/2015 11:52|3/01/2015 11:52
13|6|YELLOW|3/01/2015 11:52|3/01/2015 17:45
14|6|RED|3/01/2015 17:45|7/01/2015 10:00

Questions that might be asked could be:

what rooms have student x visited and in what order

what does the movement of students between rooms look like - to which room does students go to when they leave room y

That sounds simple enough but I'm tying myself into knots.

I started off creating unique constraints for both student and room

create constraint on (student: Student) assert student.id is unique

I then did the same for room.

I then loaded student as

using periodic commit 1000 load csv with headers from 'file://c:/event.csv' as line merge (s:Student {id: line.SID});

I also did the same for room and visits.

I have absolutely no idea how to create the relationships though to be able to answer the above questions though. Each event lists the time the student enters and leaves the room but not the room the student went to. Starting with the extract, should the extract be changed so that it contains the room the student left for? If someone could help talk through how I need to think of the relationships that needs to be created, that would be very much appreciated.

Cheers


Solution

  • As the popular saying goes, there is more than one way to skin an Ouphe - or thwart a mage. One way you could do it (which makes for the simplest modeling imo) is as follows :

    CREATE CONSTRAINT ON (s:Student) ASSERT s.studentID IS UNIQUE;
    CREATE CONSTRAINT ON (r:Room) ASSERT r.roomID IS UNIQUE;
    
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:///dorm.csv" as line fieldterminator '|'
    MERGE (s:Student {studentID: line.SID})
    MERGE (r:Room {roomID: line.ROOM})
    CREATE (s)-[:VISIT {starttime: apoc.date.parse(line.ENTERS,'s',"dd/MM/yyyy HH:mm"), endtime: apoc.date.parse(line.LEAVES,'s',"dd/MM/yyyy HH:mm")}]->(r);
    
    # What rooms has student x visited and in what order
    MATCH (s:Student {studentID: "2"})-[v:VISIT]->(r:Room)
    RETURN r.roomID,v.starttime ORDER BY v.starttime;
    
    # What does the movement of students between rooms look like - to which room does students go to when they leave room y
    MATCH (s:Student)-[v:VISIT]->(r:Room {roomID: "GREEN"})
    WITH s, v
    MATCH (s)-[v2:VISIT]->(r2:Room)
    WHERE v2.starttime > v.endtime
    RETURN s.studentID, r2.roomID, v2.starttime ORDER BY s.studentID, v2.starttime;
    

    So actually you would only have Student and Room as nodes and a student VISITing a room would make up the relationship (with the enter/leave times as properties of that relationship). Now, as you can see that might not be ideal for your second query (although it does work). One alternative is to have a Visit node and chain it (as timeline events) to both Students and Rooms. There's plenty of examples around on how to do that.

    Hope this helps, Tom