I am looking at the LDBC benchmark which has contributions from Neo4j and TigerGraph. I want to understand how entries are ingested to measure performance.
Here are two example entries from "Person_likes_Post".
{"creationDate":1296583977045,"deletionDate":1577664000000,"explicitlyDeleted":false,"PersonId":13194139533355,"PostId":412316861128}
{"creationDate":1296750065049,"deletionDate":1296750075058,"explicitlyDeleted":true,"PersonId":13194139533355,"PostId":412316861129}
Does "explicitlyDeleted":true mean that only the edge is deleted?
When "explicitlyDeleted":false, does it mean the src node is deleted, dst node is deleted or both?
Link to the benchmark doc:
https://ldbcouncil.org/ldbc_snb_docs/ldbc-snb-specification.pdf
Download link to the example LDBC dataset containing these entries:
https://ldbcouncil.org/ldbc_snb_datagen_spark/social-network-sf0.003-bi-composite-merged-fk.zip
(I wanted to tag this with LDBC, but there is no such tag.)
The explicitlyDeleted attribute indicates whether there is a delete operation that specifically targets the given entity (i.e. a node or edge in the graph). This distinction is needed because the LDBC SNB workloads have cascading deletes, where the deletion of one entity may trigger the deletion of other entities.
For example, a Person_likes_Post edge can be deleted as a result of an explicit delete operation on any of the following:
1. the Person_likes_Post edge itself
2. the Person
3. the Post
4. the Forum that contains its target Post
5. the Person whose Album/Wall (which are Forum subtypes) contains its target Post
For the Person_likes_Post edge, the explicitlyDeleted attribute is true in case 1, and false for the other cases.
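To make the distinction concrete, here is a minimal Python sketch of a cascading delete. It is a toy model, not the data generator's actual logic: the Like class and the delete_post and delete_like helpers are hypothetical names introduced only for illustration, and only the "delete the Post" cascade is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Like:
    """A Person_likes_Post edge (hypothetical toy representation)."""
    person_id: int
    post_id: int


def delete_like(like):
    """Explicit delete that targets the edge itself (case 1)."""
    return [(like, True)]


def delete_post(post_id, likes):
    """Explicit delete of a Post, cascading to the likes that target it.

    Returns (entity, explicitly_deleted) pairs: the Post is the explicit
    target, the likes are removed only as a side effect.
    """
    deleted = [(("Post", post_id), True)]
    for like in likes:
        if like.post_id == post_id:
            deleted.append((like, False))
    return deleted


# IDs taken from the two example entries in the question.
likes = [
    Like(13194139533355, 412316861128),
    Like(13194139533355, 412316861129),
]

for entity, flag in delete_post(412316861128, likes):
    print(entity, "explicitlyDeleted =", flag)

for entity, flag in delete_like(likes[1]):
    print(entity, "explicitlyDeleted =", flag)
```

In this model the like edge carries explicitlyDeleted = False when it disappears because its Post was deleted, and True only when the delete operation names the edge itself.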
Note that this attribute is only part of the raw data set. The data sets used for the actual workload executions (Interactive, BI) only contain explicit delete operations, hence they omit this attribute.
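As a rough illustration of that difference, the following sketch reads the two raw entries from the question and keeps only the explicit deletes, dropping the attribute afterwards. The explicit_deletes function and the shape of the output records are assumptions made for this example; they do not reproduce the actual file layout of the Interactive or BI data sets.

```python
import json

# The two raw Person_likes_Post entries quoted in the question.
RAW_LINES = [
    '{"creationDate":1296583977045,"deletionDate":1577664000000,'
    '"explicitlyDeleted":false,"PersonId":13194139533355,"PostId":412316861128}',
    '{"creationDate":1296750065049,"deletionDate":1296750075058,'
    '"explicitlyDeleted":true,"PersonId":13194139533355,"PostId":412316861129}',
]


def explicit_deletes(lines):
    """Keep only entries whose deletion explicitly targets the edge.

    The returned record shape is hypothetical: it simply omits the
    explicitlyDeleted flag, since every kept record is an explicit delete.
    """
    ops = []
    for line in lines:
        record = json.loads(line)
        if record["explicitlyDeleted"]:
            ops.append({
                "deletionDate": record["deletionDate"],
                "PersonId": record["PersonId"],
                "PostId": record["PostId"],
            })
    return ops


ops = explicit_deletes(RAW_LINES)
print(ops)
```

Only the second entry survives the filter, because it is the only one whose explicitlyDeleted attribute is true.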