gremlin tinkerpop amazon-neptune

Fetch substring of a field from the edge of a graph


I am trying to fetch a substring of a field that is stored as an attribute on an edge of the graph, specifically on the company-to-company edges.

The graph creation queries are below.

g.addV('company').property(id,'SHRTST01').property('name','Alphabet').next()
g.addV('company').property(id,'SHRTST02').property('name','Google').next()
g.addV('company').property(id,'SHRTST03').property('name','Youtube').next()
g.addV('company').property(id,'SHRTST05').property('name','YoutubeKids').next()
g.addV('person').property(id,'SHRTST01_1900-01-01_1_1').property('bu_id', 'SHRTST01_1900-01-01_1_1').property('name', 'W Karl David Laxton').next()
g.addV('person').property(id,'SHRTST02_1900-01-01_1_1').property('bu_id', 'SHRTST02_1900-01-01_1_1').property('name', 'Steven H Strong').next()


g.addE('HAS_SHRHLDING_PC_TO').from(__.V('SHRTST01')).to(__.V('SHRTST01_1900-01-01_1_1')).property(id,'SHRTST01_HAS_SHRHLDING_PC_TO_SHRTST01_1900-01-01_1_1').property('perc_value', 30).next()
g.addE('HAS_VOTING_PC_TO').from(__.V('SHRTST01')).to(__.V('SHRTST01_1900-01-01_1_1')).property(id,'SHRTST01_HAS_VOTING_PC_TO_SHRTST01_1900-01-01_1_1').property('perc_value', 50).next()
g.addE('HAS_SHRHLDING_PC_TO').from(__.V('SHRTST01')).to(__.V('SHRTST02')).property(id,'SHRTST01_HAS_SHRHLDING_PC_TO_SHRTST02_2002-01-01_2_2').property('perc_value', 75).next()
g.addE('HAS_VOTING_PC_TO').from(__.V('SHRTST01')).to(__.V('SHRTST02')).property(id,'SHRTST01_HAS_VOTING_PC_TO_SHRTST02_2002-01-01_2_2').property('perc_value', 50).next()
g.addE('HAS_SHRHLDING_PC_TO').from(__.V('SHRTST02')).to(__.V('SHRTST02_1900-01-01_1_1')).property(id,'SHRTST02_HAS_SHRHLDING_PC_TO_SHRTST02_1900-01-01_1_1').property('perc_value', 25).next()
g.addE('HAS_VOTING_PC_TO').from(__.V('SHRTST02')).to(__.V('SHRTST02_1900-01-01_1_1')).property(id,'SHRTST02_HAS_VOTING_PC_TO_SHRTST02_1900-01-01_1_1').property('perc_value', 23).next()
g.addE('HAS_SHRHLDING_PC_TO').from(__.V('SHRTST02')).to(__.V('SHRTST03')).property(id,'SHRTST02_HAS_SHRHLDING_PC_TO_SHRTST03_2002-01-01_2_2').property('perc_value', 80).next()
g.addE('HAS_VOTING_PC_TO').from(__.V('SHRTST02')).to(__.V('SHRTST03')).property(id,'SHRTST03_HAS_VOTING_PC_TO_SHRTST02_2003-01-01_2_2').property('perc_value', 20).next()
g.addE('HAS_SHRHLDING_PC_TO').from(__.V('SHRTST03')).to(__.V('SHRTST05')).property(id,'SHRTST03_HAS_SHRHLDING_PC_TO_SHRTST05_2002-01-01_2_2').property('perc_value', 75).next()
g.addE('HAS_VOTING_PC_TO').from(__.V('SHRTST03')).to(__.V('SHRTST05')).property(id,'SHRTST03_HAS_VOTING_PC_TO_SHRTST05_2002-01-01_2_2').property('perc_value', 30).next()

The query below gives me the following output, but I need to refine it a bit more to get the expected output.

g.V('SHRTST01').as('crn')
 .repeat(
   outE('HAS_SHRHLDING_PC_TO').as('edge_field')
   .inV()
   .simplePath())
 .until(not(outE()))
 .emit()
 .hasLabel('company')
 .select('crn','edge_field')
 .project('crn', 'edge_field')
 .by(select(keys).select('crn'))
 .by(select(keys).select('edge_field'))

Actual output:

{'crn': v[SHRTST01], 'edge_field': e[SHRTST01_HAS_SHRHLDING_PC_TO_SHRTST02_2002-01-01_2_2][SHRTST01-HAS_SHRHLDING_PC_TO->SHRTST02]}
{'crn': v[SHRTST01], 'edge_field': e[SHRTST02_HAS_SHRHLDING_PC_TO_SHRTST03_2002-01-01_2_2][SHRTST02-HAS_SHRHLDING_PC_TO->SHRTST03]}
{'crn': v[SHRTST01], 'edge_field': e[SHRTST03_HAS_SHRHLDING_PC_TO_SHRTST05_2002-01-01_2_2][SHRTST03-HAS_SHRHLDING_PC_TO->SHRTST05]}

Expected output:

{'crn': [SHRTST01], 'edge_field': [SHRTST01_HAS_SHRHLDING_PC_TO_SHRTST02_2002-01-01_2_2], 'shr_id':[SHRTST02_2002-01-01_2_2]}
{'crn': [SHRTST01], 'edge_field': [SHRTST02_HAS_SHRHLDING_PC_TO_SHRTST03_2002-01-01_2_2], 'shr_id':[SHRTST03_2002-01-01_2_2]}
{'crn': [SHRTST01], 'edge_field': [SHRTST03_HAS_SHRHLDING_PC_TO_SHRTST05_2002-01-01_2_2], 'shr_id':[SHRTST05_2002-01-01_2_2]}

I am not sure how to derive the shr_id. As you can see, the shr_id is a substring of edge_field: the part of the edge ID that follows the edge label. Any leads on this would be very helpful.

I am also interested to know if there is a better way to handle this use case.

Thanks.


Solution

  • The very latest release of Apache TinkerPop (3.7.1) added many new string and list operations to the Gremlin language. Once Amazon Neptune has moved up to that level of Gremlin, you will easily be able to perform substring-like operations in the query itself. Until then, it is probably easiest to do this in the application.

    This has been a gap in Gremlin for a long time (for many historical reasons), but it is great to see it now being addressed.

    Those new steps are discussed here: https://github.com/apache/tinkerpop/blob/3.7.1/CHANGELOG.asciidoc#release-3-7-1
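
As a rough illustration of the "do this in the application" route, here is a minimal Python sketch. It assumes each result row has already been reduced to plain ID strings (not vertex and edge objects) and that every edge ID follows the `<source>_<edge label>_<shr_id>` pattern used in the sample data. The helper names `shr_id_from_edge_id` and `add_shr_id` are hypothetical and not part of any Gremlin or Neptune client library.

```python
# Hypothetical application-side helpers; the names are illustrative only.
EDGE_LABEL = "HAS_SHRHLDING_PC_TO"


def shr_id_from_edge_id(edge_id, label=EDGE_LABEL):
    """Return the part of edge_id that follows '_<label>_'."""
    _, sep, tail = edge_id.partition("_" + label + "_")
    if not sep:
        # The label was not found, so the ID does not match the assumed pattern.
        raise ValueError("edge id %r does not contain label %r" % (edge_id, label))
    return tail


def add_shr_id(rows, label=EDGE_LABEL):
    """Copy each result row and add a 'shr_id' key derived from 'edge_field'."""
    return [dict(row, shr_id=shr_id_from_edge_id(row["edge_field"], label))
            for row in rows]


# Assumed input shape: values are plain ID strings taken from the query results.
rows = [
    {"crn": "SHRTST01",
     "edge_field": "SHRTST01_HAS_SHRHLDING_PC_TO_SHRTST02_2002-01-01_2_2"},
    {"crn": "SHRTST01",
     "edge_field": "SHRTST02_HAS_SHRHLDING_PC_TO_SHRTST03_2002-01-01_2_2"},
    {"crn": "SHRTST01",
     "edge_field": "SHRTST03_HAS_SHRHLDING_PC_TO_SHRTST05_2002-01-01_2_2"},
]

for row in add_shr_id(rows):
    print(row)
```

Splitting on the edge label rather than on a fixed character offset keeps the helper working if the source company ID changes length. It does rely on the label appearing in the edge ID exactly as shown in the sample data.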