Search code examples
neo4jcypher

calculation of percentage from a column is table neo4j/Cypher


I want to calculate percentage of how much each type of skill is present for each occupation. I made a have different nodes occupation and Soft_skills they have an relationship [r:MAPS_TO] the relationship has two property type('essentieel', 'optioneel')and b (this kind of weight essentieel =1.0 and optioneel =0.5)

see below een example of the file that i'm using

occupation code_occupation skill_code name type b
dokter sk_565 sk_687 leren optioneel 0.5
dokter sk_565 sk_687 leren optioneel 0.5
dokter sk_565 sk_687 leren essentieel 1.0
dokter sk_565 sk_687 leren essentieel 1.0

i made a query where i can calculate what the sum is for each occupation see below my query:

match (b:BOC_beroep)-[r:MAPS_TO]-(s:Soft_Skill)                                                     
return b.beroepstitel as beroep,                             
sum(toFloat(r.b)) as sum order by sum desc                         
limit 10

this is my output

beroep sum
Doker 5
Kapper 7

know i want to achieve what the percentage is of the type of skill my calculation will be

Dokter have in total 4 skills en the percentage of each skill is 
2/4 * 100 = 50% essentieel 
1/4 * 100 = 25% optioneel 

i was trying to make a new calculation with this query:

  MATCH (b:BOC_beroep)-[r:MAPS_TO]->(s:Soft_skill)
  WITH SUM(toFloat(r.b)) As total
  MATCH (b:BOC_beroep)-[r:MAPS_TO]->(s:Soft_Skill)
  RETURN b.beroepstitel  AS beroep, 
  (toFloat(total/(r.b)))*100 AS percent
  order by percent desc

My output is Cannot divide 'Long' by 'string' I don't understand the erorr

i was inspired by this question and answer Here's an inline link to stackoverflow.


Solution

  • Your MAPS_TO relationships apparently store the b property as a string, so any query must always convert it to a float to do math with it. Also, the math in your query is wrong - r.b should be in the numerator and total should be in the denominator.

    This query may work for you:

    MATCH (b:BOC_beroep)-[r:MAPS_TO]->(s:Soft_Skill)
    WITH b.beroepstitel AS beorep, r.type AS type, s.name AS skill, SUM(TOFLOAT(r.b)) AS weight
    WITH SUM(weight) As total, COLLECT([beorep, type, skill, weight]) AS data
    UNWIND data AS datum
    RETURN
      datum[0] AS beroep,
      datum[1] AS type,
      datum[2] AS skill,
      datum[3]/total*100 AS percent
    ORDER BY percent desc
    

    The first WITH clause sums all the weights for a given beorep, type, and skill. The second WITH clause calculates a total of all the weights and collects all the other data needed. The rest of the query is straightforward.

    Here is a sample result:

    ╒════════╤════════════╤═══════╤═════════════════╕
    │beroep  │type        │skill  │percent          │
    ╞════════╪════════════╪═══════╪═════════════════╡
    │"dokter"│"essentieel"│"leren"│66.66666666666666│
    ├────────┼────────────┼───────┼─────────────────┤
    │"dokter"│"optioneel" │"leren"│33.33333333333333│
    └────────┴────────────┴───────┴─────────────────┘
    

    Recommendations

    1. If you stored the b property as a float instead of a string, you can avoid the cost of doing TOFLOAT conversions all the time. You can easily convert all the stored weights this way:

      MATCH ()-[r:MAPS_TO]->()
      SET r.b = TOFLOAT(r.b)
      
    2. If type only has 2 possible values, it would require less storage space to store that property as a boolean (named, say, isRequired) instead. This can also simplify queries using that property. You can easily convert that property this way:

      MATCH ()-[r:MAPS_TO]->()
      SET r.isRequired = CASE r.type WHEN 'essentieel' THEN true ELSE false END
      REMOVE r.type