Sum a numeric property of leaves, counting each leaf only once despite multiple paths

I have the following setup:

I have multiple repositories.
A repository has many tags.
A tag points to many blobs.
Different tags can point to the same blob; hence, a blob can be referenced by many tags.
A blob has a numeric property, size_in_mb.

(If this sounds familiar, that is how a Docker registry stores data on disk.)

What do I want to achieve?

I want the sum of size_in_mb for each repository, counting each blob only once per repository even though it can be referenced by many tags.

Consider the following example:

CREATE (b1:Blob {name:"b1", size_in_mb: 1000})
CREATE (b2:Blob {name: "b2", size_in_mb: 100})
CREATE (r1:Repository {name: 'r1'})
CREATE (r2:Repository {name: 'r2'})
CREATE (t1:Tag {name: 'r1:latest'})
CREATE (t2:Tag {name: 'r1:old'})
CREATE (t3:Tag {name: 'r2:latest'})
CREATE (t1)-[:TAG_OF]->(r1)
CREATE (t2)-[:TAG_OF]->(r1)
CREATE (t3)-[:TAG_OF]->(r2)
CREATE (b1)-[:TAGGED_BY]->(t1)
CREATE (b1)-[:TAGGED_BY]->(t2)
CREATE (b1)-[:TAGGED_BY]->(t3)
CREATE (b2)-[:TAGGED_BY]->(t2)

The resulting graph can be displayed with

MATCH(r:Repository)<--(t:Tag)<--(b:Blob)
RETURN r,t,b

[Image: graph visualization of the example data]

A simple sum

MATCH(r:Repository)<--(t:Tag)<--(b:Blob)
RETURN r.name, sum(b.size_in_mb)

returns

r.name  sum(b.size_in_mb)
"r1"    2100
"r2"    1000

but I want

r.name  sum(b.size_in_mb)
"r1"    1100
"r2"    1000

because blobs b1 and b2 should each be counted only once for repository r1.
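To see where the extra 1000 comes from, it can help to return the individual paths instead of aggregating them. The query below is a diagnostic sketch; it names the TAG_OF and TAGGED_BY relationship types from the CREATE statements (the queries above match any relationship type), and the ORDER BY is there only to make the row order predictable. On the example data it should list b1 twice for r1, once via r1:latest and once via r1:old, which is why the naive sum for r1 is 1000 + 1000 + 100 = 2100.

```cypher
// One row per Repository <- Tag <- Blob path, before any aggregation
MATCH (r:Repository)<-[:TAG_OF]-(t:Tag)<-[:TAGGED_BY]-(b:Blob)
RETURN r.name, t.name, b.name, b.size_in_mb
ORDER BY r.name, t.name, b.name
```

Because sum() adds one value per row, every extra tag path to the same blob adds that blob's size again.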

How should I phrase my Cypher query to reach that goal?

Solution

I think I got it, based on Michael's answer:

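MATCH (r:Repository)<--(t:Tag)<--(b:Blob)
// Collect each blob once per repository, however many tags point to it
WITH r, collect(DISTINCT b) AS distinctBlobs
// Add up the sizes of the deduplicated blobs
RETURN r.name, reduce(totalSum = 0, n IN distinctBlobs | totalSum + n.size_in_mb) AS size_sum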

I am not sure whether this is the optimal solution, but it does produce the correct values.
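An alternative that avoids the collect/reduce round trip is to deduplicate the (repository, blob) pairs first and then use the ordinary sum() aggregate. This is a sketch rather than a verified drop-in replacement; it assumes the TAG_OF and TAGGED_BY relationship types from the CREATE statements above. On the example data the distinct pairs are (r1, b1), (r1, b2) and (r2, b1), so it should return 1100 for r1 and 1000 for r2.

```cypher
// Keep each (repository, blob) pair once, however many tags connect them
MATCH (r:Repository)<-[:TAG_OF]-(:Tag)<-[:TAGGED_BY]-(b:Blob)
WITH DISTINCT r, b
// Aggregate per repository; sum() now sees each blob only once
RETURN r.name, sum(b.size_in_mb) AS size_sum
```

Here the grouping key is r.name, so two repositories with the same name would be merged; returning r itself (as the collect-based query groups by) keeps them apart.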