Search code examples
orientdb

OrientDB: Getting edges between subqueries of vertices


Please advise on the following:

  1. I need to get only edges between one vertex (artist) and a set of vertices (articles sorted by creation date).

What I'm doing now: A. To get, say, Madonna's record:

SELECT FROM Artist WHERE title="Madonna" LIMIT 1

B. To get 2 latest articles about her:

SELECT expand(in("AboutArtist")[content_type="article"]) FROM Artist WHERE title="Madonna" ORDER BY created DESC LIMIT 2

I use "content_type" because I have 2 node types pointing to artists: Article and Gallery. And I need articles only.

Expected result: array of edges between the artist and articles about her (2 articles sorted by creation date).

  1. Also, for the same use-case, please advise on the correct traverse to get all together: Madonna's record AND her 2 articles (as per the query above) AND the edges between them

Thank you! D.


Solution

  • I will clear up a few misconceptions you seem to have, before giving an answer.

    • I take it you are using regular edges, as opposed to lightweight edges? You can't return a lightweight edge in a query, as it doesn't have a physical record. So if you actually need the edge records, because they have some fields themselves, make sure you configure the database appropriately.
    • When you do expand(in('SomeEdge')), it will follow the link to the next record. In the case of lightweight edges, this will be a vertex, which will be returned. In the case of an edge record, it will follow the out link to the next record (probably a vertex, unless you link edges to edges), and then return that record. Thus if you want to get the edge records, you need to use inE(), rather than in().
    • Further to the previous point, when you have an edge record (ie you expanded a field using inE()), then you can get to the next record via the field/link called out, or the outV() function (presuming there is only 1 link out of the edge, it will return that 1 record). You will see the out field used in my answers.
    • [content_type="article"] is checking the field called 'content_type' on the record, which causes the record to be loaded (and possibly discarded from the results, thus a 'wasted' load). Instead, if you have specific classes for Article and Gallery, you should use these to filter with [@class=Article]. I believe OrientDB can infer the class based upon the cluster id that is part of the rid, thus it doesn't have to load the record to filter it.
    • Further to the previous point, rather than have 1 edge that links galleries and articles (currently you have "AboutArtist"), why not make a specific edge for each, ie 'hasArticle' and 'hasGallery'? This way you don't need to filter the records at all.

    So here is the query to get the 2 latest article edges.

    SELECT expand(inE('AboutArtist')[out.@class='Article'])
    FROM Artist
    WHERE title="Madonna"
    ORDER BY out.created DESC
    LIMIT 2
    

    That query won't function as expected if you have 2 or more artists with the title Madonna, as it will get the all the edges to all of the Madonna's, and return the latest 2, which may not be for the Madonna you actually wanted. Thus, I would 'nest' the query that finds Madonna, and limit it to 1 (presuming it is the correct one). It seems the order and limit doesn't function the same when you do this, so you need to nest twice.

    SELECT
    FROM (
        SELECT expand(inE('AboutArtist')[out.@class='Article'])
        FROM (
            SELECT
            FROM Artist
            WHERE title="Madonna"
            LIMIT 1
        )
    )
    ORDER BY out.created DESC
    LIMIT 2
    

    So the inner most query gets Madonna, the 'middle' query gets the edges from Madonna, and the outer query sorts and limits these edges.

    Your last question is a bit harder. Traversing from the 1 Madonna, and returning all the edges and linked vertex is quite easy;

    TRAVERSE inE('AboutArtist'), out
    FROM (
        SELECT FROM Artist WHERE title="Madonna" LIMIT 1
    )
    

    This returns all AboutArtist edges, and all Articles and Galleries. Limiting this to the 2 edges we found above is proving a bit more difficult though. I couldn't find an elegant solution, but I found one none the less.

    TRAVERSE in, out
    FROM (
        SELECT
        FROM (
            SELECT expand(inE('AboutArtist')[out.@class='Article'])
            FROM (
                SELECT FROM Artist WHERE title="Madonna" LIMIT 1
            )
        )
        ORDER BY out.created DESC
        LIMIT 2
    )
    

    All I have done is nested the query above that finds the 2 edges, and then traversed the in and out links of the edges, thus returning the artist and 2 articles. The order of the results isn't the same as when traversing artist->edge->article, but I doubt the order matters.