Search code examples
c#linqentity-framework-coreabp-frameworkef-core-6.0

How to replicate SQL's ORDER BY NEWID() in ABP framework?


How can I replicate the behavior of this SQL query in ABP framework? The SQL query is:

SELECT * FROM App_Profile
ORDER BY NEWID()

I tried using the following LINQ expression but it did not work - I want to get random records each time, but I got always the same results instead:

query.Skip(input.SkipCount).Take(input.MaxResultCount).OrderBy(x => Guid.NewGuid());
var profiles= await _asyncExecuter.ToListAsync(query);
  • Concrete example: I have a dataset of 100 questions, and an endpoint that returns 30 questions, each with a collection of potential responses. My goal is to ensure that each time the user invokes the endpoint, it returns a different set of questions with their potential responses, regardless of whether any questions happen to exist in both the first and second results.

P.S : I'm using ABP framework, EF Core 6

  • After trying I got to this solution It seems the problem lies with await _asyncExecuter.ToListAsync(query) and I'm not sure why. I've tried using var questions = query.OrderBy(x => Guid.NewGuid()).Take(input.MaxResultCount).ToList(); instead.

Solution

  • Tatranskymedved has answered your question, but to expand on the elements you appear to be confused about:

    Use the following to only load the selected items, not everything into memory first:

    var profiles = await query.OrderBy(x => Guid.NewGuid())
        .Skip(input.SkipCount)
        .Take(input.MaxResultCount)
        .ToListAsync();
    

    Now the real question is what are you trying to accomplish here? Randomly ordering the entire set of rows then using skip and take as a DB query doesn't make any sense. Sure, this will take a page of data from a random set, but if your goal is to load pages of data that has been randomized, this won't work as you probably expect as each query will re-randomize the set so you can, and will get back the same item in multiple paged calls. The ordering needs to be persisted between paging calls. If you just want 100 random items out of an entire set, you don't need Skip, just use Take to get the first 100 randomized items.

    I don't know what _asyncExecutor is, but I'm pretty certain it isn't necessary unless it is a wrapper to log output or such, but I'd suspect it is meant to do something like wrap a synchronous operation to be treated as an asynchronous one. (Not needed as EF supports async operations)

    To explain what you are seeing with your examples:

    query.Skip(input.SkipCount)
        .Take(input.MaxResultCount)
        .OrderBy(x => Guid.NewGuid());
    
    var profiles= await query.ToListAsync();
    

    Given a Skip value of 100 and a MaxResultCount of 10 then this would always take rows 101-110 then randomly order those 10 results rather than randomly order the whole set. The additional trouble with this approach is that the assumed default order that the data will be read in is not reliable and will vary as data is added/removed from the set. It will start by looking like the default order will be the ID or order the rows were added, but as the set grows and changes, that will not be repeatably reliable.

    Concrete example: I have a dataset of 100 questions and an endpoint that returns 30 questions. What I want to do is to ensure that each time the user invokes the endpoint, it will return a different set of questions, regardless of whether there are any questions that happen to exist in both the first and second results.

    If you want to select 30 random questions from a data set of 100 questions and you don't care if questions can be repeated between calls:

    var profiles = await query.OrderBy(x => Guid.NewGuid())
        .Take(input.MaxResultCount)
        .ToListAsync();
    

    This is all you need.

    If you want to ensure that the next 30 questions cannot include questions the user has already tried, then the best way to ensure this is to cache the question IDs that you have already selected, and exclude that from the set:

    Initial state:

    List<int> questionIdsAsked = (List<int>)Session[nameof(questionIdsAsked)] ?? new List<int>();
    
    if(questionIdsAsked.Any())
        query = query.Where(x => !questionIdsAsked.Contains(x.Id));
    
    var questions = await query.OrderBy(x => Guid.NewGuid())
        .Take(input.MaxResultCount)
        .ToListAsync();
    
    questionIdsAsked.AddRange(questions.Select(x => x.Id));
    Session[nameof(questionIdsAsked)] = questionIdsAsked;
    

    This assumes a web application, but if an application the questionIdsAsked can just be a private member that can be cleared if necessary. This checks to see if the current user has been given a set of questions. On the first run for the session we get the first 30 questions from the database and record those question IDs to the session state. This way when we call it again we get the 30 question IDs from the previous run and exclude those from the query before re-randomizing and taking 30 new questions. Obviously you will need to handle the scenario where you might run out of questions or enough for getting a full set of 30 if you take this approach.

    Edit: Note that if you are eager loading responses or other data, you cannot use AsSplitQuery with this method. This would result in a random set of Responses to questions being read so some questions might get their answers loaded while others do not. With Split Queries EF would generate two queries which would look something like:

    SELECT TOP 10 * FROM Questions
    ORDER BY NEWID()
    
    SELECT r.* FROM Responses r
    INNER JOIN Questions q ON r.QuestionId = q.Id
    WHERE q.Id IN (SELECT TOP 10 Id FROM Questions 
    ORDER BY NEWID())
    

    The first query would fetch 10 random questions, but for split queries to work reliably the sorting has to be identical, which in this case it is not. It will load responses for a different 10 random questions.

    You either need to accept the cost of the Cartesian Product of the questions and their responses, or you could consider performing the Split query manually:

    // Note: No eager load.
    var questions = await _context.Questions
        .OrderBy(x => Guid.NewGuid())
        .Skip(input.SkipCount)
        .Take(input.MaxResultCount)
        .ToListAsync();
    var questionIds = questions.Select(x => x.Id).ToList();
    var responses = await _context.Responses
        .Where(x => questionIds.Contains(x.QuestionId))
        .ToListAsync();
    

    Better would be to project the question and the response details you need into a DTO to reduce the query Cartesian as much as possible so that you don't have to resort to Include and AsSplitQuery.