.net database postgresql entity-framework database-design

Integer PK but a GUID as FK

In our current PostgreSQL DB, primary keys are Ints. However, in almost all of the communications between services another candidate key is used, a GUID. While I am not the original author of the codebase, I believe the reason for this setup is favoring GUIDs as the ID to be exposed in DTOs.

The foreign keys references the int PK, and here the issue unfolds. When returning the DTO in an API, I need to do a join in order to get the GUID based on the integer FK. This is very frequent across APIs, so I believe it's a performance concern.

Using the GUID as the foreign key instead of the PK will avoid the need to perform a join, but I need your thoughts on this.

What is the recommended approach when the PK is an integer but APIs require GUIDs, and why?

EDIT

Here's a dotnet example fetching a page of OrderProcessingResult records, eagerly loading associated Warehouse and ProcessedByEmployee entities to enable mapping of their respective UUIDs.

An OrderProcessingResultDto will have the processing details, plus WarehouseId and EmployeeId, which are UUIDs, unlike the database INT FKs.

public async Task<OrderProcessingResults> FetchOrderProcessingRecords(PaginationRequest request) {
        var query = dbContext
            .OrderProcessing.Include(o => o.Warehouse)
            .Include(o => o.ProcessedByEmployee);
        var totalCount = await query.CountAsync();
        var pagedResults = await query
            .Skip(request.SkipRecordCount)
            .Take(request.Limit)
            .ToListAsync();
        return new OrderProcessingResults
        {
            PageNumber = request.PageNumber,
            PageSize = request.Limit,
            TotalRecords = totalCount,
            PageRecords = pagedResults.Count,
            Data = mapper.Map<IEnumerable<OrderProcessingResultDto>>(pagedResults),
        };
 }

Solution

There are a few adjustments you can make to the query process to improve performance: Firstly, when using a mapping library like Automapper or Mapperly/Mapify, these typically support a projection method that ties directly into EF's IQueryable. This negates the cost of a double-projection, where your first query materializes a page of entire entities and their relations where the second map call cherry-picks the required columns. So for instance if there is also a .Where() condition:

IQueryable<OrderProcessing> query = dbContext.OrderProcessing
    .Where(op => /* filter conditions /*);
int totalCount = await query.CountAsync();
var pagedResults = await query
    .ProjectTo<OrderProcessingResultDto>(mapper.ConfigurationProvider) //Not 100% sure on this for passing the MapperConfiguration, I normally have a mapperconfig factory method to build this.
    .Skip(request.SkipRecordCount)
    .Take(request.Limit)
    .ToListAsync();

return new OrderProcessingResults
{
    PageNumber = request.PageNumber,
    PageSize = request.Limit,
    TotalRecords = totalCount,
    PageRecords = pagedResults.Count,
    Data = pagedResults
};

The difference here is the query /w .ProjectTo() results in an SQL query that only includes the columns from the OrderProcessing and related tables that are needed to populate the DTO rather than returning the entire rows of each table.

Note that this still results in two queries, one to get the count, and a second to get the page of data. Pagination queries like this need to be careful when run against very large sets with potentially complex/expensive filtering since getting an accurate count can still end up doing entire table scans in cases.

If the filter conditions were something like:

.Where(op => op.Customer.CustomerUUID == customerUUID);

... then we would benefit from an index on the OrderProcessing table for the CustomerUUID. This would assist with identifying the actual CustomerId values to find any joined child rows, though in your example it looks like we would use a WarehouseId and ProcessedByEmployeeId on the OrderProcessing row to find the related data. Making CustomerUUID the PK is an option, but GUIDs generally make for poor PKs /s clustered indexes due to how UUIDs will result in considerable page fragmentation. DBMS' may support a more sortable UUID generation which is more paging friendly, but that defeats the goal of non-guessable IDs for APIs.

When working with projections it can be useful to test with a profiler that can make indexing suggestions. I'm not sure what is available for PostgreSQL, but SQL Server /w SSMS comes with a Profiler to capture the queries EF executes, where you can run those in SSMS with an execution plan that can recommend indexes. These do need to be taken with a grain of salt as they can lead to excessive index column inclusions that bloat index sizes for no real performance improvement.