Search code examples
jpajoineclipselinkbatch-fetching

@BatchFetch type JOIN


I'm confused about this annotation for an entity field that is of type of another entity:

@BatchFetch(value = BatchFetchType.JOIN)

In the docs of EclipseLink for BatchFetch they explain it as following:

For example, consider an object with an EMPLOYEE and PHONE table in which PHONE has a foreign key to EMPLOYEE. By default, reading a list of employees' addresses by default requires n queries, for each employee's address. With batch fetching, you use one query for all the addresses.

but I'm confused about the meaning of specifying BatchFetchType.JOIN. I mean, doesn't BatchFetch do a join in the moment it retrieves the list of records associated with employee? The records of address/phone type are retrieved using the foreign key, so it is a join itself, right?
The BatchFetch type is an optional parameter, and for join it is said:

JOIN – The original query's selection criteria is joined with the batch query

what does this means? Isn't the batch query a join itself?


Solution

  • Joining the relationship and returning the referenced data with the main data is a fetch join. So a query that brings in 1 Employee that has 5 phones, results in 5 rows being returned, with the data in Employee being duplicated for reach row. When that is less ideal, say a query over 1000 employees, you resort to a separate batch query for these phone numbers. Such a query would run once to return 1000 employee rows, and then run a second query to return all employee phones needed to build the read in employees.

    The three batch query types listed here then determine how this second batch query gets built. These will perform differently based on the data and database tuning.

    • JOIN - Works much the same away a fetch join would, except it only returns the Phone data.
    • EXISTS - This causes the DB to execute the initial query on Employees, but uses the data in an Exists subquery to then fetch the Phones.
    • IN - EclipseLink agregates all the Employee IDs or values used to reference Phones, and uses them to filter Phones directly.

    Best way to find out is always to try it out with SQL logging turned on to see what it generates for your mapping and query. Since these are performance options, you should test them out and record the metrics to determine which works best for your application as its dataset grows.