Search code examples
cachingsearchaembrowser-cache

AEM Lookup Use Case - Design Discussion


Use Case

  1. I have an Excel in DAM which contains 10 columns predefined.
  2. Rows varying from 3000 to 10000. No PI SPI data involved today or in future.
  3. We need to display results on page where end-user input search term matches the excel column A.
  4. It could be more complicated as - Give me those rows where the user input exists in Column A + C + D.
  5. This search that way DOES NOT align to typical "Search" in AEM - optimized Search API, indexing, etc.

Design-1

  1. Have a component for lookup.
  2. Have a resource based servlet bound to component resource.
  3. Read entire excel server-side (servlet) and generate JSON.
  4. Cache the entire JSON on dispatcher.
  5. For every upcoming request, JSOn is served from cache and invalidated on excel data publish.
  6. Let client side have entire JSON and implement the search logic to display relevant result.

Pros - Optimized and efficient - Lower hits on server, entire data (which is not huge) is loaded into In-Memory DOM for quick and intuitive return of search results.

Cons - Search logic which is business logic is implemented client-side. In future if there's an Use Case that do something like JCR lookup with the values matched in excel, then it has to be done for entire dataset to return all such fields in JSON.

How does the In-Memory DOM know that the data has been refreshed in the backend ? On publish, dispatcher will invalidate the JSON response but how will browser DOM know that and update itself ?

Design-2

  1. Have a component for lookup.
  2. Have a resource based servlet bound to component resource.
  3. Read entire excel server-side (servlet), implement search logic and generate results JSON.
  4. On dispatcher JSON gets cached with selector as search-term.
  5. If same request comes with same search term, then only cached result will be delivered, else the request hits the server (AEM Publisher).
  6. Front-end JS gets only the JSON for search result and simply displays it.

Pros - Search logic which is business logic is implemented server-side. More secured. Data transformation logic resides server-side. Not huge volume of data in perspective.

Cons - Less efficient - More hits on server (cache contains json with search term as selector - so the search-term needs to match for the JSON to be returned from cache), page refresh on every search.

Which Design would you prefer ? Or do you have any other design in mind that would be better ?


Solution

  • I would go for option A. Parsing an Excel file is expensive and can add a latency of 1-2 seconds ( No matter which Java Spreadsheet API you are going to use: Apache POI, etc. ).

    10K rows is not that many and can be loaded on the client side.

    How does the In-Memory DOM know that the data has been refreshed in the backend ? On publish, dispatcher will invalidate the JSON response but how will browser DOM know that and update itself ?

    It depends on your implementation. You can fetch the spreadheet JSON every time user searches, or you can fetch it once and cache in DOM. In the latter case the data will be live until refresh.