Search code examples
entity-frameworkelasticsearchdata-modelingkibana-4

How to data model a live web app from SQL Server to ElasticSearch?


In our web application we use a denormalized data mart in SQL Server for geo-based user project content.

Users have 1..*projects, 1..*geo areas. Content is stored (in the data mart) with UserID, ProjectID, text values for geo areas, title and description (both free text search indexed):

UserID, ProjectID, Geo, Title, Description, Timestamp

Now wanting to move this over to ElasticSearch, what would be a good data modeling approach?

Simply for the data mart, I was thinking of just serializing the data object (currently using .Net and EntityFramework) to give me the JSON representation and stuffing that into ES. Is this a good approach (also requires least re-work)?

With regards to modeling the entire application, I have seen examples where an ES type would be organized by, say Users, so the model may look something like this:

User
  User ID, Name, etc...
  ProfileSettings
     Setting1, Setting2, etc...           
  Geographies
     GeoID, GeoName
  Projects
     ProjectID, ProjectName
     ProjectContent
        Key (UserID:ProjectID:ProjectContentID), GeoName, Title, Description, Timestamp

So this looks like the whole web application could run off of one index/type. A bit scary, no?

I would like to use Kibana and other analysis tools in the future, and have read about data modeling limitations like not using parent/child types.

What is would a good ElasticSearch data model look like for something like this?

Another way of asking would be, how would one model a live web application using ElasticSearch, and/or would it be better to store user configs and profiles in a separate RDBMS?

Thank you.


Solution

  • These questions are always difficult to answer without understanding the business and the reporting requirements. But here are a couple guidelines I learned from my admittedly brief experience with ES:

    1) You don't have to put it all in one index, so separate indexes for "user" and "project" may work best. Since ES indexes all fields by default, searching a project index by user will be fast. Kibana can search multiple indexes.

    2) The prevailing wisdom at the time was to keep the indexes as flat as possible, so same thing applies to having a separate index for user profile settings.

    3) It may be advantageous to create a mapping, in addition to serializing and stuffing.

    Regarding user configs and profiles, I don't see any compelling reason to use a RDBMS. They'll be keyed by user id with no join requirements, will not require the ACID consistency and concurrency model. A NOSQL solution will give you the schema flexibility those use cases demand.