javascript algorithm search mining

Algorithm to mine millions of records


I have more than a million chat records, each with the following fields:

chat_message
city
timestamp

Now we need to check the messages for travel-related keywords such as "travel", "accommodation", or "hotels". Let us say we have gathered around 15 such keywords.

The requirement is to find the chat messages related to travel using these keywords. How can this be done?

The solution I can think of: keep an array of travel-related keywords, then scan through all the messages for each keyword (using some string-matching algorithm).
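For reference, a minimal sketch of that scan, assuming each record is an object with a `chat_message` string. The keyword list, the sample records, and the function names are hypothetical. Combining the keywords into one case-insensitive pattern means each message is tested once rather than once per keyword:

```javascript
// Hypothetical keyword list; the real one would hold around 15 entries.
const keywords = ["travel", "accommodation", "hotels"];

// Escape regex metacharacters so every keyword is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// One case-insensitive pattern with word boundaries,
// equivalent to /\b(?:travel|accommodation|hotels)\b/i.
const pattern = new RegExp(
  `\\b(?:${keywords.map(escapeRegExp).join("|")})\\b`,
  "i"
);

// Single pass over the records: each message is tested against the pattern once.
function findTravelMessages(records) {
  return records.filter((record) => pattern.test(record.chat_message));
}

// Made-up sample data, for illustration only.
const sampleRecords = [
  { chat_message: "Any good hotels near the station?", city: "Pune", timestamp: 1 },
  { chat_message: "Lunch at noon?", city: "Pune", timestamp: 2 },
  { chat_message: "I love to TRAVEL", city: "Delhi", timestamp: 3 },
];

console.log(findTravelMessages(sampleRecords).map((r) => r.timestamp));
```

Because of the word boundaries, inflected forms such as "travelling" are not matched by the keyword "travel"; handling those needs stemming or extra keywords.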

I think this solution is pretty brute-force. Are there any ideas for a more efficient search algorithm, or for a better way to set up the chat records and/or the keywords?


Solution

  • Your mileage may vary.

    If your host language is JavaScript, I recommend using a full-text search engine such as lunr.js. It requires pre-processing your raw data (for example, tokenization, stemming, and indexing), after which you can search the data more conveniently.
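
    To illustrate what that pre-processing buys you, here is a rough, library-free sketch of tokenization, stemming, and indexing. It is not the lunr.js API: the function names, the toy stemmer, and the sample records are all assumptions made for illustration, and a real engine uses a far more careful pipeline.

```javascript
// Crude tokenizer (assumption): lowercase the text and split on non-letters.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z]+/).filter(Boolean);
}

// Toy stemmer (assumption): strips a trailing "ing" or "s" from longer tokens.
// Real engines use a proper stemming algorithm instead.
function stem(token) {
  return token.length > 4 ? token.replace(/(?:ing|s)$/, "") : token;
}

// Build an inverted index: stemmed term -> set of record positions.
// This is the one-time cost paid up front, instead of rescanning every message.
function buildIndex(records) {
  const index = new Map();
  records.forEach((record, id) => {
    for (const token of tokenize(record.chat_message)) {
      const term = stem(token);
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  });
  return index;
}

// Look each keyword up in the index and return the union of matching positions.
function search(index, keywords) {
  const hits = new Set();
  for (const keyword of keywords) {
    for (const token of tokenize(keyword)) {
      const ids = index.get(stem(token));
      if (ids) ids.forEach((id) => hits.add(id));
    }
  }
  return [...hits].sort((a, b) => a - b);
}

// Made-up sample data, for illustration only.
const records = [
  { chat_message: "Any good hotels near the station?" },
  { chat_message: "Lunch at noon?" },
  { chat_message: "We are traveling next week" },
];
const index = buildIndex(records);
console.log(search(index, ["travel", "hotel"])); // [ 0, 2 ]
```

    Once the index is built, each keyword lookup is a map access rather than a pass over a million messages, and stemming lets "hotel" find "hotels" and "travel" find "traveling".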

    Still, your data set is quite large, at least for browsers (since you are using JavaScript). If you are going to implement this on the client side, many details other than the algorithm need to be taken into consideration: memory allocation and data transfer, to name just two.

    However, if you are on the server side, more mature solutions like Elasticsearch are worth considering.