I am building a graph search and display system for a huge dataset (all companies and company partners in a country, so people can find relationships between owners and unrelated companies), including some 50M names and IDs (think a censored SSN).
I managed to load all of this into Memgraph, which uses some 32GB of RAM, but searching by person/company name is awful, taking some 30s on average. Once a node is found, the database returns the associated nodes quickly (<5s).
I am thinking of migrating all the text data (company and partner names, company and person IDs) to Elasticsearch, since it specializes in text search, and keeping only numeric node IDs and relationships in Memgraph, with the associated node ID stored alongside each name in Elasticsearch.
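To make the proposed split concrete, here is a rough sketch in Python that only builds the two requests and does not talk to either server. The field names `name` and `node_id`, the helper names, and the idea of matching on Memgraph's internal `id(n)` are all assumptions for illustration; a real setup might store its own stable ID property instead.

```python
from typing import Dict, List, Tuple


def build_name_query(name: str, size: int = 10) -> Dict:
    """Build an Elasticsearch search body: fuzzy match on a hypothetical `name` field."""
    return {
        "size": size,
        # Only the graph node id and the name are needed back from the text index.
        "_source": ["node_id", "name"],
        "query": {
            "match": {
                "name": {"query": name, "fuzziness": "AUTO", "operator": "and"}
            }
        },
    }


def extract_node_ids(response: Dict) -> List[int]:
    """Pull the stored graph node ids out of a standard search response."""
    return [hit["_source"]["node_id"] for hit in response["hits"]["hits"]]


# Cypher for the second step: expand the matched nodes inside the graph database.
# Assumption: the ids stored in the text index are the graph's internal node ids.
NEIGHBOUR_QUERY = (
    "MATCH (n)-[r]-(m) WHERE id(n) IN $ids "
    "RETURN id(n) AS source, type(r) AS rel, id(m) AS target;"
)


def build_neighbour_query(node_ids: List[int]) -> Tuple[str, Dict]:
    """Return the Cypher text and its parameters for a Bolt client to execute."""
    return NEIGHBOUR_QUERY, {"ids": node_ids}
```

The text engine answers "which nodes have this name?", and the graph database is only ever asked to expand a short list of IDs, which is the part that is already fast.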
I am doing all this on a development machine with 48GB of RAM, so that is my limit right now.
My questions are:
What is the expected search time for a dataset of names this size? And how about RAM consumption?
I'm talking about Elasticsearch because it seems to be the best-known and most mature system for what I want to achieve, but I'm a bit cautious because it is Java based. I searched for alternatives and came up with Meilisearch (Rust based) and Manticore Search (C++ based). Are these any good?
Or, would I be better served by a relational DB with text search support (Postgres + pg_trgm or pg_bigm, for example)?
TIA.
DISCLAIMER: I'm the co-founder and CTO of Memgraph
Memgraph doesn't have excellent text search capabilities at the moment, but there are some options:
The =~ operator (regular expressions) -> https://memgraph.com/docs/querying/read-and-modify-data#regular-expressions
NOTE: if you have the data indexed (by creating a label or label-property index), the search might be faster.
IMPORTANT: We have had quite a few requests like this one (native support for text search indexes, because it's more efficient to have everything in one place), so we plan to release proper text search indexing capabilities, maybe even in v2.13 (~10 weeks from today) -> please follow the progress under https://github.com/memgraph/memgraph/issues/1261
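As a minimal sketch of the =~ option, the Python below builds a label-property index statement and a parameterized regex search. The `Person` label, the `name` property, the `LIMIT`, and the helper names are assumptions for illustration. It also assumes the regex must match the whole string, hence the surrounding `.*`, and that the pattern can be passed as a query parameter. The strings can be sent through any Bolt client.

```python
from typing import Dict, Tuple

# Characters treated here as regex metacharacters; each is backslash-escaped
# so that user input is matched literally.
_REGEX_SPECIALS = set(".^$*+?()[]{}|\\")


def contains_pattern(fragment: str) -> str:
    """Turn a literal name fragment into a 'contains' regex for the =~ operator."""
    escaped = "".join("\\" + ch if ch in _REGEX_SPECIALS else ch for ch in fragment)
    return ".*" + escaped + ".*"


# Hypothetical label and property; adjust to the real schema.
CREATE_INDEX = "CREATE INDEX ON :Person(name);"
NAME_SEARCH = "MATCH (p:Person) WHERE p.name =~ $pattern RETURN p LIMIT 25;"


def build_name_search(fragment: str) -> Tuple[str, Dict[str, str]]:
    """Return the Cypher text and parameters for a regex search by name."""
    return NAME_SEARCH, {"pattern": contains_pattern(fragment)}
```

Escaping the input keeps characters such as the dots in "S.A." from acting as wildcards. A leading `.*` still tends to force a scan over every candidate node, so this is unlikely to match a dedicated text index for speed.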