Can someone suggest an easy way to solve the search problem below? Is there an algorithm for this, or will full-text search suffice?
The items are classified as follows:
ItemCategory | ItemCluster | ItemSubCluster | SubCluster | Items |
---|---|---|---|---|
Vegetable | Root vegetables | Root | WithOutSkin | potato, sweet potato, yam |
Vegetable | Root vegetables | Root | WithSkin | onion, garlic, shallot |
Vegetable | Greens | Leafy green | Leaf | lettuce, spinach, silverbeet |
Vegetable | Greens | Cruciferous | Flower | cabbage, cauliflower, Brussels sprouts, broccoli |
Vegetable | Greens | Edible plant stem | Stem | celery, asparagus |
The inputs will be something like:
sweet potato, yam
Yam, Potato
garlik, onion
lettuce, spinach, silverbeet
lettuce, silverbeet
lettuce, silverbeet, spinach
For each item in the input, I want to find which ItemCategory, ItemCluster, ItemSubCluster, and SubCluster it belongs to.
Any help will be much appreciated.
You are nearly following the right approach.
You don't need full text searching here.
What you can create here is a kind of inverted index, as follows:
Taking potato as an example, create a map entry for potato that stores its ItemCategory, ItemCluster, ItemSubCluster, and SubCluster.
For example:
"potato": {
"ItemCategory": "Vegetable",
"ItemCluster": "Root vegetables",
"ItemSubCluster": "Root",
"SubCluster": "WithOutSkin"
}
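As a rough sketch of that idea in Python, the index can be built once from the table in the question and then queried with a plain dictionary lookup. The names `build_index`, `normalize`, and `classify` are made up for illustration. The `difflib` fallback for misspelled input such as "garlik" is my own assumption about how you might want to handle typos, not something the approach requires.

```python
import difflib

# Rows copied from the classification table in the question.
ROWS = [
    ("Vegetable", "Root vegetables", "Root", "WithOutSkin", "potato, sweet potato, yam"),
    ("Vegetable", "Root vegetables", "Root", "WithSkin", "onion, garlic, shallot"),
    ("Vegetable", "Greens", "Leafy green", "Leaf", "lettuce, spinach, silverbeet"),
    ("Vegetable", "Greens", "Cruciferous", "Flower",
     "cabbage, cauliflower, Brussels sprouts, broccoli"),
    ("Vegetable", "Greens", "Edible plant stem", "Stem", "celery, asparagus"),
]
FIELDS = ("ItemCategory", "ItemCluster", "ItemSubCluster", "SubCluster")


def normalize(name):
    """Lower-case and collapse whitespace so 'Yam' and ' yam ' match."""
    return " ".join(name.lower().split())


def build_index(rows):
    """Map each normalized item name to its classification dict."""
    index = {}
    for *labels, items in rows:
        classification = dict(zip(FIELDS, labels))
        for item in items.split(","):
            index[normalize(item)] = classification
    return index


def classify(query, index, cutoff=0.8):
    """Look up each comma-separated item; unknown items map to None."""
    result = {}
    for raw in query.split(","):
        key = normalize(raw)
        if key not in index:
            # Assumed typo handling: pick the closest known item name, if any.
            close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
            key = close[0] if close else None
        result[raw.strip()] = index.get(key)
    return result


INDEX = build_index(ROWS)
print(classify("Yam, Potato", INDEX))
print(classify("garlik, onion", INDEX))
```

Every item in a row shares the same classification dict, so the lookup cost is one hash probe per input item regardless of how many rows the table has.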
Storing the full strings for every vegetable would be expensive, because the same category and cluster names are repeated for many items.
You can optimise the storage by using an encoding scheme:
For example, let ItemCategory be denoted by 0, ItemCluster by 1, ItemSubCluster by 2, and SubCluster by 3.
Encode the values with a similar scheme: let Vegetable be denoted by 0, Root vegetables by 1, Root by 2, and WithOutSkin by 3.
Now, your mapping becomes:
"potato": {
"0": "0",
"1": "1",
"2": "2",
"3": "3"
}
To optimise this further, you can also maintain an index of the vegetables themselves. For example, potato can be denoted by 0.
So your final index becomes:
"0": {
"0": "0",
"1": "1",
"2": "2",
"3": "3"
}
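Here is a minimal sketch of that encoding in Python, under a few assumptions of mine: ids are handed out in first-seen order, the field id is simply the position in a tuple, and the `Vocabulary` class, `add_item`, and `lookup` are hypothetical helpers rather than part of any library.

```python
FIELDS = ("ItemCategory", "ItemCluster", "ItemSubCluster", "SubCluster")


class Vocabulary:
    """Assigns consecutive integer ids to strings in first-seen order."""

    def __init__(self):
        self.ids = {}    # string -> id
        self.names = []  # id -> string

    def encode(self, name):
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)
        return self.ids[name]

    def decode(self, idx):
        return self.names[idx]


values = Vocabulary()  # ids for Vegetable, Root vegetables, Root, ...
items = Vocabulary()   # ids for potato, onion, ...
encoded = {}           # item id -> tuple of value ids (tuple position = field id)


def add_item(name, labels):
    """Store one item as integer codes only."""
    encoded[items.encode(name)] = tuple(values.encode(label) for label in labels)


def lookup(name):
    """Decode an item's codes back into a readable classification."""
    codes = encoded[items.ids[name]]
    return {FIELDS[i]: values.decode(code) for i, code in enumerate(codes)}


add_item("potato", ("Vegetable", "Root vegetables", "Root", "WithOutSkin"))
add_item("onion", ("Vegetable", "Root vegetables", "Root", "WithSkin"))

print(encoded)
print(lookup("onion"))
```

With this input order, potato gets item id 0 and the codes 0, 1, 2, 3, which matches the final index above. Onion reuses the ids for the three shared values and only adds one new id for WithSkin, which is where the storage saving comes from.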