I am having trouble indexing item names that contain numbers and symbols. A sample of my data is shown below:
ANGLE BARS ORANGE - 4.0MM 2 - 1/2"
B.I SQUARE TUBING 2" X 3"
B.I. PIPE S-40 10MM 3/8"
B.I SQUARE TUBING 1" X 2"
PLYWOOD MARINE 3/4X4X8
PLYWOOD STA. CLARA 1/8X4X8
PLYWOOD STA. CLARA 3/16X4X8
I want to tokenize my data on whitespace without dropping the symbols, because the symbols are essential. A search for "plywood sta. clara", "b.i square 2" X 3"", or "angle orange 2 - 1/2" should then return a result. I tried WhitespaceAnalyzer, but the symbols are dropped. I also tried StandardAnalyzer, but it drops stop words as well as symbols. What is the best analyzer to use instead?
You can use PatternAnalyzer with a regular expression of your own, or write a custom Analyzer.
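As a rough illustration of the tokenization rule that such a pattern or custom analyzer would need to apply, here is a minimal sketch in plain Java. It does not use the Lucene API. The class name SymbolTokenSketch and its methods tokenize and matches are hypothetical, and the whitespace pattern `\s+` plus lowercasing is an assumption about what the asker wants, not a confirmed configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: shows splitting on whitespace only.
class SymbolTokenSketch {

    // Split on runs of whitespace; characters such as " / . and - stay inside tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                // Lowercase so that "PLYWOOD" and "plywood" compare equal.
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // True when every query token also occurs as a token of the item name.
    static boolean matches(String itemName, String query) {
        return tokenize(itemName).containsAll(tokenize(query));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("B.I SQUARE TUBING 2\" X 3\""));
        System.out.println(matches("PLYWOOD STA. CLARA 1/8X4X8", "plywood sta. clara"));
        System.out.println(matches("B.I SQUARE TUBING 2\" X 3\"", "b.i square 2\" X 3\""));
    }
}
```

Under this rule the first two sample searches match, but "angle orange 2 - 1/2" would not match the first item, because the indexed token is 1/2" (with the inch mark) while the query token is 1/2. If that search has to work, the pattern or custom analyzer would also need to strip or separate a trailing inch mark.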