I'm honestly not sure how to phrase this. So I'm just going to explain the issue
In order to find as many possible matches as possible, I currently run a piece of code that converts data to upper case, removes spaces and invalid characters. For example:
Hi there, I'm very happy! -> HITHEREIMVERYHAPPY
My database is reaching a size where it's no longer efficient to run the piece of code that checks each line in a ResultSet
, so I was wondering if this is possible in SQL
somehow?
SELECT * FROM Data WHERE Input.replaceAll("[^a-zA-Z0-9]", "").replace(" ", "").toUpperCase() = INPUT AND Response.replaceAll("[^a-zA-Z0-9]", "").replace(" ", "").toUpperCase() = RESPONSE
You might like RLIKE for this.
-- Sample data create table Data ( Id int primary key auto_increment, Input varchar(100), Response varchar(100) ); insert into Data (Input, Response) values ('Hi there, I''m very happy! ->', 'To have 1 solution.') , ('HITHEREIMVERYHAPPY', 'ToHave1Solution') , ('I rlike this', 'IRLikeThis')
Query:
SELECT * FROM Data WHERE Input RLIKE '[^ A-Za-z0-9]' AND Response RLIKE '[^ A-Za-z0-9]'
Result:
Id | Input | Response -: | :--------------------------- | :------------------ 1 | Hi there, I'm very happy! -> | To have 1 solution.
db<>fiddle here
It basically uses the regex [^ A-Za-z0-9]
, which matches strings that contain any character that's not a letter, digit or space.