I've been struggling lately with a complex SQL query.
I have the following tables:
[dbo].[User] ~ {ID,nickname}
[dbo].[Property] ~ {ID,title}
[dbo].[Property_Values] ~ [ID,propertyID,title}
[dbo].[Property_Values_User_Linkage] ~ {UserID,PropertyID,valueID}
It's basically a project in which a user chooses values for each property. Each property can be single-value or multi-value. For instance, user might choose multiple values for Property {ID=1,title=Hobbies} but must choose a single value for Property {ID=2,title=HairColor}.
Using another table - [dbo].[Search_Property_Values_User_Linkage] - {UserID,PropertyID,valueID} I'm choosing what properties I want and I expect to find matching users. BUT, If I haven't chosen a value (or multi-value) for, let's say HairColor, I should get all users (since I don't want to filter by HairColor).
So far it's easy, but the problem I can't seems to solve is the case where there are multi-values ahead or no user defined values. For example, I want all users with HairColor=Brown and Hobbies IN(basketball,football).
I can retrieve all users who match one of the terms (and those who have other properties, since I haven't chosen to filter them) but I can't get only the users who absolutely match my criteria.
To put code into words, let's just say I need all users who:
I came across with a solution in which I create a virtual table which "completes" non-chosen values by bits. For instance (not actual code):
DECLARE @isMatches bit
SET @isMatches=0
if [propertyIsChosen]=1
{
if [userInProperty]=1 SET @isMatches=1
}
else SET isMatches=1
I basically CROSS-JOIN [Property] WITH [User]
and LEFT-OUTER-JOIN the rest of the tables to match selections.
I get all users and their match to a property. This is not good enough since I get users who have Brown hair, those who like basketball/football, but not those who matches to both (and of course any other undefined property by me).
This is heavy, but it's the what I got so far to analyze the problem.
I would appreciate any help. I think I'm missing something from my math classes, 10 years ago...
EDIT: db pic: http://i51.tinypic.com/2n1cfwg.png
From you comment, I see that you have some other property-related tables that you have not mentioned in your question. I'm gonna assume we can ignore these for now and just concentrate on representing property values, which can be done by a simplified model below:
DDL SQL (MS SQL Server):
CREATE TABLE [USER] (
USER_ID int NOT NULL,
NICKNAME nvarchar(50) NOT NULL,
CONSTRAINT PK_USER PRIMARY KEY CLUSTERED (USER_ID)
);
CREATE TABLE USER_PROPERTY_VALUE(
USER_ID int NOT NULL,
PROPERTY_NAME nvarchar(50) NOT NULL,
PROPERTY_VALUE_NO int NOT NULL,
PROPERTY_VALUE nvarchar(255) NOT NULL,
CONSTRAINT PK_USER_PROPERTY_VALUE PRIMARY KEY CLUSTERED (
USER_ID,
PROPERTY_NAME,
PROPERTY_VALUE_NO
)
);
ALTER TABLE USER_PROPERTY_VALUE ADD CONSTRAINT FK_USER_PROPERTY_VALUE_USER FOREIGN KEY(USER_ID)
REFERENCES [USER] (USER_ID);
A multi-value is represented by several rows in USER_PROPERTY_VALUE, sharing the same PROPERTY_NAME but each having a distinct PROPERTY_VALUE_NO.
The following example data...
Rob: HairColor={brown}, Hobby={basketball,football}
Bob: HairColor={brown}, Hobby={basketball}
Steve: Hobby={basketball,football}
...is represented in the database like this:
USER:
USER_ID NICKNAME
1 Rob
2 Bob
3 Steve
USER_PROPERTY_VALUE:
USER_ID PROPERTY_NAME PROPERTY_VALUE_NO PROPERTY_VALUE
1 HairColor 1 brown
1 Hobby 1 basketball
1 Hobby 2 football
2 HairColor 1 brown
2 Hobby 1 basketball
3 Hobby 1 basketball
3 Hobby 2 football
Select users with brown hair color:
SELECT *
FROM [USER]
WHERE
EXISTS (
SELECT *
FROM USER_PROPERTY_VALUE
WHERE
USER_PROPERTY_VALUE.USER_ID = [USER].USER_ID
AND PROPERTY_NAME = 'HairColor'
AND PROPERTY_VALUE = 'brown'
)
Result:
USER_ID NICKNAME
1 Rob
2 Bob
Select users whose hobbies include both basketball and football:
SELECT *
FROM [USER]
WHERE
EXISTS (
SELECT *
FROM USER_PROPERTY_VALUE
WHERE
USER_PROPERTY_VALUE.USER_ID = [USER].USER_ID
AND PROPERTY_NAME = 'Hobby'
AND PROPERTY_VALUE = 'basketball'
)
AND EXISTS (
SELECT *
FROM USER_PROPERTY_VALUE
WHERE
USER_PROPERTY_VALUE.USER_ID = [USER].USER_ID
AND PROPERTY_NAME = 'Hobby'
AND PROPERTY_VALUE = 'football'
)
Result:
USER_ID NICKNAME
1 Rob
3 Steve
Select users whose hair color is brown and hobbies include both basketball and football.
SELECT *
FROM [USER]
WHERE
EXISTS (
SELECT *
FROM USER_PROPERTY_VALUE
WHERE
USER_PROPERTY_VALUE.USER_ID = [USER].USER_ID
AND PROPERTY_NAME = 'HairColor'
AND PROPERTY_VALUE = 'brown'
)
AND EXISTS (
SELECT *
FROM USER_PROPERTY_VALUE
WHERE
USER_PROPERTY_VALUE.USER_ID = [USER].USER_ID
AND PROPERTY_NAME = 'Hobby'
AND PROPERTY_VALUE = 'basketball'
)
AND EXISTS (
SELECT *
FROM USER_PROPERTY_VALUE
WHERE
USER_PROPERTY_VALUE.USER_ID = [USER].USER_ID
AND PROPERTY_NAME = 'Hobby'
AND PROPERTY_VALUE = 'football'
)
Result:
USER_ID NICKNAME
1 Rob
Etc, etc...
Let's say you have a table that contains the filter criteria:
CREATE TABLE PROPERTY_FILTER (
PROPERTY_NAME nvarchar(50) NOT NULL,
PROPERTY_VALUE nvarchar(255) NOT NULL,
CONSTRAINT PK_PROPERTY_FILTER PRIMARY KEY (PROPERTY_NAME, PROPERTY_VALUE)
)
The following query will return only users that fulfill the criteria currently contained in that table:
SELECT *
FROM [USER] U
WHERE
NOT EXISTS (
SELECT F.PROPERTY_NAME, F.PROPERTY_VALUE
FROM PROPERTY_FILTER F
EXCEPT
SELECT P.PROPERTY_NAME, P.PROPERTY_VALUE
FROM USER_PROPERTY_VALUE P
WHERE P.USER_ID = U.USER_ID
)
In plain English: if there is a filter property that is not also users's property, ignore that user.
BTW, this will not work in concurrent multi-user environment - you'll need to introduce an additional field in the PROPERTY_FILTER table to identify the "session", or use temporary tables (if you don't require persistence).