Tags: mysql, sql, database, mysql-5.6

Getting Wrong Results when using != instead of <


I'm a beginner in SQL. While going over exercise questions on Stanford Lagunita, I ran into odd behavior: I get different results when I use != instead of <, even though the values being compared are never equal.

Here's the question:

"For all cases where the same reviewer rated the same movie twice and gave it a higher rating the second time, return the reviewer's name and the title of the movie."

Here's the schema:

/* Delete the tables if they already exist */
drop table if exists Movie;
drop table if exists Reviewer;
drop table if exists Rating;

/* Create the schema for our tables */
create table Movie(mID int, title text, year int, director text);
create table Reviewer(rID int, name text);
create table Rating(rID int, mID int, stars int, ratingDate date);

/* Populate the tables with our data */
insert into Movie values(101, 'Gone with the Wind', 1939, 'Victor Fleming');
insert into Movie values(102, 'Star Wars', 1977, 'George Lucas');
insert into Movie values(103, 'The Sound of Music', 1965, 'Robert Wise');
insert into Movie values(104, 'E.T.', 1982, 'Steven Spielberg');
insert into Movie values(105, 'Titanic', 1997, 'James Cameron');
insert into Movie values(106, 'Snow White', 1937, null);
insert into Movie values(107, 'Avatar', 2009, 'James Cameron');
insert into Movie values(108, 'Raiders of the Lost Ark', 1981, 'Steven Spielberg');

insert into Reviewer values(201, 'Sarah Martinez');
insert into Reviewer values(202, 'Daniel Lewis');
insert into Reviewer values(203, 'Brittany Harris');
insert into Reviewer values(204, 'Mike Anderson');
insert into Reviewer values(205, 'Chris Jackson');
insert into Reviewer values(206, 'Elizabeth Thomas');
insert into Reviewer values(207, 'James Cameron');
insert into Reviewer values(208, 'Ashley White');

insert into Rating values(201, 101, 2, '2011-01-22');
insert into Rating values(201, 101, 4, '2011-01-27');
insert into Rating values(202, 106, 4, null);
insert into Rating values(203, 103, 2, '2011-01-20');
insert into Rating values(203, 108, 4, '2011-01-12');
insert into Rating values(203, 108, 2, '2011-01-30');
insert into Rating values(204, 101, 3, '2011-01-09');
insert into Rating values(205, 103, 3, '2011-01-27');
insert into Rating values(205, 104, 2, '2011-01-22');
insert into Rating values(205, 108, 4, null);
insert into Rating values(206, 107, 3, '2011-01-15');
insert into Rating values(206, 106, 5, '2011-01-19');
insert into Rating values(207, 107, 5, '2011-01-20');
insert into Rating values(208, 104, 3, '2011-01-02');

My working solution:

SELECT Reviewer.name, Movie.title
FROM Rating r1, Rating r2, Movie, Reviewer
WHERE Reviewer.rID = r1.rID and Reviewer.rID = r2.rID and 
      Movie.mID = r1.mID and Movie.mID = r2.mID and 
      r1.rID = r2.rID and r1.mID = r2.mID and 
      r1.ratingDate < r2.ratingDate and
      r2.stars > r1.stars

Now, if you look at the data, you'll see that the only reviewers who rated the same movie twice (rID 201 and 203) gave their two ratings on different ratingDates, so I expected != to behave like < here. However, if I change r1.ratingDate < r2.ratingDate to r1.ratingDate != r2.ratingDate, the result also includes Brittany Harris (rID 203), which is incorrect: her second rating of Raiders of the Lost Ark (2 stars) is lower than her first (4 stars).
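A minimal reproduction, as a sketch: it assumes a hypothetical scratch table named RatingSample that holds only the repeated ratings from the data above, so the real Rating table is left alone. It runs the same self-join with != on the dates and shows which rating ends up as r1 and which as r2:

```sql
-- RatingSample is a hypothetical scratch table holding only the
-- repeated ratings from the sample data (reviewers 201 and 203).
DROP TABLE IF EXISTS RatingSample;
CREATE TABLE RatingSample(rID int, mID int, stars int, ratingDate date);
INSERT INTO RatingSample VALUES
  (201, 101, 2, '2011-01-22'),
  (201, 101, 4, '2011-01-27'),
  (203, 108, 4, '2011-01-12'),
  (203, 108, 2, '2011-01-30');

-- Same self-join as the working solution, but with != on the dates.
-- The extra columns show which rating was picked as r1 and which as r2.
SELECT r1.rID,
       r1.ratingDate AS r1_date, r1.stars AS r1_stars,
       r2.ratingDate AS r2_date, r2.stars AS r2_stars
FROM RatingSample r1, RatingSample r2
WHERE r1.rID = r2.rID AND r1.mID = r2.mID
  AND r1.ratingDate != r2.ratingDate
  AND r2.stars > r1.stars
ORDER BY r1.rID;
```

With these rows the query returns two rows: one for reviewer 201 (r1 dated 2011-01-22, r2 dated 2011-01-27) and one for reviewer 203, where r1_date (2011-01-30) is later than r2_date (2011-01-12). Replacing != with < leaves only the 201 row.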

Could anyone tell me why this happens?

Thanks


Solution

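  • r1.ratingDate < r2.ratingDate and r2.stars > r1.stars

    means: return those who rated a movie a second time and gave it more stars the second time.

    If you don't enforce the date order (using != instead of <), the self-join, which is a filtered cross product of Rating with itself, considers every two ratings of the same movie in both orders, because r1 and r2 range over the same rows. For Brittany Harris (rID 203), the pair where r1 is her later rating (2 stars, 2011-01-30) and r2 is her earlier rating (4 stars, 2011-01-12) satisfies both r1.ratingDate != r2.ratingDate and r2.stars > r1.stars, so she is returned even though her rating went down. Try setting both conditions to != and you will see every pair of ratings that differ in both stars and date, each listed twice (once in each order).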
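The experiment suggested above can be sketched as follows. This again assumes a hypothetical scratch table named RatingSample holding only the repeated ratings of reviewers 201 and 203, so it can be run without touching the real Rating table:

```sql
-- Hypothetical scratch table with only the repeated ratings.
DROP TABLE IF EXISTS RatingSample;
CREATE TABLE RatingSample(rID int, mID int, stars int, ratingDate date);
INSERT INTO RatingSample VALUES
  (201, 101, 2, '2011-01-22'),
  (201, 101, 4, '2011-01-27'),
  (203, 108, 4, '2011-01-12'),
  (203, 108, 2, '2011-01-30');

-- Both conditions relaxed to != : each pair of ratings that differ in
-- date and stars appears twice, once with each rating as r1.
SELECT r1.rID,
       r1.ratingDate AS r1_date, r1.stars AS r1_stars,
       r2.ratingDate AS r2_date, r2.stars AS r2_stars
FROM RatingSample r1, RatingSample r2
WHERE r1.rID = r2.rID AND r1.mID = r2.mID
  AND r1.ratingDate != r2.ratingDate
  AND r2.stars != r1.stars
ORDER BY r1.rID, r1.ratingDate;
```

This returns four rows, two per reviewer, with each pair listed once in each order (the ORDER BY just puts the mirrored rows next to each other). Restoring r1.ratingDate < r2.ratingDate keeps one row per reviewer, and additionally requiring r2.stars > r1.stars leaves only reviewer 201.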