Search code examples
sqlsql-server-2008database-performancesqlperformance

Optimizing my stored procedure - is this the right way to do it?


SELECT  TOP 1 @CurrentStudentID = StudentID
FROM    Courses WITH (NOLOCK)
WHERE   Courses.CourseID = @CourseID 
ORDER BY StudentID

-- Loop through all the students and find if he/she is registered for more than one course.
WHILE (@@ROWCOUNT > 0 AND @CurrentStudentID IS NOT NULL)
BEGIN

    -- Select all other courses student is currently registered in.

    IF @@ROWCOUNT > 0
    BEGIN
        -- return required information
    END
    ELSE
    BEGIN
        -- Perform some operations
    END

    -- Select the next registered student
    SELECT  TOP 1 @CurrentStudentID = StudentID
    FROM    Courses WITH (NOLOCK)
    WHERE   Courses.CourseID = @CourseID AND
            Courses.StudentID > @CurrentStudentID 
    ORDER BY StudentID
END

Can someone help with my logic here? I wrote a stored procedure to find out if a student of a course is currently taking other courses from the same school.

I'm particularly worried about the two SELECT queries and the performance of a while loop if the number of students is huge. I feel the way I am doing it feels very contrived. I'm sure there are better ways to do this.

I've done SQL profiling on this stored procedure and it's duration can range from 0 - 60 ms for a single call. I don't understand why the same stored procedure's execution time is so random and inconsistent.

Appreciate any help. I only have 1 year plus of SQL Server 2008 experience.

Thanks in advance.


Solution

  • AS I mentioned, SQL is a set-based theory language. In other words, it is semi-relational with data sets that allows for efficient comparisons between groups of data. "Lower" languages such as C++ or Java do not maintain such large data sets, since they are cursor (line by line) based-languages.

    High level as this definition is, the point is to think of your data like EXCEL sheets. You have predefined columns such as CourseID and StudentID, that have information in the other columns that are dependent on those values (CourseID 1:1 Course_Name) and some information that is repetitive (CourseID can have multiple students).

    True normalization includes removing interdependent columns, but lets not worry about that right now. The main focus is on what makes sense for the business. Your table has Identifying columns for its courses and students. So you do not need to use a cursor if those values do not have conflicting interdependent values.

    SELECT StudentID, COUNT(COURSEID) AS CLASS_NUM
    FROM   COURSES
    GROUP BY StudentID
    HAVING COUNT(COURSEID) > 1
    

    The GROUP BY returns distinct sets of values from the columns listed, flattening the other rows and allowing aggregate functions like COUNT(). (note: NULLS are not counted in the COUNT(). Use an ISNULL function) You have not yet limited the list, and yet you achieve the same results. After SQL flattens the rows, you can use a HAVING clause to further limit the result sets from the GROUP BY if needed.

    Way faster than a cursor, definitely. :)

    Now, if your table includes students in different semesters and years, you might consider adding this to the GROUP BY, so that you have sets in your GROUP BY (StudentID and Year)

    Also, recall that the SELECT statement LOGICALLY read AFTER the GROUP BY and HAVING clauses, so any columns listed in the SELECT statement must appear in the GROUP BY or or have an aggregate function.