Tags: sql, t-sql, entity-framework-4.1, ef-code-first, cursor

Convert multiple rows into single column


I have a database table, UserRewards, that has 30+ million rows. Each row has a userID and a rewardID (along with other fields).

There is a Users table (around 4 million unique users) that has the primary key userID, among other fields. For performance reasons, I want to move each user's rewardIDs from UserRewards into a concatenated field in Users (a new nvarchar(4000) field called Rewards). I need a script that can do this as fast as possible.

I have a cursor which joins up the rewards using the script below, but it only processes around 100 users per minute, which would take far too long to get through the roughly 4 million unique users I have.

    SET @rewards = ( SELECT REPLACE(
        (SELECT rewardsId AS [data()]
         FROM userrewards
         WHERE UsersID = @users_Id AND BatchId = @batchId
         FOR XML PATH('')), ' ', ',') )
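One way to avoid the per-user cursor entirely is a single set-based UPDATE that builds each user's list with a correlated FOR XML PATH subquery. A sketch, assuming the table and column names from the question (UserRewards, Users, the new Rewards column, and @batchId):

```sql
-- Populate Users.Rewards for every user in one set-based statement.
-- Table and column names are assumed from the question; adjust as needed.
UPDATE u
SET u.Rewards = STUFF(
        (SELECT ',' + CAST(ur.rewardsId AS nvarchar(20))
         FROM UserRewards ur
         WHERE ur.UsersID = u.userID
           AND ur.BatchId = @batchId
         FOR XML PATH('')),
        1, 1, '')  -- strip the leading comma
FROM Users u;
```

STUFF removes the leading comma directly, which is safer than the REPLACE-space trick: that trick corrupts the result if a value ever contains a space.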

Any suggestions to optimise this? I am about to try a while loop to see how that works, but any other ideas would be gratefully received.

EDIT:

My site does the following:

We have around 4 million users who have been pre-assigned 5-10 "awards". This relationship is in the UserRewards table.

A user comes to the site, we identify them, and look up in the database the rewards assigned to them.

The issue is that the site is very popular, so a large number of people hit the site at the same time requesting their data. The approach above would reduce my joins, but I understand it may not be the best solution. My database server goes up to 100% CPU usage within 10 seconds of me turning the site on, so most people's requests time out (they are shown an error page), or they get results, but not in a satisfactory time.

Is anyone able to suggest a better solution to my issue?


Solution

  • There are several reasons why I think the approach you are attempting is a bad idea. First, how are you going to maintain the comma-delimited list in the Users table? It is possible that the rewards are loaded in a batch, say at night, so this isn't really a problem now. Even so, one day you might want to assign the rewards more frequently.

    Second, what happens when you want to delete a reward or change the name of one of them? Instead of updating one table, you need to update the information in two different places.

    If you have 4 million users, with thousands of concurrent accesses, then small inconsistencies due to timing will be noticeable and may generate user complaints. A call from the CEO on why complaints are increasing is probably not something you want to deal with.

    An alternative is to build an index on UserRewards(UserId, BatchId, RewardsId). Presumably, each field is a few bytes, so 30 million records should easily fit into 8 GB of memory (be sure that SQL Server is allocated almost all of the memory!). The query that you want can be satisfied strictly by this index, without having to bring the UserRewards table into memory. So, only the index needs to be cached. And, it will be optimized for this query.
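    A covering index along those lines might look like the following (index and column names assumed from the question):

    ```sql
    -- Covering index: the lookup can be answered from the index alone,
    -- without touching the base table pages.
    CREATE NONCLUSTERED INDEX IX_UserRewards_User_Batch
        ON UserRewards (UsersID, BatchId, RewardsId);

    -- The per-user lookup then becomes a cheap index seek:
    SELECT RewardsId
    FROM UserRewards
    WHERE UsersID = @users_Id
      AND BatchId = @batchId;
    ```

    Because RewardsId is the last key column, the SELECT reads only index pages, which is what keeps the hot working set small.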

    One thing that might be slowing everything down is the frequency of assigning rewards. If these are being assigned at even 10% of the read rate, the inserts/updates could be blocking the reads. You want to run the queries with the NOLOCK hint (i.e., at the READ UNCOMMITTED isolation level) to avoid this problem. You also want to be sure that locking is occurring at the row or page level, to avoid conflicts with the reads.
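    In T-SQL the dirty-read hint is written WITH (NOLOCK) on the table reference, or set once per session. A sketch using the question's names:

    ```sql
    -- Dirty read: the query no longer waits on insert/update locks.
    -- Only appropriate if slightly stale reward data is acceptable.
    SELECT RewardsId
    FROM UserRewards WITH (NOLOCK)
    WHERE UsersID = @users_Id
      AND BatchId = @batchId;

    -- Or for the whole session:
    -- SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    ```

    The trade-off is that a read may see rows from in-flight transactions, so this belongs on the read path only, not on anything that writes.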