I have a pet project (a simple forum application) that I use to test out all the latest .NET tech and I recently got around to toying with Entity Framework Code-First. This app already had an existing EF solution with an EDMX file mapped to an existing database and all my entities were auto-generated. This solution has worked great so far.
Note: Keep in mind that this change to EF 4.1 is purely for learning. If you are wondering what my needs were that caused me to upgrade, there weren't any. I simply wanted to do it for fun.
I copied the project and did the upgrades so I would have the same project but with different Entity Framework implementations. In the new project I used a Visual Studio extension called Entity Framework Power Tools to generate POCOs and a DbContext from my existing database. Everything worked flawlessly. I had the app compiling in about 30 minutes time. Pretty impressive.
However, I noticed now when running the app that the query execution is approximately 3 times slower than it was before. Any idea what I could have missed?
Below are the details for both solutions, as well as LINQPad measurements for both. (click images for full size)
Here is a snapshot of my EF 4.0 data model. It cuts off a few entities on top and bottom but you get the idea.
http://www.codetunnel.com/content/images/EF41question/1.jpg Here is a LINQPad test against my EF 4.0 data model.
http://www.codetunnel.com/content/images/EF41question/2.jpg Notice that the query took 2.743 seconds to execute.
Here is a snapshot of my EF 4.1 data model. Since it's code-only I will show the DbContext class as well as one of the mapping classes (fluent API code) for one entity, and one entity itself.
DbContext http://www.codetunnel.com/content/images/EF41question/3.jpg TopicMap (fluent API configuration) http://www.codetunnel.com/content/images/EF41question/4.jpg Topic (POCO entity) http://www.codetunnel.com/content/images/EF41question/5.jpg Here is a LINQPad test against my EF 4.1 model.
http://www.codetunnel.com/content/images/EF41question/6.jpg Notice this time that the query took 6.287 seconds to execute and it was the exact same query. It takes over 30 seconds the very first time it is run. If I go to the SQL and IL tabs in LINQPad the generated SQL and the IL code are identical for both data models. This is really giving me grief. In the actual application things are so slow with EF 4.1 that it is unusable.
I ran the same LINQ query against both models. The query grabs all topics for a regular forum user, orders them in descending order by their last reply date (or the topic post date if there are no replies).
Obviously I can just go back to EF 4.0 and go about my merry way but I'm really interested if there might be something I missed.
I'm completely revisiting this answer because of some recent developments.
Because of some inquiry by the Entity Framework team at Microsoft trying to duplicate my issue, I went back and retraced my steps to better help narrow down the problem. It's been a while since I asked this question and I understand things much better now than I did then.
Rather than go back and try to get some very old code running I decided to start from scratch with a simple test project. I put together a simple database with two tables and mapped them to an EF 4.0 designer file.
This generated a connection string like this:
<add name="EFTestEntities" connectionString="metadata=res://*/Entities.csdl|res://*/Entities.ssdl|res://*/Entities.msl;provider=System.Data.SqlClient;provider connection string="data source=.\sqlexpress;initial catalog=EFTest;integrated security=True;multipleactiveresultsets=True;App=EntityFramework"" providerName="System.Data.EntityClient" />
I then populated the database with 1000 rows of test topics and 10 rows of replies for each topic. Once I had this working I timed a very basic query fairly similar to the one in my main question. Then I duplicated the test project and I modified it using the Entity Framework Power Tools extension to generate my model objects and DbContext. The only thing I modified was the connection string to remove the metadata that is referenced when there is a designer file in the project so it looked like this:
<add name="EFTestContext" providerName="System.Data.SqlClient" connectionString="Data Source=.\sqlexpress;Initial Catalog=EFTest;Integrated Security=True;Pooling=False" />
I then ran the exact same query as I did with the designer.
There was no difference in query times except for the slightly extra time it takes for the code-first to generate the mapping meta-data. After that initial query the two versions of EF performed pretty much the same. I was about to resolve the problem as not reproducible but then I notice I did something horrible in the question. I called .AsEnumerable()
before my queries. If you don't already know what that does, that will cause the ENTIRE entity collection to be pulled into memory and then the query would be applied there as LINQ-to-Objects rather than LINQ-to-Entities.
This means that I was sucking an entire table into memory and then doing LINQ against it there. In cases where SQL server is on the same machine as your website you might not notice the difference but there are many cases where this would be a huge issue. In my case it really was causing a performance loss.
I went back to my tests and I ran them with .AsEnumerable()
placed before the queries.
Now I expected the time to be slower since my LINQ queries weren't being translated into expression trees and executed in the database. However, it seems I did reproduce the issue in my question. The code-only version is returning much slower. This is actually pretty strange because they both should be running the same. I am not surprised that they are running slower than when the queries were against IQueryable, but now that they are running against IEnumerable there is a big difference between the two. I was able to expand the difference between the two by adding more and more data to the table.
I went ahead and added 5000 more topics to the database, with 30 replies for each topic. So there is now a total of 6000 topic rows and 165000 reply rows. First I ran the query with proper LINQ-to-Entities:
As you can see, still no difference. Then I ran the queries against with LINQ-to-Objects using .AsEnumerable()
.
I stopped it after three queries because waiting about two minutes per query was excruciating. I can't seem to produce the 3x as slow issue that I show in my question, but code-only is significantly slower. The EDMX approach takes just shy of two minutes to complete one query while the code-only approach consistently takes over two minutes.