We would like to use Windows Workflow to guide our web users down a navigational path.
We have a server farm, so this translates to a long-running workflow for each user, one that multiple servers must be aware of.
What we've prototyped so far is the use of WorkflowApplication and the SQL Persistence Provider. This works, but we're concerned about all of the SQL Server activity and its performance.
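Roughly, the shape of what we've prototyped looks like the sketch below (the connection string and the WaitForUserStep activity are placeholders for our real setup):

```csharp
using System;
using System.Activities;
using System.Activities.DurableInstancing;
using System.Threading;

// Stand-in for our real navigation workflow: it idles on a bookmark
// until the next web request resumes it.
class WaitForUserStep : NativeActivity
{
    protected override bool CanInduceIdle
    {
        get { return true; }
    }

    protected override void Execute(NativeActivityContext context)
    {
        context.CreateBookmark("NextStep",
            (ctx, bookmark, value) => Console.WriteLine("Resumed with: " + value));
    }
}

class Program
{
    static void Main()
    {
        // Placeholder connection string; points at the instance store database.
        var store = new SqlWorkflowInstanceStore(
            "Server=.;Database=WorkflowInstanceStore;Integrated Security=True");

        var app = new WorkflowApplication(new WaitForUserStep());
        app.InstanceStore = store;

        // Persist and evict from memory whenever the workflow goes idle,
        // i.e. between the user's requests.
        app.PersistableIdle = e => PersistableIdleAction.Unload;

        var unloaded = new AutoResetEvent(false);
        app.Unloaded = e => unloaded.Set();

        Guid instanceId = app.Id;
        app.Run();
        unloaded.WaitOne();

        // On a later request (possibly handled by another server in the farm),
        // rehydrate the same instance from SQL and resume it.
        var resumed = new WorkflowApplication(new WaitForUserStep());
        resumed.InstanceStore = store;

        var completed = new AutoResetEvent(false);
        resumed.Completed = e => completed.Set();

        resumed.Load(instanceId);
        resumed.ResumeBookmark("NextStep", "user input");
        completed.WaitOne();
    }
}
```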
We found that someone has created an AppFabric Cache persistence provider (blog post here), which appeals to us because it means hydrating our workflows to RAM instead of SQL; but they clearly state that it's not been production-tested.
Any advice, thoughts, or suggestions?
Unfortunately, as with most performance-related questions, there's no good answer to this other than: test it out with your real-world scenario and see.
We're using pure SQL persistence for a shopping-cart-style workflow and have no issues with its performance. We even extract several promoted properties, which adds some storage overhead. However, my load is not your load and my workflow is not your workflow, so that doesn't mean it will work as well for your scenario.
The one word of advice I will give for a web farm scenario is to make sure you set the sqlWorkflowInstanceStore/@instanceLockedExceptionAction attribute to AggressiveRetry. The reason for this is that in a farm scenario you usually have requests routed to different servers as they come in. So if two calls come in in rapid succession, the first request for the workflow may go to Server A and the second to Server B. Server A might have already responded, but since persistence is asynchronous to the response, it might still be persisting when the second call comes in to Server B. Server B might not get the lock on the first attempt because Server A is finishing up persistence, so it will go into a retry routine. If you don't specify AggressiveRetry, then BasicRetry will be used, and that uses a really slow, linear algorithm which will manifest as what looks like horrible service performance for your workflow. AggressiveRetry, on the other hand, uses an algorithm that "backs off" the more failures it gets. Since odds are Server A is going to finish up real soon, you'll usually acquire the lock within the first few tries.
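Since you're hosting with WorkflowApplication rather than WorkflowServiceHost, the config attribute has a code-level counterpart: the InstanceLockedExceptionAction property on SqlWorkflowInstanceStore. A minimal sketch (the connection string is a placeholder):

```csharp
using System.Activities.DurableInstancing;

static class InstanceStoreSetup
{
    // Code-hosted equivalent of setting
    // sqlWorkflowInstanceStore/@instanceLockedExceptionAction = "AggressiveRetry" in config.
    public static SqlWorkflowInstanceStore Create()
    {
        return new SqlWorkflowInstanceStore(
            "Server=.;Database=WorkflowInstanceStore;Integrated Security=True")
        {
            // Retry quickly at first, then back off on repeated lock failures,
            // instead of BasicRetry's slow linear schedule.
            InstanceLockedExceptionAction = InstanceLockedExceptionAction.AggressiveRetry
        };
    }
}
```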
As far as the AppFabric Cache implementation goes, I wouldn't personally use that exact implementation, as it does not guarantee any durability. The cache could go down, the cache might need to purge its LRU objects (which could include your workflow instance), etc. If anything, I'd recommend a cache read-through implementation where all writes/lock calls are done against SQL, but the actual data for deserializing the workflow could come out of the cache. That obviously won't have as high an impact, but it's still better than having to read from the SQL server.

Another possibility is that AppFabric Cache 1.1 added cache read/write-through providers. So now you could theoretically use the implementation you linked above together with a read/write-through provider that actually makes the SQL storage calls under the covers. That could give you the best of both worlds from the application's perspective: it would only ever deal with the cache, and the provider would asynchronously write through to the SQL store.
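To make the read-through idea concrete, here's a rough sketch of the shape I mean, assuming the AppFabric DataCache client API and whatever SQL load call you already have (passed in here as a placeholder delegate). This is not a full InstanceStore implementation; lock and write calls still go to SQL.

```csharp
using System;
using Microsoft.ApplicationServer.Caching;

// Rough read-through sketch: the serialized instance payload may be served
// from the cache, falling back to SQL on a miss.
class InstanceDataReadThrough
{
    private readonly DataCache cache;

    public InstanceDataReadThrough(DataCache cache)
    {
        this.cache = cache;
    }

    // loadFromSql is whatever call you already use to pull the serialized
    // instance data out of the SQL instance store (placeholder delegate).
    public byte[] GetInstanceData(Guid instanceId, Func<Guid, byte[]> loadFromSql)
    {
        string key = instanceId.ToString("N");

        // Try the cache first.
        var cached = cache.Get(key) as byte[];
        if (cached != null)
        {
            return cached;
        }

        // Miss: read from SQL and repopulate the cache for the next request.
        byte[] data = loadFromSql(instanceId);
        cache.Put(key, data);
        return data;
    }
}
```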