Workflow and SharePoint

Recently our team came across some workflow performance issues, and I wanted to share our research and the settings we used to get past them. But first, if you’re new to SharePoint and Workflow, read on for some background.

SharePoint (SP) uses Windows Workflow Foundation (WF) v3.5 for its workflow feature set, but only a very constrained subset of it. WF has a ton of settings and options that make it very powerful; however, to keep the complexity down and keep WF in sync with SP, the Microsoft SP team opted to implement a closed WF Host as well as a closed WF Runtime.

What does this mean?

WF v3.5 is simply a framework. It is not an application, a server product, or a SKU. It is a set of classes, APIs, and patterns that other products and technologies use to build a state-machine-based engine. WF consists of a WF Host, WF Runtime, WF Designer, WF Providers, and workflows. The Host is the runtime environment in which the engine, the WF Runtime, runs; all environment settings and values are controlled by the Host. The Runtime is the workflow engine that executes one or more workflows. It also manages communication between the Host and the workflow, using queues and providers to channel it. A Provider is a bridge that adds communication stacks and other functionality to the WF Runtime. One example is the WF SQL Persistence Provider, which lets the WF Runtime persist state and data to a SQL database. A Designer is a graphical tool that helps create workflows. A workflow is a set of actions, or activities, arranged in a sequential or state-machine pattern. Activities are single operations such as “CreateTask”, “Print”, or “SaveToFile”.

Standard WF v3.5 lets you customize every aspect of WF. A Host, Runtime, Provider, Designer, workflow, and Activity can all be customized or created from scratch. Again, WF v3.5 is a framework, not a full-fledged application. SP implements its own Host, Runtime, Providers, and Designer (SharePoint Designer). It only allows workflows and activities to be customized or created from scratch, with Visual Studio as an additional supported designer. This is what is meant by saying SP is a closed WF system.

So how do WFs work in SP?

In general, WFs are batches of WF activities that are executed by the WF Runtime. They can execute in sequential order (hence a Sequential WF) or as a state machine (hence a State Machine WF). State Machine WFs have no fixed order; they execute activities as the workflow transitions between states. In other words, if you model four states, “OnInitialRequest”, “OnReply”, “OnAnotherRequest”, and “OnFinish”, the only fixed points are the start and stop states, “OnInitialRequest” and “OnFinish” respectively. Between them, “OnReply” and “OnAnotherRequest” can alternate in no particular order until something causes a transition to the final state, “OnFinish”. In practice, the workflow starts by waiting for something to signal the “OnInitialRequest” state. Once that event happens, the activities for that state execute and the workflow waits for the next transition, which may be to “OnReply” or “OnAnotherRequest” depending on rules, conditions, events, and other logic. State Machine WFs are therefore very powerful as well as complex, and they are the basis for human interaction, hence the term human workflow.
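To make the four-state example concrete, here is a minimal sketch in Python. It is a toy model, not SharePoint or WF code: the class name, the method names, and the exact transition table are assumptions made for illustration.

```python
# Toy model of the four-state workflow described above.
# Not SharePoint/WF API; the transition table is an illustrative assumption.
TRANSITIONS = {
    "OnInitialRequest": {"OnReply", "OnAnotherRequest"},
    "OnReply": {"OnAnotherRequest", "OnFinish"},
    "OnAnotherRequest": {"OnReply", "OnFinish"},
    "OnFinish": set(),  # final state: nothing follows it
}


class StateMachineWorkflow:
    def __init__(self):
        self.state = "OnInitialRequest"  # start state
        self.history = [self.state]

    def signal(self, next_state):
        """Transition to next_state if the current state allows it."""
        if next_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state} -> {next_state} is not allowed")
        self.state = next_state
        self.history.append(next_state)

    @property
    def finished(self):
        return self.state == "OnFinish"


wf = StateMachineWorkflow()
# The middle states can alternate in any order before the final state.
for event in ["OnReply", "OnAnotherRequest", "OnReply", "OnFinish"]:
    wf.signal(event)
print(wf.history)
```

Only the start and finish states are fixed; everything in between is driven by whichever event arrives next.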

When WFs are waiting, they go to sleep (“dehydrate”) and persist their state through the SP SQL persistence provider. At this point SP can continue processing other WF instances or take on other SP tasks. When a relevant event occurs, such as a user editing an item in a list associated with the WF, the WF “rehydrates”: it is loaded back into memory and continues executing from its last known state. It again runs until it finishes all activities for that state, then waits for the next transition event, and so on until the workflow completes.
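The dehydrate/rehydrate cycle can be pictured with a small Python sketch. This is only a conceptual model: the in-memory dictionary stands in for the SQL persistence store, and the function names and sample data are hypothetical.

```python
import json

# Hypothetical stand-in for the persistence store (not the SP database).
store = {}


def dehydrate(instance_id, state, data):
    """Serialize a waiting workflow's state and data into the store."""
    store[instance_id] = json.dumps({"state": state, "data": data})


def rehydrate(instance_id):
    """Load a persisted workflow back so it can resume from its last state."""
    saved = json.loads(store.pop(instance_id))
    return saved["state"], saved["data"]


# The workflow waits: persist it and free the memory for other work.
dehydrate("wf-1", "OnReply", {"approver": "alice", "item_id": 42})

# A wake-up event arrives: load it back and continue where it left off.
state, data = rehydrate("wf-1")
print(state, data)
```

Because everything the workflow needs is in the store, any process that can read the store can resume the instance.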

What causes Dehydration?
Dehydration is caused by a DelayXXX activity or an OnXXX activity. These activities create a “commit point”, also known as a persistence point. As a quick side note, workflow events can also be queued, either because of heavy workflow load or because of delay activities, which are processed in background jobs. These background jobs run as SPTimer jobs.

What happens in a multi server design?

In a distributed design, one with multiple Web Front Ends (WFEs), an application server, and one or more database servers, several servers can act as an SP WF Host and Runtime. Each WFE can be a Host and Runtime, and the app server and DB servers can be as well, depending on the configuration. This means a WF (Sequential or State Machine) can be rehydrated on any server that receives the “wake-up” event, or even the “initialize” event. For this to work, each server must have all the prerequisites, assemblies, started services, and configuration needed to process the workflow. By default, neither the application server nor the DB server has everything needed, so you must deliberately either keep those servers from processing workflows or configure them to process workflows. Configuring the app or DB servers to support workflows requires different configuration steps and is generally not recommended, although the app server is an exception to that recommendation.

Ok, we know about SP WF now, so what options are available to us for Throttling?

First off, multiple tests need to be run while paying attention to the following WF performance counters:
o    number of workflow starts per second
o    number of tasks completed per second
o    number of concurrent workflows

You will also need to pay attention to standard server counters: CPU utilization, memory, number of processes, etc.

SP WF supports the following farm-level properties: WorkflowPostponeThreshold (Throttle), WorkflowBatchSize (Activity Batch Size), WorkflowEventDeliveryTimeout (Timeout), and the job-workflow SPTimer job (Workflow Timer Interval).

Throttle (WorkflowPostponeThreshold): how many WFs can be processing at any one time across the entire server farm. This is not how many can be “in progress” concurrently, but how many can be actively using the processor. When the threshold is exceeded, new WF instances and the events that wake up dehydrated WFs are queued. So when a WF instance is started, this number is checked; if the instance would exceed it, the instance is queued and an SPTimer job is created to run it later. Check the current value by running Get-SPFarmConfig (WorkflowPostponeThreshold).
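The throttle behaviour can be sketched as a small Python model. This is an illustration only, not how SharePoint implements it: the class and method names are hypothetical, and the exact boundary (whether the instance that reaches the threshold runs or is postponed) is an assumption.

```python
from collections import deque


class ThrottledFarm:
    """Toy model of the postpone threshold; names are hypothetical."""

    def __init__(self, postpone_threshold):
        self.postpone_threshold = postpone_threshold
        self.active = set()       # instances actively processing
        self.postponed = deque()  # instances queued for the timer job

    def start(self, instance_id):
        # Assumption: an instance is postponed once the active count
        # has already reached the threshold.
        if len(self.active) >= self.postpone_threshold:
            self.postponed.append(instance_id)
            return "postponed"
        self.active.add(instance_id)
        return "running"

    def finish(self, instance_id):
        self.active.discard(instance_id)

    def timer_tick(self):
        """Timer job: resume postponed instances while capacity allows."""
        resumed = []
        while self.postponed and len(self.active) < self.postpone_threshold:
            instance_id = self.postponed.popleft()
            self.active.add(instance_id)
            resumed.append(instance_id)
        return resumed


farm = ThrottledFarm(postpone_threshold=2)
results = [farm.start(name) for name in ["a", "b", "c"]]
print(results)
farm.finish("a")
print(farm.timer_tick())
```

The point of the model is that a postponed instance does not run until both capacity frees up and the timer job next fires, which is why the throttle and the timer interval have to be tuned together.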

Batch Size (WorkflowBatchSize): the number of WF activities the SP Timer service loads and starts executing at any given time. It tells the database to retrieve x activities from the queue for the Timer service to execute. It does not apply to the WF instances (dehydrated or currently running) being run immediately in the current WFE request. The default is 100.

Timeout (WorkflowEventDeliveryTimeout): the amount of time (in minutes) in which a workflow timer job must complete before it is considered to have stopped responding and is forced to stop processing. Jobs that time out are returned to the queue to be reprocessed later. This setting interacts with the throttle: running instances count toward the threshold, so if the timeout is too high, the threshold count takes longer to go down and more incoming WF instances are queued.

Workflow timer interval – The workflow timer interval specifies how often the workflow SPTimer job fires to process pending workflow tasks. This interval also represents the granularity of delay timers within your workflow. If a timer is set to delay for one minute, but the interval timer fires only every five minutes, the workflow delays for five minutes, not one minute.
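The granularity effect is easy to work out by hand; as a rough sketch in Python, assuming the delay begins exactly when the timer fires (the function name is hypothetical):

```python
import math


def effective_delay_minutes(requested, timer_interval):
    """Round a requested delay up to the next timer firing.

    Assumes the delay starts exactly on a timer firing, so the workflow
    can only wake on a multiple of the interval.
    """
    return math.ceil(requested / timer_interval) * timer_interval


print(effective_delay_minutes(1, 5))  # 1-minute delay, 5-minute interval
print(effective_delay_minutes(7, 5))  # 7-minute delay, 5-minute interval
print(effective_delay_minutes(1, 1))  # 1-minute delay, 1-minute interval
```

With a five-minute interval the one-minute delay becomes five minutes, matching the example above; with a one-minute interval it stays at one minute.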

For performance considerations, if your workflow creates a lot of work items, you can use this setting, in conjunction with the batch size, to control the processing of those work items. For example, with a batch size of 100 (the default) and a timer interval of five minutes (the default), Windows SharePoint Services processes at most 100 work items every five minutes. If the batch of 100 work items for one workflow instance finishes processing in two seconds, your workflow instance is sitting idle for 4 minutes and 58 seconds. This may be acceptable; it may not. Decreasing this interval setting allows more batches to process by causing the timer to fire more often and request more work to do; but it also means that workflow processing consumes more server resources. The minimum value for this setting is 1, which means that the timer will fire every minute.
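The arithmetic behind that example can be sketched in a few lines of Python. The function names are hypothetical, and the model assumes exactly one batch is processed per timer firing.

```python
def max_items_per_hour(batch_size, interval_minutes):
    """Upper bound on work items per hour, assuming one batch per firing."""
    firings_per_hour = 60 // interval_minutes
    return batch_size * firings_per_hour


def idle_seconds(interval_minutes, batch_processing_seconds):
    """Time between the end of one batch and the next timer firing."""
    return max(0, interval_minutes * 60 - batch_processing_seconds)


# Defaults from the text: batch size 100, interval 5 minutes.
print(max_items_per_hour(100, 5))
# A batch that finishes in 2 seconds leaves 4 minutes 58 seconds idle.
print(idle_seconds(5, 2))
# Dropping the interval to the 1-minute minimum raises the ceiling.
print(max_items_per_hour(100, 1))
```

Under these assumptions the defaults cap throughput at 1,200 work items per hour, and the one-minute interval raises the cap to 6,000 at the cost of more frequent timer activity.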

You can also specify additional information about the schedule and interval for the timer service by providing a schedule string in format shown in Table 1. (Other format strings are available, but their applicability to the workflow environment is questionable.)

Table 1. Formats for SPTimer schedule strings

SPTimer schedule string format          Meaning
“Every 10 minutes between 0 and 30”     Timer fires every 10 minutes from the top of the hour to half past the hour
“Hourly between 9 and 17”               Every hour from 9 A.M. to 5 P.M.
“Daily at 15:00:00”                     Timer fires every day at 3 P.M.
“Monthly at 15 15:00:00”                Timer fires on the 15th of every month at 3 P.M.

Great, so how do we set these settings and values?

You can set the Workflow interval by typing in PowerShell:
Get-SPTimerJob job-workflow | Set-SPTimerJob -Schedule "every 10 minutes between 9 and 18"

You can set the other values by typing in a PowerShell command prompt:
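# Read the current farm configuration object
$a = Get-SPFarmConfig
# Throttle, batch size, and timeout (in minutes)
$a.WorkflowPostponeThreshold = 200
$a.WorkflowBatchSize = 10
$a.WorkflowEventDeliveryTimeout = 2
# Write the changed values back to the farm
$a | Set-SPFarmConfig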

And you can retrieve them by simply typing:
Get-SPTimerJob job-workflow
Or
Get-SPFarmConfig

Resources
http://msdn.microsoft.com/en-us/library/dd441390.aspx
http://technet.microsoft.com/en-us/library/ee906558.aspx
http://msdn.microsoft.com/en-us/library/ms442249.aspx