BizTalk Dynamic Ports vs. Static Ports – Performance
(from the trenches…)
During our implementation of a Microsoft ESB 2.0 solution, we followed the original ESB 2.0 design and used dynamic adapters/ports as the outgoing transport mechanism when sending data to various endpoints.
Overall the design was elegant, in my humble opinion. It successfully sent data in various formats to various endpoints using transport protocols such as MSMQ, HTTP, and TCP. It communicated with SQL databases, Oracle databases, and even custom endpoints with unique TCP-based protocols that talked to MITEL PBX and voice mail systems. The design basically used the Business Rules Engine to dynamically determine which transformation/map to use and which endpoint to communicate with, be it 1-way communication, 2-way communication, or some awkward 2-way async-correlated response.
In the initial development and QA phases, all appeared to work well and fine. Notwithstanding, of course, the normal headaches of an integration solution with very sparse requirements, such as the famous “Oh that system doesn’t send that type of message, and this system doesn’t need to process the other one…”. For the most part, the system worked as designed.
Lo and behold, we move into a quasi-stress test/performance test environment where the normal load and the stress load are placed on the solution… what do we get???
We learn that dynamic adapters/ports are not what they’re hyped up to be!!!
This is especially true when using them with the WCF-Custom adapter or the WCF LOB Adapter framework, which includes WCF-SQL and WCF-Oracle. The issues we ran into were “Timeout” issues, be they SQL timeouts, MSMQ timeouts, transaction timeouts, or just plain transmission failures communicating with the endpoint in question. We noticed these timeouts first and foremost on the WCF adapters, and then on some of the non-WCF adapters such as MSMQ and SQL (non-WCF). Now, before I get into the specifics, let me explain more about our environment and solution…
Our environment was not the best, but it surely was not the worst either. We had one server running inside a VMware VM with 3.5 GB of RAM allocated. The physical server was an HP DSX server with at least four 2 GHz processors; I’m not sure exactly how much physical RAM it had, but it was enough to power roughly 50 different VM images with at least 2 GB of RAM each. However, only one processor was allocated to the VM at the time. The MessageBox ran on another VM on the same physical server, with the same allocated RAM and a different processor allocated to it. Both systems had well over 40 GB of disk space available. Both SQL and BizTalk were running within the same AD domain, so no crazy configurations in that department. We used the default out-of-the-box configuration for BizTalk and SQL, as this was the start of the optimization phase of our solution…
The endpoints we were sending to did not necessarily have the latest software or service packs. At various times we communicated with SQL 2000 servers, SQL 2005, Windows 2003 MSMQ transactional queues, IIS 6.0 ASMX web services, and raw TCP sockets.
The solution was a simple messaging-only solution, with no orchestrations. It starts when a message is received from one of the endpoints, such as an MSMQ queue or a WCF web service. The ESB determines the path of the message through a business rule. The ESB then determines the endpoint to route to through a business rule (BRE). Finally, the ESB determines the transformation to apply to the message before sending it to the selected endpoint.
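To make that flow concrete, here is a rough sketch of the three rule lookups in plain Python. It is only an illustration of the idea, not BizTalk, ESB or BRE code: the rule tables, message type names, addresses, map names and the `resolve` function are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SendPlan:
    pattern: str    # message path, e.g. "one-way" or "two-way"
    transport: str  # outgoing adapter type
    address: str    # endpoint address
    map_name: str   # transformation to apply before sending


# Hypothetical rule tables standing in for business rule policies.
# Every key and value below is invented for illustration.
PATH_RULES = {"CallRecord": "one-way", "MailboxQuery": "two-way"}
ENDPOINT_RULES = {
    "CallRecord": ("MSMQ", r"FormatName:DIRECT=OS:pbx-host\private$\calls"),
    "MailboxQuery": ("TCP", "tcp://voicemail-host:5000"),
}
MAP_RULES = {
    "CallRecord": "CallRecord_To_PbxFormat",
    "MailboxQuery": "MailboxQuery_To_VoicemailRequest",
}


def resolve(message_type: str) -> SendPlan:
    """Run the three lookups in the order described: path, endpoint, map."""
    if message_type not in PATH_RULES:
        raise LookupError(f"no routing rules for message type {message_type!r}")
    pattern = PATH_RULES[message_type]
    transport, address = ENDPOINT_RULES[message_type]
    map_name = MAP_RULES[message_type]
    return SendPlan(pattern, transport, address, map_name)


if __name__ == "__main__":
    print(resolve("CallRecord"))
```

The point of the sketch is that nothing about the destination is known until the rules have run for each message, which is exactly why a dynamic send port looked like the natural fit.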
In development and initial QA, all endpoints passed with small challenges here and there. However, under normal load, timeouts started occurring when sending to SQL and MSMQ outgoing endpoints. We first noticed the issue when sending a load of about 5000 XML messages, averaging about 3 KB each, into the BizTalk system. What we saw was that the ESB processed each message fine, transformed it, and sent it to the internal dynamic send port queue. It was then the job of the dynamic send port to take the message and actually send it to the corresponding endpoint. For the first 1100 or so messages, things appeared to work fine; then, at a certain point, “Timeouts”.
So we took a look at the inner workings of the adapter in question and noticed that, especially for WCF-Custom sending to a transactional endpoint, there is a major performance overhead because the adapter must determine how to build the channel stack when sending messages. That’s when we ran into a little advice from the MS WCF LOB team: http://blogs.msdn.com/mdoctor/archive/2009/12/18/performance-tip-when-using-wcf-custom-with-dynamic-send-ports-and-custom-bindings-on-biztalk-server-2009.aspx?CommentPosted=true#commentmessage
We applied this setting, as shown by one of my developers here: http://geekswithblogs.net/BizTalkUnleashed/archive/2010/03/31/configure-enabletransaction-and-isolationlevel-property-in-business-rules-for-dynamic.aspx
However, no success. We did see a performance gain: instead of timing out at 1100 or so messages, we got up to about 1700 or so. We started to wonder if it was really the dynamic adapter.
We decided to run the exact same messaging scenario, this time with static ports. The first test was simply to create a static port with the same filter conditions as the dynamic port, the same send pipeline configuration, and exactly the same resolved endpoint adapter values. As we ran the solution, SUCCESS!!! No timeouts, no errors, no issues, no warnings, and throughput rose dramatically: from what appeared to be 5 or 6 messages per second to 2 to 3 times that, in other words 15-20 messages per second. I’d say pretty good, without any throttling settings, all out-of-the-box behavior. For those of you asking about the baseline: we continually asked for a baseline for the hardware, which we were never provided nor allowed to measure ourselves, so we really don’t know whether these are GOOD metrics for the hardware or not. However, it’s acceptable for our client’s requirements, which is good enough, ya know!!!
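For a feel of what those rates mean for our 5000-message load, here is some back-of-the-envelope arithmetic in Python. It assumes the rates are steady and that the 5 to 6 messages per second figure describes the dynamic port before it started timing out, so treat the results as rough estimates rather than measurements. The function and variable names are mine, for illustration only.

```python
def drain_seconds(message_count: int, rate_per_second: float) -> float:
    """Seconds needed to send message_count messages at a steady rate."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    return message_count / rate_per_second


BATCH = 5000  # size of the test load described earlier

# (low, high) messages per second; the "dynamic" attribution is an assumption.
RATES = {"dynamic": (5, 6), "static": (15, 20)}


def summary() -> list[str]:
    """One line per port type with the fastest and slowest drain time."""
    lines = []
    for label, (low, high) in RATES.items():
        slowest = drain_seconds(BATCH, low) / 60
        fastest = drain_seconds(BATCH, high) / 60
        lines.append(f"{label}: {fastest:.1f} to {slowest:.1f} minutes")
    return lines


if __name__ == "__main__":
    print("\n".join(summary()))
```

At a steady rate, that works out to roughly 14 to 17 minutes for the slower rates versus roughly 4 to 6 minutes for the static port to clear the same batch.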
By the way, we also saw timeouts with the MSMQ o.o.b (out of the box), WCF-SQL o.o.b and SQL o.o.b adapters when using them in dynamic ports.
So MS, what gives? An explanation would be nice here, beyond the weak “WCF has to create the binding elements” excuse. I welcome any advice, recommendations, or explanations.