Digging into Windows Server AppFabric tracking event collection service

There is not enough documentation on configuring the AppFabric Event Collection service. The best document I could find is this MSDN page:
http://msdn.microsoft.com/en-us/library/hh334438

We would like fine-grained control over when workflow tracking events are persisted. However, the document does not explain exactly how events are scheduled to be written to the monitoring data store. The main configurable settings are the “eventBufferSize” and “maxWriteDelay” attributes in the root web.config:

<collectors>
    <collector name="" session="0">
        <settings retryCount="10" eventBufferSize="10000" retryWait="00:00:15" maxWriteDelay="00:00:05" aggregationEnabled="true" />
    </collector>
</collectors>

where the attributes are described as:

  • eventBufferSize: Maximum number of events the collector buffers before writing them to the store.
  • maxWriteDelay: If no event has arrived in this time period then events are written to the store. The collector may choose to write events even if events have arrived during this time period.

However, it turns out the document is wrong: the “maxWriteDelay” attribute is not supported in AppFabric 1.1. The equivalent in AppFabric 1.1 is the “samplingInterval” attribute.

By setting these values in the root web.config, we can control some of the behaviour, but we still can’t guarantee when the events will be written to the store. For example, if we set eventBufferSize to 100 (the minimum value allowed) and samplingInterval to 00:00:05 (5 seconds, the minimum interval allowed), it can still take up to 1 minute for some events to be written to the AppFabric monitoring DB.

So I decided to find out the exact logic behind this by decompiling the AppFabric code. Using a free .NET decompiler, I was able to read the source code. Here is a summary of how the event collector service determines when to write to the monitoring store:

  1. The following parameters are defined in the root web.config:
    • samplingInterval: minimum 5 seconds, maximum 60 seconds; default 5 seconds
    • eventBufferSize: minimum 100, maximum 32767; default 10000
    • maxBuffers: minimum 3, maximum 100; default 5
  2. The value “schemaSamplingInterval” comes from the attribute validation parameter in ApplicationServers_schema.xml under windows\system32\inetsrv\config\schema: it is the maximum allowed value of samplingInterval, which is 60 by default. These are the schema definitions for the attributes above:
    <attribute name="retryCount" type="int" defaultValue="5" validationType="integerRange" validationParameter="0,100" />
    <attribute name="eventBufferSize" type="int" defaultValue="10000" validationType="integerRange" validationParameter="100,32767" />
    <attribute name="maxBuffers" type="int" defaultValue="5" validationType="integerRange" validationParameter="3,100" />
    <attribute name="retryWait" type="timeSpan" defaultValue="00:00:15" validationType="timeSpanRange" validationParameter="10,120,1" />
    <attribute name="samplingInterval" type="timeSpan" defaultValue="00:00:05" validationType="timeSpanRange" validationParameter="5,60,1" />
    <attribute name="aggregationEnabled" required="false" type="bool" defaultValue="true" />

  3. Workflow tracking events are held in an in-memory buffer by the event collector before they are written to the monitoring store. Either of the following two conditions triggers a write to the store (a buffer flush):
    • A new event is added and the buffer becomes full: the events in the buffer are flushed to the store and the current buffer is released into the available buffer pool. A new buffer is then taken from the available buffer pool, or created if the maxBuffers threshold has not been reached.
    • A timer job runs every samplingInterval seconds. Each time the timer fires, it flushes and releases the buffer if both of the following conditions are met:
      (1) there are one or more events in the buffer
      (2) either
      a. no new events have come into the buffer since the timer job last ran
      or
      b. more than “schemaSamplingInterval” has passed since the buffer was last flushed. (This is actually tracked by comparing the number of times the timer job has run since the last flush against schemaSamplingInterval / samplingInterval, so it is not exactly the value of schemaSamplingInterval.)
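To make the flush rules above easier to reason about, here is a toy Python model of them. It is only a sketch of my reading of the decompiled logic, not AppFabric code: the class and method names are made up, the tick threshold is assumed to use integer division, and buffer pooling (maxBuffers) is left out.

```python
class CollectorBuffer:
    """Toy model of the flush rules described above (hypothetical, not AppFabric code)."""

    def __init__(self, event_buffer_size=100, sampling_interval=5,
                 schema_sampling_interval=60):
        self.event_buffer_size = event_buffer_size
        # Assumption: the forced-flush threshold is counted in timer ticks,
        # using integer division of the two intervals (both in seconds).
        self.max_ticks = max(1, schema_sampling_interval // sampling_interval)
        self.events = []
        self.new_since_tick = False   # did any event arrive since the last tick?
        self.ticks_since_flush = 0    # timer ticks since the last flush
        self.flushed = []             # (reason, batch) pairs, for inspection

    def add(self, event):
        """Condition 1: flush as soon as the buffer becomes full."""
        self.events.append(event)
        self.new_since_tick = True
        if len(self.events) >= self.event_buffer_size:
            self._flush("buffer full")

    def on_timer(self):
        """Condition 2: called once every sampling_interval seconds."""
        self.ticks_since_flush += 1
        idle = not self.new_since_tick
        overdue = self.ticks_since_flush >= self.max_ticks
        self.new_since_tick = False
        if self.events and (idle or overdue):
            self._flush("idle" if idle else "overdue")

    def _flush(self, reason):
        self.flushed.append((reason, self.events))
        self.events = []
        self.ticks_since_flush = 0
        self.new_since_tick = False
```

Calling add() for each event and on_timer() once per samplingInterval, then reading the flushed list, shows which of the three reasons (buffer full, idle, overdue) caused each write.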

For example, if eventBufferSize is 100, samplingInterval is 00:00:05, and schemaSamplingInterval is the default 60, then events will be persisted to the monitoring store when the buffer reaches 100 events. They will also be persisted if no new events arrive between two 5-second timer ticks (the quiet period before the buffer is flushed is at least 5 seconds and at most about 10 seconds), or if the buffer has not been flushed for 60 seconds.

So depending on the speed of new events being inserted into the buffers, it can be any time between 0 and 60 seconds for an event to be flushed to the monitoring store.
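The arithmetic behind those numbers can be written down as a few lines of Python. This is a sketch under my own assumptions: the function name is hypothetical, both intervals are whole seconds, and the tick threshold uses integer division (which is why the forced flush is not always exactly schemaSamplingInterval).

```python
def flush_latency_bounds(sampling_interval_s, schema_sampling_interval_s):
    """Rough timing bounds, in seconds, for the timer-driven flush rules."""
    # Number of timer ticks after which a non-empty buffer is flushed anyway.
    ticks_until_forced = max(1, schema_sampling_interval_s // sampling_interval_s)
    return {
        # The last event of a burst waits for the tick that sees it as "new",
        # then one more tick that sees no new events.
        "idle_flush_min": sampling_interval_s,
        "idle_flush_max": 2 * sampling_interval_s,
        # With a steady stream of events, the buffer is flushed after this long.
        "forced_flush": ticks_until_forced * sampling_interval_s,
    }


print(flush_latency_bounds(5, 60))
print(flush_latency_bounds(7, 60))
```

With the defaults (5 and 60) the forced flush comes after 12 ticks, i.e. 60 seconds. With a samplingInterval of 7 it would come after 8 ticks, i.e. 56 seconds rather than 60.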

This points to a way to better control how often events are persisted: in addition to the values in the root web.config, we can also change values such as schemaSamplingInterval to affect the schedule, and from the analysis above we can work out how the settings interact. For example, if we want events written to the monitoring store more frequently, we can set the validation range for samplingInterval to “2,2,1” in ApplicationServer_schema.xml, which means the minimum and maximum values for the “samplingInterval” attribute are both 2, and also set samplingInterval to 00:00:02 in the root web.config.
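Before editing either file it can help to check a samplingInterval value against a validation range. The Python sketch below does that. It assumes the three numbers in the range are minimum seconds, maximum seconds and granularity (the analysis above only confirms the first two), and the function names are made up for illustration.

```python
from datetime import timedelta


def parse_timespan(text):
    """Parse an hh:mm:ss string such as 00:00:05 into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def in_timespan_range(value, validation_parameter):
    """Check an hh:mm:ss value against a 'min,max,granularity' range in seconds."""
    low, high, step = (int(part) for part in validation_parameter.split(","))
    seconds = int(parse_timespan(value).total_seconds())
    # Assumption: the third number is a granularity the value must be a multiple of.
    return low <= seconds <= high and seconds % step == 0


print(in_timespan_range("00:00:02", "5,60,1"))  # default range
print(in_timespan_range("00:00:02", "2,2,1"))   # modified range
```

Under these assumptions, 00:00:02 is rejected by the default range “5,60,1” and accepted by “2,2,1”, which is why both the schema and the web.config value have to change together.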

Before ending this post, I want to mention one more piece that is easy to miss. When the Event Collection Service flushes the buffer and writes to the monitoring DB, it writes to a staging table. On SQL Server, a SQL Agent batch job runs every 10 seconds (10 seconds is the minimum interval for any SQL Agent job in SQL Server 2008) to move all records from the staging table to the persistent tracking tables in a batch. If we don’t want to wait up to 10 seconds to see the tracking records in the persistent tables, one workaround is to duplicate the batch job and run the copy on a different schedule. For example, the default job runs at seconds 10, 20, 30… of every minute, while the second job runs at seconds 5, 15, 25 and so on, which effectively triggers the staging table job every 5 seconds.
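The effect of staggering the two jobs is easy to check with a little arithmetic. The Python sketch below only models the schedules and does not talk to SQL Server. The function names are hypothetical, and second 60 of one minute is treated as second 0 of the next.

```python
def fire_seconds(interval_s, offset_s=0, period_s=60):
    """Seconds within each minute at which a job with this schedule fires."""
    return list(range(offset_s % interval_s, period_s, interval_s))


def worst_case_wait(fire_times, period_s=60):
    """Longest gap, in seconds, between two consecutive firings."""
    times = sorted(set(fire_times))
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    gaps.append(times[0] + period_s - times[-1])  # wrap around to the next minute
    return max(gaps)


default_job = fire_seconds(10)             # 0, 10, 20, ... 50
extra_job = fire_seconds(10, offset_s=5)   # 5, 15, 25, ... 55

print(worst_case_wait(default_job))
print(worst_case_wait(default_job + extra_job))
```

With only the default job a record can sit in the staging table for up to 10 seconds. With the second job offset by 5 seconds, the longest gap between runs drops to 5 seconds.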

About: mmpower

Software Architect & Soccer Fan 黑超白袜 (dark shades, white socks) = IT laborer + middle-aged rocker

