Measuring Workflow Performance
Tuning & Optimizing Your Workflow Executions
When tuning a system for performance, there are many factors to consider, but one of the most fundamental questions to ask is: where is the system spending most of its time? In this article we'll look at some of the new performance logging features provided by TRIRIGA, focusing on workflows. We'll also build a utility to pull workflow performance data back into the system and report on the results within TRIRIGA itself.
If you've been working with TRIRIGA for a while, you've probably had to dig through the system logs to isolate issues. Unfortunately, the logs historically haven't yielded much useful information: under the "normal" configuration, the system logged little more than errors, and when configured for debug logging it would spit out huge volumes of unrelated detail.
However, in the 2.6 series TRIRIGA has revisited the system logs and given them a much-needed facelift. The Administration Console now has a "Platform Logging" tab which provides granular control over exactly which events are sent to the log. These parameters can also be tuned on the fly, giving administrators an excellent tool for identifying and isolating performance issues.
Workflow Performance Logs
For our purposes, we'll be looking at the settings found under "Workflow Logging"; in particular, the Synchronous and Asynchronous settings under "Performance". When these checkboxes are selected, the system starts generating log files containing tab-delimited workflow timing data. These files are named "swft.log" (for synchronous workflows) and "awft.log" (for asynchronous workflows). Among other metrics, the entries in these log files contain the start and end time of each workflow executed on the system. We can roll this data up to get the total execution time and execution count for each workflow, and from there calculate each workflow's average execution time. This gives us a comparative view of the workflows in our system, shows where the system is spending its time, and lets us target our efforts toward the workflows whose improvement will have the most impact on overall performance.
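The roll-up itself is just a per-template tally. Here's a minimal sketch in Java (the language the platform's custom tasks use); the class name and the sample template IDs and durations are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the roll-up: accumulate total execution time and
// execution count per workflow template, then derive the average.
public class WorkflowRollup {
    // value[0] = total duration in ms, value[1] = execution count
    static Map<String, long[]> tally = new HashMap<>();

    static void record(String templateId, long durationMs) {
        long[] t = tally.computeIfAbsent(templateId, k -> new long[2]);
        t[0] += durationMs;
        t[1] += 1;
    }

    static double averageMs(String templateId) {
        long[] t = tally.get(templateId);
        return (double) t[0] / t[1];
    }

    public static void main(String[] args) {
        record("1001", 120);
        record("1001", 80);
        record("2002", 300);
        System.out.println("1001 avg = " + averageMs("1001") + " ms"); // 100.0 ms
        System.out.println("2002 avg = " + averageMs("2002") + " ms"); // 300.0 ms
    }
}
```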
Here's a screenshot of the admin console showing where these settings can be found:
Once these are checked, the system will start capturing performance data. You can view the resulting data in the "awft.log" and "swft.log" files; these are tab-delimited and have the following columns of interest:
| Column | Description |
| --- | --- |
| 4 (Column "D" if viewed in Excel) | Duration of workflow execution in milliseconds |
| 6 (F) | Number of workflow steps executed |
| 8 (H) | Start time (milliseconds since January 1, 1970) |
| 10 (J) | End time (milliseconds since January 1, 1970) |
| 12 (L) | Workflow Instance ID (identifies the specific workflow execution) |
| 14 (N) | Workflow Template ID (identifies the workflow template and version) |
| 17 (Q) | Parent Workflow Instance ID (identifies the workflow instance that called this workflow instance) |
| 19 (S) | Execution type: "S" (Synchronous) or "A" (Asynchronous). Note that this can differ from the workflow type itself; a synchronous workflow can run asynchronously if called by an asynchronous workflow. |
| 21 (U) | General execution information; includes module, BO, and Name of the workflow as well as other related data |
You can open these files in Excel and pull out the relevant data from there, but we'll take it a bit further and load the key information back into TRIRIGA so we can monitor our workflow performance without even leaving the application.
Building a Solution
We're going to add a couple of Business Objects in the "devUtil" module to contain our workflow performance data:
- devWorkflowPerformance - This Business Object will serve as a container for a set of metrics. The system keeps capturing workflow executions for as long as logging is enabled, so we'll include start and end time fields that let us focus our analysis on a specific window of time if desired. When the user clicks "Create", we'll pull in all the data within the targeted window.
- devWorkflowMetrics - This object will hold the aggregated metrics for each workflow over the specified measurement window. For each individual workflow encountered, we'll total up the execution time and other key data points for all of the workflow's execution instances, create a devWorkflowMetrics record to contain the totals, then associate that record to the parent devWorkflowPerformance record.
For parsing the log data, we'll create a custom workflow task that will work as follows:
- Look at the "log4j.xml" log configuration to find where the workflow logging files are located.
- In the identified locations, look for files named "awft.log*" or "swft.log*". The log files roll over by date, so we want to parse through all of them to gather our data.
- For each file, read it line-by-line and process the tab-delimited data; keep a running tally of the metrics for each workflow encountered.
- Create IntegrationRecord objects for each workflow, and use the "saveRecord" BusinessConnect call to create the corresponding devWorkflowMetrics records in TRIRIGA.
- Use the "associateRecords" BusinessConnect function to associate the parent devWorkflowPerformance record to the newly created child records.
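Setting aside the TRIRIGA-specific pieces (the log4j.xml lookup and the BusinessConnect saveRecord/associateRecords calls are not shown), the file-scanning and tallying steps above might look roughly like this sketch; the class name, directory handling, and column positions are assumptions based on the column table earlier:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.HashMap;
import java.util.Map;

// Sketch of the parsing steps: scan a log directory for awft.log*/swft.log*
// files (the logs roll over, so we match the rolled files too), read each
// line, and keep a running per-template tally of time and count.
public class WftLogAggregator {
    static class Metrics { long totalMs; long count; }

    static Map<String, Metrics> aggregate(Path logDir) throws IOException {
        Map<String, Metrics> byTemplate = new HashMap<>();
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(logDir, "{awft,swft}.log*")) {
            for (Path file : files) {
                for (String line : Files.readAllLines(file)) {
                    String[] cols = line.split("\t", -1);
                    if (cols.length < 14) continue;            // skip malformed lines
                    String templateId = cols[13];              // column 14: template ID
                    long durationMs = Long.parseLong(cols[3]); // column 4: duration
                    Metrics m = byTemplate.computeIfAbsent(templateId, k -> new Metrics());
                    m.totalMs += durationMs;
                    m.count++;
                }
            }
        }
        return byTemplate;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) return;
        aggregate(Paths.get(args[0])).forEach((id, m) -> System.out.printf(
            "template %s: count=%d total=%dms avg=%.1fms%n",
            id, m.count, m.totalMs, (double) m.totalMs / m.count));
    }
}
```

In the real solution, each entry in the resulting map would then become a devWorkflowMetrics record via BusinessConnect.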
Attached at the bottom of this article is "logreader.zip", which contains the following:
- logreader-om.zip - Object Migration package containing the objects described above.
- workflow-logs.jar - Custom workflow task. This should be deployed on your application server; for JBoss, you just need to copy this file into the "server/all/lib" subdirectory under your JBoss installation and then restart. For other server types, consult your application server documentation.
Once you've got everything installed, open your Data Browser and find the "Workflow Performance" manager. Clicking "Add" brings up a window looking something like this:
Here you can specify the start and end times if desired (if left empty, all events will be gathered). Clicking the "Create" action will fire the workflow which processes the log files; once that finishes running, the "Performance Data" section should display all of the workflows and related data gathered for the specified window:
Finally, the "Charts" tab presents some of the breakouts (Average Execution Time, Total Execution Time, and Execution Count) as pie charts:
The source code for the solution presented in this article is available at the triDeveloper code repository: