Disclaimer

The "triDeveloper" site and content are neither affiliated with nor sponsored by TRIRIGA, Inc. "TRIRIGA" is a trademark of TRIRIGA, Inc. All trademarks used on this site are the property of their respective owners.

Migrating Data With DataConnect - Part 1

Getting Started With The DataConnect Framework

 

Whether as a one-time data movement from a legacy application, or an ongoing integration feed from other systems, chances are you've had to move large amounts of data into TRIRIGA at some point.  Previous versions of the platform provided a couple of options for managing this:

  • Data Integrator - Tab-delimited flat files can be used to load data into the application, either manually or through a scheduled load.  Straightforward data field loads are supported, as well as simple associations between objects.  More complicated load scenarios can be very cumbersome.
  • BusinessConnect - External applications can be developed to load data using the SOAP-based BusinessConnect API.  The web services are generally flexible enough to handle most situations, but often require significant development effort.
Recent versions of the platform have introduced a third approach for loading data to the system.  DataConnect provides a means of automatically loading data into TRIRIGA from a set of staging tables maintained within the platform itself.  Some key benefits of DataConnect:
  • The DataConnect tools are highly integrated into the platform.  Jobs are created as Business Objects, and staging tables are created by making simple "tweaks" to the target objects for which data is being loaded.  A new DataConnect workflow task is introduced to seamlessly process records as they are loaded into the system, and tailored business logic can be applied by using the existing workflow tools.  Finally, monitoring and job control functions can be accessed through the standard administration console.
  • Integrations leverage widely available development skill sets.  Fairly basic SQL skills are all that is required to move data into the staging tables; developers do not need experience with web services or TRIRIGA-specific technologies.  Any toolset compatible with the database can be used to load the data.
  • Simple support for loading parent-child relationships and other related records.  The staging tables contain a correlation identifier which can be used to relate child records to parents, as well as a sequence identifier for managing the order in which records are loaded.  This allows entire sets of records to be moved into the system in a tightly orchestrated and ordered fashion.
Of course, there are scenarios where other approaches are more appropriate.  Simple data loads (especially "ad hoc" updates or inserts) may be better suited for Data Integrator.  Data Integrator can also more handle creating arbitrary associations between existing records more easily; DataConnect is geared more toward controlled migration of data sets.  BusinessConnect will probably be a better choice for real-time interfaces and more complex interactions with third-party systems, especially where you need to move data both to and from TRIRIGA, as DataConnect is currently an inbound-only solution.  But for large data load applications, either "one-shot" migrations or ongoing integration scenarios, DataConnect can be an excellent fit.

Getting Started

If you have a fairly recent version of the platform and application, you may be ready to use DataConnect; open the Data Modeler and look for the "triDataConnectJob" Module and Business Object.  If you see them there, you should be ready to go.  If not, make sure you have upgraded your platform to 2.6.2 or higher and look in the "userfiles/upgrades" directory under your TRIRIGA installation.  You should see a file named "triDataConnectJob.zip" in that directory; perform an Object Migration import of that package to get DataConnect up and running.
 
The first step is identifying the data points that you need to load into the system.  If you happen to have a concise set of data requirements specifying exactly which fields on which Business Objects need to be updated, that's fantastic; if not, you'll want to sit down and define those with your customer.
 
Once you have a list of Business Objects and corresponding fields in the system, do the following for each Business Object:
  1. Open the Data Modeler and revise the Business Object in question.
  2. In the Business Object properties, check the "Has Staging Table" box.  Click "Save BO".
  3. Open the Field List for the object, and select each identified field; check the "Staging Table Field" checkbox for each field, and click "Save Field".
  4. Publish the Business Object.
This will create a staging table in the TRIRIGA database schema with the specified staging table fields.  You can review the staging table(s) from the TRIRIGA administration console; select "DataConnect" in the "Managed Objects" list on the left hand side, then select "Business Objects with Staging Tables".  If you select the checkbox next to the Business Object name and click "Expand Data", you can review the list of all fields mapped in the staging table.
 
Once you have defined the staging tables that will hold your data, you will need to create a Business Object to represent the job.  This should be created as a new Business Object within the "triDataConnectJob" module.  Almost everything you need is included in the base Business Object; the only thing you need to do is specify the published name for your new BO (it should be the "Job Number" field) and publish the Business Object.

The next step is to create a workflow to process the incoming records.  This will be an asynchronous workflow on your DataConnect Job Business Object, triggered on the "SYSTEM DC PROCESS JOB" event.  The DataConnect Task is used to iterate over the incoming records and perform any required logic; nested DataConnect tasks can be used to load related child records.

Example

Attached at the bottom of this article is "dataconnect-example.zip", an Object Migration package containing the following:
  • A simple Business Object, "devExample".  This serves as a parent object, with a section containing child records.
  • Another small Business Object, "devChildExample", which serves as a child record to devExample.
  • A DataConnect Job Business Object, "devExampleJob".
  • A hierarchical query, "devExample - Example Records with Children", which displays the parent records and their associated children.
  • A workflow, "devExampleJob - DC Process - Process Incoming Records", which is used to process the incoming DataConnect records.
The job workflow loads in parent "devExample" records, along with related child "devChildExample" records.  It works as follows:
  1. The Start task is set to asynchronous, with an action of "SYSTEM DC PROCESS JOB".  The Integration flag is set, which marks this as a DataConnect workflow (and makes the DataConnect task available):


  2. The "Process Example Records" step is a DataConnect task.  The Object is set to "devExample", which tells the task to pull records from the S_DEVEXAMPLE staging table.  Here, the Initial State is set to "null"; this means that the devExample records will be created in a null state, and we will have to trigger an action to move them into a desired end state (otherwise they will be deleted):


    Correlation is set to "In-Sequence".  This basically means that the incoming records will be processed in order of ascending Correlation ID followed by ascending Sequence Number; the Correlation ID does not refer to any parent record.  The Transaction property is set to "None"; this means that the incoming record will be committed to the database before processing the body of the DataConnect task.  You can use the Transaction property to rollback incoming records if you detect an issue with the data; selecting "Per X Iterations" will commit the data after processing X rows, and "All Iterations" will only commit the data after all records have been processed (breaking out of the body effectively rolls back the transaction).

  3. Within the body, the first step is "Process Child Records", another DataConnect Task:


    The Object for this step is set to "devChildExample", which will pull records from the S_DEVCHILDEXAMPLE staging table in the database.  Here, we have selected "triActive" as the Initial State.  This means that the incoming child records will be immediately created in the triActive state (without firing any of the usual workflows that would trigger on a transition to that state).  This is especially useful if you are migrating legacy data; you can simply move over the records into the desired state without triggering lots of unnecessary business logic.  In this step, the Correlation property is set to "Task Step" and specifies the parent DataConnect task as the step.  This causes the system to interpret the child record's Correlation ID as a reference to the Correlation ID of a record in the parent table; so instead of iterating over all the devChildExample records in order, the system only processes those that are "tagged" with the Correlation ID of the devExample parent record that is currently being processed.

  4. Within the child DataConnect task body is a simple Modify Records task, "Map Child to Parent Data Section".  This just maps the source record (the devChildExample record from the current iteration) to the "devChildRecords" data section on the parent devExample record.


  5. Back in the outer DataConnect task body, we then trigger the "triCreate" action on the parent devExample record (the "Trigger Create" step).  This moves the record out of the null state so that it will not be deleted.  Note that this was not necessary for the child records, since they were automatically created in the triActive state.

  6. Finally, after the DataConnect processing is complete we trigger "triDCCompleted" on the Job object itself.  This marks the job as completed successfully.  If we had encountered issues, we could have also triggered "triDCFailed"; this would have marked the job as failed.  The final state of the job can be seen in the administration console.

So in a nutshell, the workflow pulls in the parent records, and for each one pulls in the related child records and adds them to the parent's data section.

Also attached at the bottom of this article is a small SQL script ("dataconnect-example.sql") that will load some sample data into the staging tables.  This script does the following:
  1. Creates an entry in DC_JOB.
  2. Inserts some parent devExample records into S_DEVEXAMPLE.
  3. Inserts some child devChildExample records into S_DEVCHILDEXAMPLE, related to the parent records.
  4. Updates the DC_JOB entry and marks it as ready for processing.
Running this script will create a job entry with job number "12345" (it is hardcoded, so if you want to run the script a second time you'll need to search and replace that with a new job number).  After the DataConnect agent processes the job, the records should be visible within TRIRIGA; you should see two devExample records, each with a few devChildExample children attached.

In the followup article we'll look at using ETL tools to migrate data into the staging tables, specifically focusing on the Pentaho Data Integration (Kettle) toolkit.
 

Files