Performance Implications of the Resilient Integrate During Continuous Integration
Question
What are the performance implications of enabling /Resilient Integrate when doing continuous integration?
Environment
HVR 5
Answer
In HVR action Integrate has an option called /Resilient that can be used to merge changes into a target system:
- If an insert would result in a duplicate key value then instead the existing row is updated.
- If a row to be updated – based on the key – cannot be found then an insert is performed instead.
- If a row to be deleted cannot be found then the delete is ignored.
Integrate /Resilient is useful to synchronize systems that are not exactly in sync, for example, if the initial load was done through ETL or the starting point of the initial situation (also following a capture rewind) is not exactly known, or in situations where cascading constraints or triggers may lead to repeating operations.
Whether a row already exists, is verified using the replication key known to HVR rather than checking the actual indexes or constraints on the target table. Every table has a replication which is either:
- The table column(s) that is (are) marked to be the key. This metadata is generally discovered from the database that is used as the basis for the channel definition, based on the primary key and/or unique index metadata in the database, and can be modified or set based on knowledge of the data.
- Or all non-LOB columns in the table. Of course, there is a possibility that the combination of all column values for a row does not result in a unique identification for a row, and an action TableProperties /DuplicateRows must be used in order to correctly apply changes in such scenario.
The use of Integrate /Resilient can lead to performance issues on the target if all of the below conditions are true:
- The table has a lot of rows.
- The table does not have a primary key, so HVR has to use all non-LOB columns to identify the row.
- Integration is running in continuous mode (i.e. option /Burst is not checked on the Integrate action).
- Inserts are common.
The reason why these reasons can cause a performance issue during inserts, is that in order to do /Resilient processing for inserts HVR would generally rely on a unique index violation to perform the resilient apply, but it cannot do this when there is no primary key on the table. What HVR will do instead is the first query the table FOR EVERY INSERT to see if the row already exists (so that it can update the row if that is the case) and otherwise insert.
If there is no index, and the table has a lot of rows, then a table scan can take a long time, and it is performed FOR EVERY INSERT. If on top of this there are lots of optional columns – because HVR cannot simply generate NULL=NULL
in the where clause – then the query becomes more complex and some database optimizers can no longer generate an optimal query plan and often skip even non-unique indexes that otherwise could be beneficial to more quickly find a row.
If instead option /Burst is checked on the Integrate action then there is little concern with the performance impact of also using option /Resilient.