How to Create Multiple Integrate Jobs in the Same Channel for Parallelism
Question
How do I create multiple integrate jobs in the same channel for parallelism using the /Absent parameter in action TableProperties?
Environment
HVR 5
Answer
The following example explains how you can have parallel integrate jobs in the same channel:
Let's assume a channel captures from an Oracle database and integrates into Oracle, SQL Server, and Amazon S3 buckets. The channel consists of one capture job for all tables in an Oracle schema and three integrate jobs that run in parallel, sending the data to Oracle, SQL Server, and Amazon S3.
One integrate job, multiintegra-integ-olx, integrates all tables into the Oracle target. The other two jobs, multiintegra-integ-snw2 and multiintegrate-integ-s3, integrate all tables except the ones marked with TableProperties /Absent for their integrate location.
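The table selection described above is plain set subtraction: each integrate job handles the channel's tables minus the tables marked Absent for its location. The Python sketch below only models that logic. The table names and the per-location Absent markings are hypothetical, the location names "olx", "snw2", and "s3" are assumed to be the suffixes of the job names, and `tables_for_location` is an illustrative helper, not part of HVR (where /Absent is configured as an action, not in code).

```python
# Hypothetical model of the example channel; table names are made up.
CHANNEL_TABLES = {"orders", "order_lines", "customers", "audit_log"}

# Integrate location -> tables marked with TableProperties /Absent there.
# These markings are illustrative assumptions, not taken from the article.
ABSENT = {
    "olx": set(),                      # Oracle target: every table integrated
    "snw2": {"audit_log"},
    "s3": {"audit_log", "customers"},
}


def tables_for_location(location: str) -> set[str]:
    """Tables the integrate job for `location` would handle (hypothetical helper)."""
    return CHANNEL_TABLES - ABSENT.get(location, set())


if __name__ == "__main__":
    for loc in sorted(ABSENT):
        print(loc, sorted(tables_for_location(loc)))
```

Each location's job only sees its own subset, which is why the three jobs can run independently of one another.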
Additionally, when integrating into Amazon S3 buckets, many small files should be avoided because they hurt performance. Therefore, the integrate jobs that handle multiple tables use action Integrate /OrderByTable. This sorts the data so that only a single target file is created for each source table in an integrate cycle; if the data is not sorted, many small target files are created for each source table. To minimize the number of integrate cycles, Integrate /CycleByteLimit is set to 0, which removes the limit: all queued transaction files created by capture are processed in a single integrate cycle, instead of in the default 10 MB chunks.
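To illustrate why sorting by table reduces the number of files, and how /CycleByteLimit affects the number of cycles, here is a simplified Python model. It assumes a hypothetical file writer that starts a new target file whenever the table changes between consecutive rows, and it treats the cycle limit as a simple byte count; HVR's real file writing and sub-cycle handling are more involved. `count_target_files` and `count_cycles` are illustrative names only.

```python
from itertools import groupby

MB = 1024 * 1024


def count_target_files(rows):
    """Files written if a new file starts whenever the table changes
    between consecutive rows (simplified, hypothetical writer)."""
    return sum(1 for _ in groupby(rows, key=lambda row: row[0]))


def count_cycles(total_bytes, cycle_byte_limit):
    """Integrate cycles needed for the queued data; a limit of 0 means no limit."""
    if total_bytes == 0:
        return 0
    if cycle_byte_limit == 0:
        return 1
    return -(-total_bytes // cycle_byte_limit)  # ceiling division


if __name__ == "__main__":
    # (table, row id) pairs in capture order; rows of two tables interleave.
    rows = [("orders", 1), ("customers", 1), ("orders", 2),
            ("customers", 2), ("orders", 3)]
    print("unsorted:", count_target_files(rows))
    # sorted() is stable, so rows keep their capture order within each table.
    print("by table:", count_target_files(sorted(rows, key=lambda r: r[0])))
    print("10 MB limit:", count_cycles(35 * MB, 10 * MB))
    print("no limit:", count_cycles(35 * MB, 0))
```

In this model, the interleaved rows produce one file per table switch, while grouping them by table yields exactly one file per table for the cycle.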
The use of TableProperties /Absent is supported in HVR version 5.0.4 and higher.
The channel for the above setup appears as follows: