# Slicing

The more data you have, the more time it takes to process. When the amount of data you want to replicate is massive, it is more efficient to separate the data into a few slices and process them separately. In Fivetran HVR, this is achieved with the **Slicing** functionality.

**Slicing** divides your source table into a few pieces while Refreshing or Comparing the data.

**Slicing** has the following benefits:

It speeds up the

**Refresh**or**Compare**job;It helps to avoid high memory or CPU load for long-running jobs.

## Slicing Types

HVR provides a few types of slicing, each of them best fitting a specific business case.

### Modulo Slicing

During **Modulo Slicing**, HVR performs a modulo operation using the values from the column you choose. This type of slicing is the easiest of the four **Slicing** types to set up.

Basically, the modulo operation returns the remainder of a division.

For example:

**15 mod 5 = 0**(15 divided by 5 equals 3, with the remainder of**0**);**15 mod 4 = 3**(15 divided by 4 equals 3, with the remainder of**3**).

When it comes to databases, the same operation is performed to slice the source table. The modulo value you choose will define two things: which rows will be in each slice, and the number of slices.

For example, you have a table with the column **customer_id** that has values from **0** to **1000**. If you set the modulo value to **10**, HVR will slice your table into 10 slices as follows:

**customer_id mod 10 = 0**(where customer_id = 0, 10, 20, ... 1000)**customer_id mod 10 = 1**(where customer_id = 1, 11, 21, ... 991)[...]

**customer_id mod 10 = 8**(where customer_id = 8, 18, 28, ... 998)**customer_id mod 10 = 9**(where customer_id = 9, 19, 29, ... 999)

For each customer_id, HVR performs a modulo operation and sorts the row into the respective slice depending on the remainder.

- Only a column with numeric values can be used for
Modulo Slicing.Modulo slicingis limited only to the databases that support modulo database function.- For
Modulo slicing, parametersSliceCountConditionandSliceSeriesConditionin theRestrictactionmust notbe defined.

### Count Slicing

**Count Slicing** is a bit more advanced version of **Modulo slicing**. With it, you can choose to perform **Slicing** on a column that contains values such as strings or dates.

To use **Count Slicing**, you need to define the **Restrict** action with the parameter **SliceCountCondition**=**{hvr_slice_num}** and **{hvr_slice_total}**.

For example, for the Oracle source, the defined **Restrict** might look as follows:

**Restrict** SliceCountCondition=mod(coalesce(ora_hash(mycol),0),{hvr_slice_total})={hvr_slice_num}

### Boundary Slicing

**Boundary Slicing** requires a column with the known range of possible values. It groups the data set depending on defined boundaries for the chosen column.

For this type of **Slicing**, you set the boundaries for each slice of your table. For example you have chosen the column **user_id** (the boundaries you set can be strings, numerics, or dates) and specify boundaries to **1000**, **3500**, and **6000**.

As the result, you'll have 4 slices:

with

**user_id**from the minimum**user_id**to**1000**with

**user_id**from**1001**to**3500**with

**user_id**from**3501**to**6000**with

**user_id**from**6001**to the maximum**user_id**in the table

For

Boundary slicing, parametersSliceCountConditionandSliceSeriesConditionin theRestrictactionmust notbe defined.

### Series Slicing

**Series Slicing** is an advanced option of **Boundary Slicing**. It groups the data set on the basis of a column value, where you define a slice condition that uses a particular value. It will work well if you have a few distinct groups of data, for example divided by country code.

**Series Slicing** requires a column with a small set of unique values.

For example, one of the columns in this table has country code values, and the rows are distributed as follows:

mycol | Number of Rows with a specific 'mycol' value |
---|---|

DE | 19081 |

LX | 10094 |

BE | 14159 |

NL | 13196 |

Here, if you apply **Slicing** based on the **mycol** column, the NL slice would have 13196 rows, and the LX slice would have 10094 rows.

To use

Series Slicing, you need to define actionRestrictwith parameterSliceSeriesCondition={hvr_slice_value}.

#### Example

Assuming

**SliceSeriesCondition="{hvr_slice_value}"**, the**Series Values**field should be filled in as follows:`mycol = 'DE' mycol = 'LX' mycol = 'BE' mycol = 'NL'`

Assuming

**SliceSeriesCondition="mycol =****{hvr_slice_value}"**, the**Series Values**field should be filled in as follows:`'DE' 'LX' 'BE' 'NL'`

## Slicing Limitations

For Bulk Refresh, slicing can only be done on the source table.

For Compare and for row-wise **Refresh**, slicing can be done on the source or the target table. But only the columns that are present in both source and target can be used for slicing.

### Limitations of Slicing for Refresh

#### Modulo Slicing

Following are the limitations when using slicing with modulo of numbers (*col%num*):

**Modulo Slicing**only works on numeric data types. It may work with binary**float**values depending on DBMS data type handling.- For some supported DBMSes (SQL Server, PostgreSQL, Greenplum, Redshift),
**Modulo****Slicing**on a**float**column is not allowed as it may result in SQL query error. - For some DBMSes,
**float**values above the limits of DBMS’s underlying precision ("big**float**values") may produce inaccurate results during**Modulo****Slicing**. This affects only heterogeneous environments. - Refresh with
**Modulo****Slicing**on a column with "big**float**values" may produce inaccurate results in HANA even in a homogeneous environment (HANA-to-HANA). - Heterogeneous
**Refresh**(between different DBMSes or file locations) has a limitation with**Modulo**slicing on the Oracle’s**NUMBER(*)**column: if a value has an exponent larger than 37 (e.g. if a number is larger than 1E+37 or smaller than -1E+37), then this row might be associated with a wrong slice. This column should not be used for**Modulo****Slicing.**The exact limits of the values depend on the number of slices.

A workaround for the above limitations is to use **Boundary** or **Count** **Slicing** with custom SQL expressions.

#### Boundary Slicing

**Boundary** **Slicing** does not work for dates in heterogeneous DBMSes.

### Limitations of Slicing for Direct File Compare

An Oracle’s **NUMBER(*)** column or a similar column with big numbers (namely, if a number is larger than 1E+37 or smaller than -1E+37) cannot be used in direct file compare with slicing. A workaround would be to use another column for slicing.

### Limitations of Slicing for SAP Unpack

Slicing is only possible on columns that exist in the source and only on columns that do not break document consistency. For cluster tables, slicing on **PAGENO** is not allowed, for long text (**STXL**) slicing on **SRTF2** is not allowed.

For SAP Unpack, slicing based on row-count from last **Refresh** or **Compare** might not work, since the information on the cluster table RFBLG may not be there.

When slicing tables that have **SapUnpack** parameter, you must select the **PackedInside** table during **Refresh**. Slicing will be done on the **PackedInside** table. For example, when refreshing **BSEG**, select both **BSEG** and **RFBLG** and slice on **RFBLG**.

In most SAP systems, slicing on **Client ID** (**MANDT**) does not result in efficient slicing.

Proposed slicing:

- Important cluster tables include
**BSEG**and**KONV**. - Slicing on
**RFBLG**(**BSEG**) can be done on columns**BUKRS**(company code),**BELNR**(document number) and**GJAHR**(fiscal year). - Slicing on
**KOCLU**(**KONV**) can be done on the column**KNUMV**(document number) only. - Slicing on
**STXL**is best done on the column**TDOBJECT**(text object information). - Pool tables are best sliced on column
**TABNAME**. This column contains the name of the unpack target table, values should be based on tables selected to unpack.