Partitioning for JDBC Datasets

Data Partitioning

In order to improve the performance of the data load, Tellius introduces partitioning of the data. This will make the data load faster into the system, especially when the data sizes increase. User needs to provide the Partition Column and Number of Partitions followed by Lower and Upper bounds. Upper bound and Lower bounds are the Minimum and Maximum value of the column.

There need not be single unique key in the dataset. Partition key can be any numeric key which has uniform distribution. For instance if the data has data for 10 years (2010 - 2020) and there is a year column. It can be used for partitioning.

Min - 2010

Max - 2020

Number of partitions - 12

This will actually load the data into 12 different partitions with the following logic.

  1. Partition 1 < 2010
  2. Partition 2 : 2010 - 2011
  3. Partition 3 : 2011 - 2012
  4. Partition 4 : 2012 - 2013
  5. Partition 5 : 2013 - 2014
  6. Partition 6 : 2014 - 2015
  7. Partition 7 : 2015 - 2016
  8. Partition 8 : 2016 - 2017
  9. Partition 9 : 2017 - 2018
  10. Partition 10 : 2018 - 2019
  11. Partition 11 : 2019 - 2020
  12. Partition 12 : >2020

Thumb rule for the number of partitions is to have 1 to 2 million rows per partitions. If the data has 16M rows, then it can have 8 or 10 partitions.

How to set partitions for JDBC data load?

1. Login to Tellius and open the Data from the side menu bar.

2. Go to CONNECT and click on CREATE NEW button, it will open the available data connectors for data load.

3. From the list of connectors, select the JDBC connector.

4. Load the data and click on ADVANCE SETTINGS to load additional options.

5. Set partitions values Like Partition column, Number of Partitions, Lower bound, Upper bound in Data Partitioning for JDBC Data load and click on LOAD.

Data Partitioning - SQL Data, JDBC Data

This functionality helps to improve the performance of data loading by partitioning the data. It helps to load the data faster into the system, especially when the data size increases. You need to provide the Partition Column and Number of Partitions followed by Lower and Upper bounds. Upper bound and Lower bounds are the Minimum and Maximum values of the column.

There need not be a single unique key in the dataset. A partition key can be any numeric key that has a uniform distribution. For instance, if the data has data for 10 years (2010 - 2020) and there is a year column. It can be used for partitioning.

Min – 2010

Max - 2020

Number of partitions - 12

This will load the data into 12 different partitions with the following logic.

1.     Partition 1 < 2010

2.     Partition 2: 2010 – 2011 and so on.

The Thumb rule for the number of partitions is to have 1 to 2 million rows per partition. If the data has 16M rows, then it can have 8 or 10 partitions.

To set the Partitions:

1. Click Data and click CONNECT.

2. Click CREATE NEW button, it will open the available data connectors for data load.

3. From the list of connectors, select the MYSQL connector.

4. Load the data and click ADVANCE SETTINGS to load additional options.

5. Set partitions values Like Partition column, Number of Partitions, Lower bound, Upper bound in Data Partitioning for MYSQL Data load and click LOAD.

Did we help you?

Alerts

Export Dataset

Contact