Partitioning for JDBC Datasets

Deepak Chaudhary Updated by Deepak Chaudhary

Data Partitioning

In order to improve the performance of the data load, Tellius introduces partitioning of the data. This will make the data load faster into the system, especially when the data sizes increase. User needs to provide the Partition Column and Number of Partitions followed by Lower and Upper bounds. Upper bound and Lower bounds are the Minimum and Maximum value of the column.

There need not be single unique key in the dataset. Partition key can be any numeric key which has uniform distribution. For instance if the data has data for 10 years (2010 - 2020) and there is a year column. It can be used for partitioning.

Min - 2010

Max - 2020

Number of partitions - 12

This will actually load the data into 12 different partitions with the following logic.

  1. Partition 1 < 2010
  2. Partition 2 : 2010 - 2011
  3. Partition 3 : 2011 - 2012
  4. Partition 4 : 2012 - 2013
  5. Partition 5 : 2013 - 2014
  6. Partition 6 : 2014 - 2015
  7. Partition 7 : 2015 - 2016
  8. Partition 8 : 2016 - 2017
  9. Partition 9 : 2017 - 2018
  10. Partition 10 : 2018 - 2019
  11. Partition 11 : 2019 - 2020
  12. Partition 12 : >2020

Thumb rule for the number of partitions is to have 1 to 2 million rows per partitions. If the data has 16M rows, then it can have 8 or 10 partitions.

How to set partitions for JDBC data load?

1. Login to Tellius and open the Data from the side menu bar.

2. Go to CONNECT and click on CREATE NEW button, it will open the available data connectors for data load.

3. From the list of connectors, select the JDBC connector.

4. Load the data and click on ADVANCE SETTINGS to load additional options.

5. Set partitions values Like Partition column, Number of Partitions, Lower bound, Upper bound in Data Partitioning for JDBC Data load and click on LOAD.

How did we do?

Alerts

Export Dataset

Contact