Tellius
Tellius 5.5
© 2025 Tellius

Understanding partitioning your data

Gain a thorough understanding of how partitioning works in Tellius.

What is data partitioning?

Data partitioning is the process of dividing a large dataset into multiple, more manageable subsets (partitions). Instead of treating millions or even billions of rows as a single block, partitioning breaks the data down into logical “chunks” based on a chosen numeric column and user-defined value ranges. This approach dramatically improves the speed and efficiency of data loading and querying, especially as your data scales.

By using partitioning, you reduce the time it takes to bring data into Tellius and enhance performance during analyses. Large datasets—such as several years of transactional data—can be spread across multiple partitions for parallel loading and faster overall processing.

How to partition your data?

  1. Partition column: A numeric column from your dataset used as the basis for partitioning. The values in this column help define how data is divided across partitions.

Choose a column with a relatively uniform distribution, such as timestamps or years, to achieve balanced partitions. For example, a “year” column spanning a range of years is often a good candidate. A skewed distribution may make some partitions larger and slower to load than others, reducing the performance benefit.

  2. Number of partitions: Indicates how many segments or “buckets” you want to split your data into. More partitions can improve loading speeds by allowing parallel operations, but too many partitions may become cumbersome. The general rule of thumb is to have partitions sized around 1 to 2 million rows each.

If you have approximately 16 million rows, consider 8 to 10 partitions. This helps ensure each partition has roughly 1-2 million rows, balancing load performance and manageability.

  3. Lower bound & Upper bound: Approximate minimum (lower) and maximum (upper) numeric values in the partition column. These bounds define the range over which the data will be split. Tellius uses these values to determine how the data is distributed across each partition.

Tellius can estimate these bounds automatically, but providing explicit lower and upper bounds can improve efficiency and accuracy—especially if you have prior knowledge of your data’s range.
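The rule of thumb of 1 to 2 million rows per partition can be turned into a quick estimate. The sketch below is illustrative arithmetic only: the function name `suggest_partition_range` and its parameters are hypothetical and are not part of Tellius.

```python
import math


def suggest_partition_range(total_rows, min_rows=1_000_000, max_rows=2_000_000):
    """Return (fewest, most) partition counts that keep each partition
    between min_rows and max_rows, assuming rows are spread evenly."""
    # Fewest partitions: every partition filled to the 2-million ceiling.
    fewest = max(1, math.ceil(total_rows / max_rows))
    # Most partitions: every partition holding at least 1 million rows.
    most = max(fewest, total_rows // min_rows)
    return fewest, most


print(suggest_partition_range(16_000_000))  # (8, 16)
```

For 16 million rows this gives a range of 8 to 16 partitions, so the 8 to 10 suggested above sits at the lower end, closer to 2 million rows per partition.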

How partitioning works: A practical example

Imagine you have a dataset containing records from the years 2010 to 2020, and you choose the “Year” column as your partition key:

  • Lower bound: 2010

  • Upper bound: 2020

  • Number of Partitions: 12

Tellius will create approximately 12 partitions spanning the range, distributing data as follows:

  • Partition 1: Values less than 2010

  • Partition 2: 2010 – 2011

  • Partition 3: 2011 – 2012

  • ... and so forth until ...

  • Partition 11: 2019 – 2020

  • Partition 12: Values greater than 2020

This balanced approach ensures that each partition handles a manageable slice of data, thus accelerating the load process. By refining the number of partitions and bounds, you can further optimize performance to suit your data scale and distribution.
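The layout in this example can be reproduced with a short calculation. This is a minimal sketch that mirrors the list above (one partition below the lower bound, one above the upper bound, and equal-width ranges in between). The function `partition_ranges` is hypothetical and is not a Tellius API. Which partition receives a value that falls exactly on a boundary is not stated above, so the sketch only reports range endpoints.

```python
def partition_ranges(lower, upper, num_partitions):
    """Return a list of (start, end) pairs, one per partition.
    None marks an open-ended side (below lower or above upper)."""
    if num_partitions < 3:
        raise ValueError("this layout needs at least 3 partitions")
    if upper <= lower:
        raise ValueError("upper bound must be greater than lower bound")

    interior = num_partitions - 2          # partitions between the bounds
    width = (upper - lower) / interior     # equal-width slices

    ranges = [(None, lower)]               # everything below the lower bound
    for i in range(interior):
        ranges.append((lower + i * width, lower + (i + 1) * width))
    ranges.append((upper, None))           # everything above the upper bound
    return ranges


for number, (start, end) in enumerate(partition_ranges(2010, 2020, 12), start=1):
    print(f"Partition {number}: {start} to {end}")
```

With bounds of 2010 and 2020 and 12 partitions, the ten interior ranges are each one year wide, matching Partitions 2 through 11 in the list.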

Best practices and tips

  1. Start with estimates: If you’re unsure of exact bounds, let Tellius determine them automatically first. Once you see the distribution, you can refine the partition settings.

  2. Monitor performance: After initial loads, review load times and partition sizes. Adjust the number of partitions or bounds as necessary to improve speed.

  3. Keep it simple: Avoid overly granular partitions if you have a small dataset. A handful of partitions can suffice. For very large datasets, ramp up the number of partitions to meet your performance targets.

  4. Follow the 1-2 million rows per partition guideline: This heuristic helps maintain a balance between performance and overhead.
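When reviewing partition sizes after a load, a simple balance check can show whether the settings need adjusting. The sketch below assumes you already have a row count for each partition. The function `review_partitions` and the sample counts are made up for illustration and do not come from Tellius.

```python
def review_partitions(row_counts, max_rows=2_000_000):
    """Summarize partition balance from a list of per-partition row counts."""
    mean = sum(row_counts) / len(row_counts)
    # Skew of 1.0 means perfectly even; higher means one partition dominates.
    skew = max(row_counts) / mean if mean else 0.0
    # 1-based positions of partitions above the 2-million guideline.
    oversized = [i for i, n in enumerate(row_counts, start=1) if n > max_rows]
    return {"skew": skew, "oversized": oversized}


counts = [1_500_000, 1_600_000, 1_400_000, 5_500_000]  # made-up numbers
report = review_partitions(counts)
print(report["oversized"])  # [4]
```

A large skew value or a non-empty `oversized` list suggests choosing a more evenly distributed partition column, tightening the bounds, or increasing the number of partitions.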

With partitioning, as your data grows, your performance scales right alongside it—ensuring that even with massive datasets, you can maintain responsive, high-performance analytics in Tellius.


Last updated 5 months ago
