Fall 2023 (4.3)
Data Size Estimation and Calculation
This document describes the algorithm Tellius uses to calculate the size of a dataset according to the Tellius specifications.
Data Size Calculation Algorithm
Tellius estimates the size of a dataset so that it approximates the CSV representation, assigning a fixed number of bytes to each data type. The number of bytes for each data type is listed below:
- Int: 2 bytes for integer columns. The range of values is -2147483648 to 2147483647
- Bigint: 4 bytes for Bigint or Long columns. The range of values is -9223372036854775808 to 9223372036854775807
- Float: 4 bytes for Float columns. The range of values is 1.1754E-38 to 3.4028E+38
- Double: 8 bytes for Double columns. The range of values is 2.2250E-308 to 1.7976E+308
- Boolean: 1 byte for the Boolean columns. Values are true/false
- String: 15 bytes for the String columns.
Strictly, the size of a String column depends on the maximum length of the strings it holds. A column such as Country code takes 2 bytes, while a column of user comments takes 20 to 30 bytes. Tellius uses 15 bytes as an average starting point.
Based on the above sizes, we calculate the size of each row and then multiply by the number of rows to calculate the total size of the dataset.
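The calculation above can be sketched in a few lines of Python. This is an illustrative sketch only, not Tellius code: the function names, the schema format, and the sample columns are assumptions made for the example, and only the per-type byte values come from the list above.

```python
# Per-type byte sizes taken from the list above.
TYPE_SIZES = {
    "int": 2,
    "bigint": 4,
    "float": 4,
    "double": 8,
    "boolean": 1,
    "string": 15,  # average used as a starting point for String columns
}


def estimate_row_size(schema):
    """Sum the per-type byte sizes for one row.

    `schema` is a hypothetical {column_name: type_name} mapping.
    """
    return sum(TYPE_SIZES[type_name.lower()] for type_name in schema.values())


def estimate_dataset_size(schema, row_count):
    """Row size multiplied by the number of rows, in bytes."""
    return estimate_row_size(schema) * row_count


# Hypothetical example schema: 4 + 2 + 8 + 15 + 1 = 30 bytes per row.
example_schema = {
    "order_id": "bigint",
    "quantity": "int",
    "price": "double",
    "country_code": "string",
    "is_returned": "boolean",
}

row_bytes = estimate_row_size(example_schema)
total_bytes = estimate_dataset_size(example_schema, 1_000_000)
print(row_bytes, total_bytes)
```

For this assumed schema, each row is estimated at 30 bytes, so one million rows come to roughly 30 MB.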
Data Size Threshold
Based on the data capacity of the instance and the total datasets loaded, Tellius checks if the new dataset being loaded can be accommodated within the available capacity.
How the dataset size is obtained for this check depends on the data source the dataset is loaded from.
- CSV / XLSX / URL / S3 / HDFS / AzureBlob / JSON / XML / Unstructured Text / FTP
For datasets loaded from these file-based sources, there is no way to get the total number of rows before the data is loaded. In this scenario, Tellius takes the size of the files or folders and compares it against the available data capacity to decide whether the dataset can be loaded.
This check is not fully reliable, because a Parquet file can be about 10x smaller than the equivalent CSV. A 1 GB Parquet file is allowed to load when 1 GB of capacity is available, but the dataset can occupy 10 GB once loaded. Loading more datasets after that is then blocked.
- Oracle / MemSQL / MySQL / PostgreSQL / Redshift / MS SQL / Teradata / Snowflake / JDBC / Exasol
For datasets loaded from these databases, which support JDBC, the size is estimated from the data types and the total number of rows. Tellius pulls sample data to get the schema of the dataset, queries the database for the total number of rows in the table, and estimates the size of the table. This estimate is used to check whether the dataset fits within the instance limits.
- MongoDB / Cassandra / ES / Salesforce / Google Analytics / Impala / Hive / BigQuery
No size estimation is available for datasets loaded from these sources, so they are loaded without any capacity check. After the dataset is loaded, its size is calculated and added to the total used capacity, so loading more datasets may be blocked afterwards.
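The three cases above can be summarized in a small Python sketch. It is a simplified illustration under stated assumptions, not the actual Tellius implementation: the function names, parameters, and lowercase source labels are hypothetical, and the comparison assumes a dataset is allowed when its estimated size does not exceed the available capacity.

```python
# Source groups as listed above (lowercase labels are an assumption of this sketch).
FILE_SOURCES = {"csv", "xlsx", "url", "s3", "hdfs", "azureblob",
                "json", "xml", "unstructured text", "ftp"}
JDBC_SOURCES = {"oracle", "memsql", "mysql", "postgresql", "redshift",
                "ms sql", "teradata", "snowflake", "jdbc", "exasol"}


def pre_load_size(source, file_bytes=None, row_bytes=None, row_count=None):
    """Return the size in bytes used for the pre-load check, or None if no estimate exists."""
    name = source.lower()
    if name in FILE_SOURCES:
        # File-based sources: the row count is unknown, so the file or folder size is used.
        return file_bytes
    if name in JDBC_SOURCES:
        # JDBC sources: per-row size from the sampled schema times the queried row count.
        return row_bytes * row_count
    # Other sources (for example MongoDB or Hive): no estimate before loading.
    return None


def can_load(source, available_bytes, **size_inputs):
    """Decide whether a load is allowed before any data is pulled."""
    size = pre_load_size(source, **size_inputs)
    if size is None:
        return True  # no check; the size is only counted after loading
    return size <= available_bytes


GB = 10**9
print(can_load("s3", 1 * GB, file_bytes=1 * GB))                          # file size fits
print(can_load("snowflake", 1 * GB, row_bytes=30, row_count=50_000_000))  # 1.5 GB estimate
print(can_load("mongodb", 0))                                             # no pre-load check
```

The first call shows the file-based caveat: a 1 GB file passes the check even though the loaded dataset may turn out to be larger.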