Preparing your datasets

Refine, transform, and organize your datasets

Once you’ve created or imported a dataset (through Connect), you can refine, transform, and organize it under the Prepare module. The left-hand panel lists available datasets and folders, while the central workspace offers specialized tabs: Data, Metadata, Scripting, Data Fusion, and Schedule. Each tab serves a unique purpose in preparing your data for analytics.

  1. Datasets: The list of datasets you have created. This left pane lets you quickly search and sort datasets or folders by name.

    • Datasets can be organized into folders.

    • The icon of each dataset indicates the datasource type (e.g., Snowflake vs. CSV). Live datasets are marked with a green dot.

    • The archived folder at the end contains older or deprecated datasets; you can still access them, but they're kept separate for clarity.

    • Click a folder to expand or collapse its contents.

    • Select a dataset to open it in the central Prepare workspace.

  2. Displays the following options:

    • Create a new dataset: Clicking this button redirects you to Data → Connect. For more details, please check out this section.

    • Import dataset: Click this button to import the required dataset. (Only .zip files can be imported.)

    • Create a new folder: Creates a new folder to categorize the datasets. Provide a relevant name and add the required datasets from the available list.

  3. Actions performed on a dataset: Click the three-dot kebab menu to display the available actions. For more details, please check out this page.

  4. Data tab: Displays the data of the selected dataset, allowing you to add or modify transformation nodes (SQL, Python, type changes) and perform data preparation actions.

  5. Metadata tab: Here, you can:

  • Add user-friendly display names (e.g., “Booking Date” instead of BOOKED_DATE), synonyms, and descriptions to columns.

If you use the Kaiya feature (where enabled), you can auto-generate synonyms, display names, and descriptions for large sets of columns.

  • Choose relevant data types (string, numeric, date/time) to ensure proper aggregations. For example, if BOOKED_DATE is incorrectly typed as a string, you can't do date-based filtering or time-series analysis properly (see the sketch below).
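
The type change itself is made inside Tellius (for example, from the Metadata tab or a transform node), but the effect is easiest to see in a standalone sketch. The following minimal PySpark example is illustrative only; the sample DataFrame, column names, and date format are assumptions, not Tellius APIs.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample: BOOKED_DATE arrives as a plain string column.
df = spark.createDataFrame(
    [("B-001", "2024-03-15"), ("B-002", "2023-11-02")],
    ["BOOKING_ID", "BOOKED_DATE"],
)

# Cast the string to a real date type so date filtering and
# time-series grouping behave correctly.
df = df.withColumn("BOOKED_DATE", F.to_date("BOOKED_DATE", "yyyy-MM-dd"))

# Date-based filtering now works as expected.
recent = df.filter(F.col("BOOKED_DATE") >= F.lit("2024-01-01"))
recent.show()
```

Charts and aggregations that rely on date semantics (month or quarter grouping, trend analysis) need the column to be a true date/time type rather than text.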

  6. Scripting tab: After verifying metadata, you might need advanced transformations that go beyond basic pipeline nodes. For example, you may want to (see the sketch after this list):

    • Add custom columns with SQL or Python.

    • Join multiple datasets (often more than two) based on business rules.

    • Aggregate or filter big data beyond what’s feasible in a single pipeline step.
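
As a rough illustration of the kind of logic a scripting step can express, here is a hedged PySpark sketch that joins three hypothetical datasets, applies a simple business rule, and aggregates the result. The dataset names, columns, and rule are invented for this example and do not represent Tellius's scripting interface.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical datasets; in practice these would be datasets already
# loaded into Tellius.
orders = spark.createDataFrame(
    [(1, "C-10", 120.0), (2, "C-11", 80.0), (3, "C-10", 45.5)],
    ["ORDER_ID", "CUSTOMER_ID", "AMOUNT"],
)
customers = spark.createDataFrame(
    [("C-10", "East"), ("C-11", "West")],
    ["CUSTOMER_ID", "REGION"],
)
returns = spark.createDataFrame([(3, 45.5)], ["ORDER_ID", "REFUND"])

# Join more than two datasets and add a custom column based on a
# simple business rule (net amount after refunds).
enriched = (
    orders
    .join(customers, "CUSTOMER_ID", "left")
    .join(returns, "ORDER_ID", "left")
    .withColumn("NET_AMOUNT", F.col("AMOUNT") - F.coalesce(F.col("REFUND"), F.lit(0.0)))
)

# Aggregate beyond a single pipeline step: net revenue by region.
summary = enriched.groupBy("REGION").agg(F.sum("NET_AMOUNT").alias("NET_REVENUE"))
summary.show()
```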

  7. Data Fusion tab: Data Fusion is intended for simpler merges, typically combining exactly two datasets in a point-and-click fashion without writing SQL.

  8. Schedule: Use the Schedule option to refresh your data and keep it in sync with the most up-to-date information available. You can choose from a set of refresh modes and flexibly set the refresh schedule.

  9. Pipeline: A visual representation of the transformations (nodes) that have been applied to the dataset.

    If you click Edit, this area will show your pipeline steps (e.g., an SQL node, Python node, or partitioning settings).

  10. Search Columns: Quickly filters the displayed columns by typing part of the name or label.

  11. Footer: Displays the datasource name and a preview of the dataset's rows and columns, along with timestamps indicating the last dataset refresh and the dataset creation.

Assign measures (numeric fields), dimensions (for grouping or filtering), and date columns. For more details about the distinction, check out this page.

Export (Writeback): Think of this as “saving” the cleaned or transformed dataset outside of Tellius. Depending on which connector you pick, you’ll either generate a local file (e.g., CSV) or publish it to an external system (e.g., HDFS, Snowflake, etc.). For more details, check out this page.
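
Outside of Tellius's connectors, a writeback amounts to persisting the prepared data to a file or an external store. A minimal PySpark sketch, assuming df is the prepared dataset and the output paths are placeholders:

```python
# Assumes `df` is the prepared dataset (e.g., `enriched` from the sketch above).
# Write a local CSV file with a header row.
df.write.mode("overwrite").option("header", True).csv("/tmp/prepared_dataset_csv")

# Or publish to an external location, e.g., an HDFS path (illustrative only).
df.write.mode("overwrite").parquet("hdfs:///data/prepared/bookings")
```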

Edit: Switches the page to Edit mode, where you can apply transformations to the selected dataset. For more details, check out this page.

Column headers: Displays each column in the selected dataset; you can sort columns or apply filters on the fly. For more details on editing the dataset, check out this page.

(Figure: Understanding the Prepare page)
(Figure: New folder creation for datasets)