Preparing your datasets
Refine, transform, and organize your datasets
Once you’ve created or imported a dataset (through Connect), you can refine, transform, and organize it under the Prepare module. The left-hand panel lists available datasets and folders, while the central workspace offers specialized tabs: Data, Metadata, Scripting, Data Fusion, and Schedule. Each tab serves a unique purpose in preparing your data for analytics.
Datasets: The list of datasets you have created. Use this left pane to quickly search for datasets or folders and sort them by name.
Datasets can be organized into folders.
The icon of each dataset indicates the datasource type (e.g., Snowflake vs. CSV). Live datasets are indicated with a green dot.
The archived folder at the end contains older or deprecated datasets. You can still access them but they’re separated for clarity.
Click a folder to expand or collapse its contents.
Select a dataset to open it in the central Prepare workspace.
Displays the following options:
Create a new dataset: Clicking this button redirects you to Data → Connect. For more details, please check out the Connect section.
Import dataset: Click this button to import the required dataset (only .zip files can be imported).
Create a new folder: Creates a new folder to categorize the datasets. Provide a relevant name and add the required datasets from the available list.
Actions performed on a dataset: Click the three-dot kebab menu to display the following menu. For more details, please check out the corresponding page.
Data tab: Displays all the datasets, allowing you to add or modify transformation nodes (SQL, Python, type changes) and perform data preparation actions.
Metadata tab: Here, you can:
Add user-friendly display names (e.g., “Booking Date” instead of BOOKED_DATE), synonyms, and descriptions to columns.
If you use the Kaiya feature (where enabled), you can auto-generate synonyms, display names, and descriptions for large sets of columns.
Choose the appropriate data types (string, numeric, date/time) to ensure proper aggregations. For example, if BOOKED_DATE is incorrectly typed as a string, date-based filtering and time-series analysis won’t work properly.
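The same issue arises in any data tool: as long as a date is stored as text, comparisons operate on strings rather than calendar dates. A minimal pandas sketch (with hypothetical values) shows why correcting the declared type matters:

```python
import pandas as pd

# BOOKED_DATE stored as strings: filters would compare text, not dates
df = pd.DataFrame({"BOOKED_DATE": ["2024-03-01", "2024-11-15", "2024-07-09"]})

# Convert the column to a proper datetime type
df["BOOKED_DATE"] = pd.to_datetime(df["BOOKED_DATE"])

# Date-based filtering and time-series analysis now behave correctly
recent = df[df["BOOKED_DATE"] >= "2024-07-01"]
print(len(recent))  # 2
```

In Tellius, choosing the right type in the Metadata tab achieves the same effect without writing code.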
Scripting tab: After verifying metadata, you might need advanced transformations that exceed basic pipeline nodes. For example, you may want to:
Add custom columns with SQL or Python.
Join multiple datasets (often more than two) based on business rules.
Aggregate or filter big data beyond what’s feasible in a single pipeline step.
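As an illustration of the kind of logic a scripting step can express, here is a pandas sketch (with hypothetical dataset and column names, not the Tellius API) covering all three cases: a join on a business key, a custom column, and an aggregation.

```python
import pandas as pd

bookings = pd.DataFrame({"booking_id": [1, 2, 3],
                         "customer_id": [10, 11, 10],
                         "amount": [120.0, 80.0, 200.0]})
customers = pd.DataFrame({"customer_id": [10, 11],
                          "region": ["East", "West"]})

# Join datasets based on a business rule (here: a shared customer key)
merged = bookings.merge(customers, on="customer_id", how="left")

# Add a custom column
merged["is_large"] = merged["amount"] > 100

# Aggregate beyond a single pipeline step
by_region = merged.groupby("region", as_index=False)["amount"].sum()
print(by_region)
```

Equivalent SQL can be written in an SQL scripting node instead; the choice between SQL and Python is a matter of preference and of which transformations you need.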
Data Fusion tab: Data fusion is intended for simpler merges, typically merging exactly two datasets in a point-and-click fashion, without writing SQL.
Schedule: Use the Schedule option to refresh your data and keep it in sync with the most up-to-date information available. You can choose from a set of refresh modes and flexibly set the refresh schedule.
Pipeline: Visual representation of transformations or nodes that have been applied to the dataset.
If you click Edit, this area will show your pipeline steps (e.g., an SQL node, Python node, or partitioning settings).
Search Columns: Quickly filters the displayed columns by typing part of the name or label.
Footer: Displays the data source name and a preview of the dataset’s rows and columns, along with timestamps indicating when the dataset was created and last refreshed.
Assign measures (numeric fields), dimensions (for grouping or filtering), and date columns. For more details about the distinction, check out the corresponding page.
Export (Writeback): Think of this as “saving” the cleaned or transformed dataset outside of Tellius. Depending on which connector you pick, you’ll either generate a local file (e.g., CSV) or publish it to an external system (e.g., HDFS, Snowflake). For more details, check out the corresponding page.
Edit: Switches the page into Edit mode, where you can apply transformations to the selected dataset. For more details, check out the corresponding page.
Column headers: Displays each column in the selected dataset; you can sort columns or apply filters on the fly. For more details on editing the dataset, check out the corresponding page.
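The measure/dimension/date distinction mentioned above usually follows the column’s underlying data type. A pandas sketch (with hypothetical data, purely as a rule of thumb rather than how Tellius classifies columns) illustrates the typical mapping:

```python
import pandas as pd

df = pd.DataFrame({"amount": [12.5, 30.0],
                   "region": ["East", "West"],
                   "booked": pd.to_datetime(["2024-01-05", "2024-02-10"])})

# Numeric columns typically become measures, text columns dimensions,
# and datetime columns date columns
measures = df.select_dtypes(include="number").columns.tolist()
dimensions = df.select_dtypes(include=["object", "string"]).columns.tolist()
dates = df.select_dtypes(include="datetime").columns.tolist()
print(measures, dimensions, dates)
```

Exceptions exist — a numeric ID, for instance, is better treated as a dimension than a measure — so review the assignments rather than relying on type alone.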