🧩Editing Prepare → Data

Performing preliminary transformations to your datasets

Under Prepare → Data, you can validate data accuracy, review columns, see row distribution, and perform preliminary transformations (in Edit mode).

Statistics

Statistics gives you column-level summary metrics and a quick visualization of each column’s distribution.
Below the column name, a green bar indicates the column’s recognized data type (e.g., date/time, numeric, or string). Hovering over the bar displays a message such as “Main type: string 100.00%”, which tells you that every value (100%) fits the text/string pattern; no exceptions were detected that might suggest a numeric or date/time type.

Count: Total number of rows inspected
Missing (NULL): Number of records with no value in this column
Invalid: Number of entries that do not conform to the column’s data type
Unique Value: How many distinct values appear in the column

If the “Missing” or “Invalid” counts are unexpectedly high, you may need to transform or cleanse your data (e.g., converting strings to dates, filling nulls).

A high uniqueness count relative to total rows suggests this column might be a candidate for a primary key or near-unique identifier.
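
To make the four metrics concrete, here is a minimal Python sketch that computes them for a single column. It is an illustration only, not the product's implementation: the helper names (`column_stats`, `parse_date`), the sample values, and the choice to treat empty strings as missing and to count invalid entries among the unique values are all assumptions.

```python
from datetime import datetime

def column_stats(values, parse):
    """Return count, missing, invalid, and unique tallies for one column."""
    # Assumption: None and empty strings both count as missing.
    present = [v for v in values if v is not None and v != ""]
    invalid = 0
    for v in present:
        try:
            parse(v)  # a value is invalid if it does not parse as the column type
        except ValueError:
            invalid += 1
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "invalid": invalid,
        "unique": len(set(present)),
    }

def parse_date(text):
    """Hypothetical validator for a date column in YYYY-MM-DD form."""
    return datetime.strptime(text, "%Y-%m-%d")

order_dates = ["2023-01-15", "2023-01-16", None, "not a date", "2023-01-15"]
stats = column_stats(order_dates, parse_date)
print(stats)
```

Comparing `unique` against `count` gives the uniqueness ratio mentioned above; a ratio close to 1 hints at a key-like column.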

A quick visualization on the right shows how the data is distributed. It helps you:
1. Instantly gauge whether data is uniformly distributed or whether certain ranges cluster heavily.
2. Spot potential anomalies, e.g., a spike in certain months or a total gap in a given time range.
For a date/time column, each vertical blue bar represents a set of date/time values plotted along the X-axis. The X-axis labels can appear bunched if the dataset is large or if date values are extremely granular. Hovering over a bar may clarify the distribution.
Click the burger menu icon above the chart to open a menu where you can:

View the chart in full screen
Print the chart
Download the image (as PNG, JPEG, PDF, or SVG)
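
The kind of reading the distribution chart supports (spikes and gaps in a date column) can also be sketched in code. The example below is a rough Python approximation that buckets dates by month; the monthly bucket size, the function names, and the sample dates are assumptions, since the bucketing the chart actually uses is not documented here.

```python
from collections import Counter
from datetime import date

def monthly_counts(dates):
    """Count rows per (year, month) bucket, ordered chronologically."""
    counts = Counter((d.year, d.month) for d in dates)
    return dict(sorted(counts.items()))

def missing_months(counts):
    """List (year, month) buckets with no rows between the first and last bucket."""
    if not counts:
        return []
    keys = sorted(counts)
    (year, month), last = keys[0], keys[-1]
    gaps = []
    while (year, month) != last:
        if (year, month) not in counts:
            gaps.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps

dates = [
    date(2023, 1, 5), date(2023, 1, 20), date(2023, 2, 1),
    date(2023, 4, 10), date(2023, 4, 11), date(2023, 4, 12),
]
counts = monthly_counts(dates)
gaps = missing_months(counts)
print(counts, gaps)
```

Here April is the tallest bucket and March is empty, which is the sort of spike and gap the chart makes visible at a glance.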

Filter (Edit mode)

Click on the Filter icon to open the filter panel. This filter does not modify the dataset pipeline or permanently remove rows. Instead, it’s a quick filter for on-screen data inspection: you’re hiding certain rows in the immediate view without altering the underlying dataset.
If you do want to permanently remove or transform rows in the actual pipeline, click “Transform data” to switch modes and open the transform filter window.

Unlike the view-only filter, applying a filter here alters the dataset in your pipeline or script. Rows that do not meet the condition are permanently removed from the dataset version that’s being prepared.
The “+” icon lets you add further filter clauses (e.g., “Column A > 10” AND “Column B = ‘XYZ’”).
The transformation is saved in the pipeline. If you publish these changes, the dataset reloads with rows excluded per your filter logic.

If you use “Transform data”, you’ll see an updated step in your pipeline, and you may need to re-publish or validate.

If you use a view-only filter, no pipeline changes occur and no new steps are added. The dataset view returns to normal after you exit or clear the filter.
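
The difference between the two filter modes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the product's engine: the `Pipeline` class, the `(column, operator, value)` clause format, and the sample rows are all hypothetical.

```python
import operator

# Hypothetical clause operators; the product's full operator list is not shown here.
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

def matches(row, clauses):
    """True when the row satisfies every clause (clauses are ANDed)."""
    return all(OPS[op](row[col], value) for col, op, value in clauses)

def view_filter(rows, clauses):
    """View-only filter: returns the visible rows and leaves `rows` untouched."""
    return [row for row in rows if matches(row, clauses)]

class Pipeline:
    """Toy pipeline that records each transform as a step."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.steps = []

    def filter(self, clauses):
        # Transform filter: non-matching rows are dropped from the prepared data.
        self.rows = [row for row in self.rows if matches(row, clauses)]
        self.steps.append(("filter", clauses))

rows = [
    {"A": 5, "B": "XYZ"},
    {"A": 12, "B": "XYZ"},
    {"A": 20, "B": "ABC"},
]
clauses = [("A", ">", 10), ("B", "=", "XYZ")]  # Column A > 10 AND Column B = 'XYZ'

visible = view_filter(rows, clauses)  # on-screen view only
pipeline = Pipeline(rows)
pipeline.filter(clauses)              # recorded as a pipeline step
print(visible, len(rows), pipeline.steps)
```

After the view-only call, `rows` still holds all three records and no step exists; after the pipeline call, one step is recorded and only the matching row remains.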

Sorting

Click on the ↓ icon to sort the column data in ascending order.
Click on the ⬆ icon to sort the column data in descending order.

Transforming a column

Click the name of the column you want to transform to open its menu. These transform tools allow you to refine and reshape columns in various ways, whether adjusting data types, altering text, or performing merges and splits.

Data Type Transform

This submenu lets you convert a column’s data type. Here are the options:

String: Interprets the column as textual data (e.g., “ABC123”).
Double: Interprets the column as a floating-point numeric type (e.g., 3.14159). Use if you need decimal precision or have fractional values.
Date: Interprets the column as a date (YYYY-MM-DD) without a time component.
Integer: Interprets the column as whole numbers only (e.g., 42).
Timestamp: Includes both date and time details. Use if you have data like 2023-01-15 10:25:00 or an ISO-8601 string (2023-01-15T10:25:00Z).
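
As a rough illustration of what each target type accepts, the Python sketch below converts string values and tallies the ones that fail. The `CONVERTERS` mapping and the choice to turn failed values into `None` are assumptions made for the example; how the product itself handles values that do not convert is not stated here.

```python
from datetime import date, datetime

def to_timestamp(text):
    """Parse a date-time string; a trailing 'Z' is rewritten as a UTC offset."""
    # datetime.fromisoformat rejects a bare "Z" suffix before Python 3.11.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))

# Hypothetical mapping from the menu options to Python converters.
CONVERTERS = {
    "String": str,
    "Integer": int,
    "Double": float,
    "Date": date.fromisoformat,  # expects YYYY-MM-DD
    "Timestamp": to_timestamp,
}

def convert_column(values, target_type):
    """Convert each value; failures become None and are counted as invalid."""
    convert = CONVERTERS[target_type]
    converted, invalid = [], 0
    for value in values:
        if value is None:
            converted.append(None)  # missing stays missing
            continue
        try:
            converted.append(convert(value))
        except ValueError:
            converted.append(None)
            invalid += 1
    return converted, invalid

ints, bad_ints = convert_column(["42", "7", "3.14"], "Integer")
days, bad_days = convert_column(["2023-01-15", "15/01/2023"], "Date")
stamps, bad_stamps = convert_column(
    ["2023-01-15 10:25:00", "2023-01-15T10:25:00Z"], "Timestamp"
)
print(ints, bad_ints, days, bad_days, stamps, bad_stamps)
```

Converting "3.14" to Integer fails in this sketch, which is why a column with fractional values is better treated as Double.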

Column Transform

This submenu is for general transformations (not strictly text-based). Options include:

Add Column: Creates a new empty column.
Rename Column: Changes the actual column name.

You can change the name of a column in your dataset, but doing so might cause issues, such as breaking existing connections or processes that depend on the current column name. To avoid these risks, use "Display Name" as an alternative, which lets you show a different name without actually renaming the column itself.

Move Column: Reorder columns in the dataset (e.g., bring an important column to the front). This has sub-options like Before previous column, After next column, Before column, and After column.

Merge Column: Combine two columns into one, often used to concatenate strings (e.g., FirstName + LastName) or unify numeric fields. Here, you specify the other column to merge and provide a name for the newly merged column.

Find and Replace: Search for specific text or patterns in the selected column and replace them with something else.

Set as Target variable: Usually relevant for ML or predictive analytics tasks. Indicates that this column is the outcome variable (label) for training a model.
Split Rows: Splits a row into multiple rows when a cell contains several delimited items. For example, if a single cell holds multiple values separated by line breaks, each value becomes its own row. The Delimiter field specifies the exact character or substring that marks where to split.
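
Merge Column and Split Rows are the two options here that change the shape of the data, so a small Python sketch may help. It is an approximation under assumptions: the function names, the space separator used when merging, keeping the source columns after a merge, and trimming whitespace after a split are illustrative choices, not documented product behavior.

```python
def merge_columns(rows, first, second, new_name, separator=" "):
    """Concatenate two columns as strings into a new column."""
    # Assumption: the source columns are kept alongside the merged one.
    return [
        {**row, new_name: f"{row[first]}{separator}{row[second]}"}
        for row in rows
    ]

def split_rows(rows, column, delimiter="\n"):
    """Emit one output row per delimited item found in `column`."""
    result = []
    for row in rows:
        for item in str(row[column]).split(delimiter):
            new_row = dict(row)  # other columns are repeated on every new row
            new_row[column] = item.strip()
            result.append(new_row)
    return result

people = [{"FirstName": "Ada", "LastName": "Lovelace"}]
merged = merge_columns(people, "FirstName", "LastName", "FullName")

orders = [
    {"order": 1, "items": "pen\nink"},
    {"order": 2, "items": "paper"},
]
split = split_rows(orders, "items")
print(merged, split)
```

Note that splitting repeats the other column values on each new row, so row counts in the Statistics panel will increase afterwards.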

Text Transform

If your dataset contains textual columns you want to analyze, these transformations help standardize or clean the text for better search, NLP, or machine learning outcomes.

Upper case: Converts the entire column’s text to upper case (e.g., abc → ABC).
Lower case: Converts all text to lower case (e.g., ABC → abc).
Remove stop words: Removes common filler words from text (e.g., “the,” “and,” “of”), often used in NLP or text analytics.
Stem: Applies a stemming algorithm (e.g., Porter stemmer) to reduce words to their base form (e.g., “running,” “runs” → “run”). Irregular forms such as “ran” are generally left unchanged by suffix-stripping stemmers. Often used to group word variants.
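
The effect of these text transforms can be sketched in plain Python. The stop-word list below is a tiny illustrative sample, and `naive_stem` is a deliberately simple suffix stripper written for this example; it is not the Porter algorithm and not necessarily what the product applies.

```python
# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "and", "of", "a", "to", "in"}
SUFFIXES = ("ing", "es", "s")

def remove_stop_words(text):
    """Drop common filler words, comparing case-insensitively."""
    return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)

def naive_stem(word):
    """Strip one common suffix; a toy stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Collapse a doubled final consonant: "runn" -> "run".
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                stem = stem[:-1]
            return stem
    return word

text = "The history of the printing press"
cleaned = remove_stop_words(text)
stems = [naive_stem(w) for w in ["running", "runs", "ran"]]
print(text.upper(), text.lower(), cleaned, stems)
```

The irregular form "ran" passes through unchanged, which shows why stemming groups regular word variants but does not replace full lemmatization.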

Filter Column

Works the same way as the filter described under Filter (Edit mode) above.

Delete Column

Permanently removes the selected column from the dataset pipeline.
