SQL code snippets
Quickly transform columns, create rolling averages, build bins, or run more complex window functions
When working with Tellius, you may need to perform data transformations beyond simple aggregations or filters. SparkSQL (the SQL engine used by Tellius) offers many built-in functions and window operations, such as time zone conversion, lag/lead comparisons, and rolling averages. Below, we highlight several commonly needed transformations, their use cases, and sample SQL queries.
You can adapt them to your own datasets in the Tellius SQL editor (see the Tellius documentation for details on how to access and apply these transformations in your workflow).
Scenario: You have a timestamp column in a specific time zone and want to standardize it to UTC for consistent analysis.
Function: to_utc_timestamp()
Explanation: Converts origin_timestamp, which is in origin_timezone, to UTC.
Example: If origin_timezone is 'America/Los_Angeles', then time_utc becomes the corresponding UTC timestamp.
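A minimal sketch of the conversion described above. The table name `events` is a hypothetical placeholder; the column names follow the explanation.

```sql
-- Hypothetical table `events` with a local timestamp and its time zone ID.
SELECT
  origin_timestamp,
  origin_timezone,
  -- Interpret origin_timestamp as local time in origin_timezone, then convert it to UTC.
  to_utc_timestamp(origin_timestamp, origin_timezone) AS time_utc
FROM events;
```

For instance, 2024-01-15 08:00:00 in 'America/Los_Angeles' (UTC-8 in winter) becomes 2024-01-15 16:00:00 in UTC.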
Scenario: Compare current row values to a previous (lag) or next (lead) row. Useful for identifying changes or trends (e.g., comparing week-over-week or month-over-month sales).
Function: LAG(column) OVER(...)
Explanation: Retrieves the prior Sales value for the same Order_Id, ordered by Date.
Example: Compare this week’s Sales to the previous week’s (Prev_Sales) to see the difference.
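The lag comparison might be written as follows. This is a sketch: the table name `sales_data` and the `Sales_Change` column are assumptions added for illustration.

```sql
-- Hypothetical table `sales_data`; `Date` is back-quoted to avoid keyword clashes.
SELECT
  Order_Id,
  `Date`,
  Sales,
  -- Previous row's Sales within the same Order_Id; NULL for the first row.
  LAG(Sales) OVER (PARTITION BY Order_Id ORDER BY `Date`) AS Prev_Sales,
  -- Change versus the previous row (NULL when there is no previous row).
  Sales - LAG(Sales) OVER (PARTITION BY Order_Id ORDER BY `Date`) AS Sales_Change
FROM sales_data;
```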
Function: LEAD(column) OVER(...)
Explanation: Retrieves the next Sales value for the same Order_Id, ordered by Date.
Example: Compare today’s Sales to the following period’s Sales.
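A corresponding sketch for the lead case, again assuming a hypothetical `sales_data` table. The alias `Next_Sales` is illustrative.

```sql
-- Hypothetical table `sales_data`.
SELECT
  Order_Id,
  `Date`,
  Sales,
  -- Next row's Sales within the same Order_Id; NULL for the last row.
  LEAD(Sales) OVER (PARTITION BY Order_Id ORDER BY `Date`) AS Next_Sales
FROM sales_data;
```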
Scenario: Track moving averages (e.g., a 2-week rolling average of Sales), excluding the current row to gauge past performance.
Function: AVG(column) OVER(...) with a window frame definition.
Explanation: Averages the two preceding rows, excluding the current row.
Note: Adjust the frame (e.g., ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) for a three-period (day, week, or month) rolling window, depending on the granularity of your rows.
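One way to express the rolling average over the two preceding rows. The table name `sales_data`, the partitioning by Order_Id, and the alias `Rolling_Avg_Sales` are assumptions; use whatever grouping fits your data.

```sql
-- Hypothetical table `sales_data` with one row per period (e.g., per week).
SELECT
  Order_Id,
  `Date`,
  Sales,
  AVG(Sales) OVER (
    PARTITION BY Order_Id
    ORDER BY `Date`
    -- Frame covers the two rows before the current one and excludes the current row.
    ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING
  ) AS Rolling_Avg_Sales
FROM sales_data;
```

The first row of each partition has an empty frame, so its average is NULL. The second row averages only the single row before it.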
Scenario: Segment numeric values (e.g., net_sales) into a fixed number of buckets (like quartiles or deciles) within each product category.
Function: NTILE(n) OVER(...)
Explanation: Divides the ordered partition into n buckets. Here, 4 bins give you quartiles.
Example: Quickly assign each row in a product category a bucket from 1 (highest net sales) through 4 (lowest net sales).
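A sketch of quartile binning per category. The table name `sales_data` and the column name `product_category` are assumptions; the descending sort is what makes bucket 1 the highest net sales.

```sql
-- Hypothetical table `sales_data` with product_category and net_sales columns.
SELECT
  product_category,
  net_sales,
  -- Split each category's rows into 4 roughly equal buckets;
  -- ORDER BY ... DESC puts the highest net_sales in bucket 1.
  NTILE(4) OVER (
    PARTITION BY product_category
    ORDER BY net_sales DESC
  ) AS net_sales_quartile
FROM sales_data;
```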
Scenario: You want to compare each row’s Sales to the state-wide average.
Explanation: Use a sub-query or common table expression (CTE) to compute the aggregated values, then join the result back to the main table. Each row in a (the main table's alias) now has a new column, State_Sales_Average, for contextual comparison.
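The CTE-and-join pattern could look like this. The table name `sales_data`, the `State` column, and the alias `b` are assumptions; `a` is the main table's alias, as in the explanation.

```sql
-- Hypothetical table `sales_data` with State and Sales columns.
WITH state_avg AS (
  -- One row per state with its average Sales.
  SELECT State, AVG(Sales) AS State_Sales_Average
  FROM sales_data
  GROUP BY State
)
SELECT
  a.*,
  b.State_Sales_Average
FROM sales_data a
-- LEFT JOIN keeps rows from `a` even when State is NULL and finds no match.
LEFT JOIN state_avg b
  ON a.State = b.State;
```

The same result can also be obtained with a window aggregate, `AVG(Sales) OVER (PARTITION BY State)`, which avoids the join.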
Scenario: Conditionally update column values based on custom logic.
Function: CASE WHEN ... THEN ... ELSE ... END
Explanation: If StateCode is 'AR', set it to 'FL'. If it is 'GE', set it to 'AL'. Otherwise, keep the existing value.
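The conditional logic above, sketched against a hypothetical `sales_data` table:

```sql
-- Hypothetical table `sales_data` with a StateCode column.
SELECT
  CASE
    WHEN StateCode = 'AR' THEN 'FL'   -- remap AR to FL
    WHEN StateCode = 'GE' THEN 'AL'   -- remap GE to AL
    ELSE StateCode                    -- keep every other value unchanged
  END AS StateCode
FROM sales_data;
```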
Function: COALESCE(column, default_value)
Explanation: Replaces NULL with a default (numeric or string) value.
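A short sketch with one numeric and one string default. The table name `sales_data` and the chosen defaults are illustrative assumptions.

```sql
-- Hypothetical table `sales_data`.
SELECT
  COALESCE(Sales, 0) AS Sales,                  -- numeric default for missing Sales
  COALESCE(StateCode, 'Unknown') AS StateCode   -- string default for missing StateCode
FROM sales_data;
```

The default should have a type compatible with the column it replaces.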
Function: date_trunc('unit', column)
Explanation: Rounds down the timestamp to a specified boundary (hour, day, week, month, quarter, etc.).
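For example, truncating to the month could look like the following. The timestamp column `order_ts` and the table `sales_data` are hypothetical names.

```sql
-- Hypothetical table `sales_data` with a timestamp column `order_ts`.
SELECT
  order_ts,
  -- Round down to the first instant of the month,
  -- e.g., 2024-03-15 10:30:00 becomes 2024-03-01 00:00:00.
  date_trunc('month', order_ts) AS order_month
FROM sales_data;
```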
Function: datediff(end, start)
Explanation: Returns the difference in days between two date/timestamp columns.
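A minimal sketch, assuming hypothetical `order_date` and `ship_date` columns in a hypothetical `sales_data` table:

```sql
-- Hypothetical table `sales_data` with order_date and ship_date columns.
SELECT
  order_date,
  ship_date,
  -- Number of days from order_date (start) to ship_date (end);
  -- the result is negative when the end date is earlier than the start date.
  datediff(ship_date, order_date) AS days_to_ship
FROM sales_data;
```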
These SQL snippets illustrate some of the transformation capabilities available in SparkSQL, which are particularly useful for preparing and enriching your data in Tellius before deeper analysis. By combining basic expressions and functions (e.g., CASE, COALESCE) with window functions (LAG, LEAD, NTILE) and their OVER clauses, you can build sophisticated data transformations directly in the Tellius SQL editor.