# SQL code snippets

When working with Tellius, you may need to perform data transformations beyond simple aggregations or filters. SparkSQL, the SQL engine used by Tellius, provides built-in functions and window operations for tasks such as converting time zones, comparing rows with lag/lead, and computing rolling averages. This page covers several commonly needed transformations, with a use case and a sample query for each.

You can adapt these snippets to your own datasets in the Tellius SQL editor. See the [SQL transform](https://help.tellius.com/tellius-5.6/data/preparing-your-datasets/list-of-icons-and-their-actions/sql-transform) page for details on how to access and apply these transformations in your workflow.

### 1. Converting a time column to UTC

**Scenario**: You have a timestamp column in a specific time zone and want to standardize it to UTC for consistent analysis.

```sql
SELECT 
    *,
    to_utc_timestamp(origin_timestamp, origin_timezone) AS time_utc
FROM table_name
```

* **Function**: `to_utc_timestamp()`
* **Explanation**: Converts `origin_timestamp`, which is in `origin_timezone`, to UTC.
* **Example**: If `origin_timezone` is `'America/Los_Angeles'`, then `time_utc` becomes the corresponding UTC timestamp.
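
As a quick sanity check, the conversion can be tried on a literal value before applying it to a column. The sketch below uses an illustrative timestamp (not taken from any real dataset) and also shows `from_utc_timestamp()`, the built-in SparkSQL function for the reverse direction.

```sql
-- Illustrative literal values; replace them with your own columns.
SELECT
    -- Interpret the value as Los Angeles local time and convert it to UTC
    to_utc_timestamp('2024-01-15 08:00:00', 'America/Los_Angeles') AS time_utc,
    -- Reverse direction: render a UTC value in Los Angeles local time
    from_utc_timestamp('2024-01-15 16:00:00', 'America/Los_Angeles') AS time_local
-- time_utc   is 2024-01-15 16:00:00 (Pacific Standard Time is UTC-8 in January)
-- time_local is 2024-01-15 08:00:00
```

Region-based IDs such as `'America/Los_Angeles'` account for daylight saving time, which fixed offsets do not.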

### 2. Calculating Lag/Lead

**Scenario**: Compare current row values to a previous (lag) or next (lead) row. Useful for identifying changes or trends (e.g., comparing week-over-week or month-over-month sales).

#### Using LAG

```sql
SELECT 
    *,
    LAG(Sales) OVER (PARTITION BY Order_Id ORDER BY Date) AS Prev_Sales
FROM table_name
```

* **Function**: `LAG(column) OVER(...)`
* **Explanation**: Retrieves the prior Sales value for the same `Order_Id`, ordered by `Date`.
* **Example**: Compare this week’s Sales to the previous week’s (`Prev_Sales`) to see the difference.
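
To turn the lagged value into an actual period-over-period change, subtract it from the current value. The sketch below is self-contained: `sample_sales` and its three rows are hypothetical stand-ins for `table_name`.

```sql
-- Hypothetical sample rows standing in for table_name
WITH sample_sales AS (
    SELECT * FROM VALUES
        (1, DATE '2024-01-01', 100),
        (1, DATE '2024-01-08', 120),
        (1, DATE '2024-01-15', 90)
    AS t(Order_Id, Date, Sales)
)
SELECT
    *,
    LAG(Sales) OVER (PARTITION BY Order_Id ORDER BY Date) AS Prev_Sales,
    -- NULL for the first row of each partition, because there is no prior row
    Sales - LAG(Sales) OVER (PARTITION BY Order_Id ORDER BY Date) AS Sales_Change
FROM sample_sales
-- Prev_Sales:   NULL, 100, 120
-- Sales_Change: NULL,  20, -30
```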

#### Using LEAD

```sql
SELECT 
    *,
    LEAD(Sales) OVER (PARTITION BY Order_Id ORDER BY Date) AS Next_Sales
FROM table_name
```

* **Function**: `LEAD(column) OVER(...)`
* **Explanation**: Retrieves the next Sales value for the same `Order_Id`, ordered by `Date`.
* **Example**: Compare each row’s Sales to the Sales of the following period (`Next_Sales`), for instance to see whether sales rose or fell afterward.
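
Both `LAG` and `LEAD` accept an optional offset and an optional default value. A minimal sketch, assuming you want to look two rows ahead and prefer `0` over `NULL` when no such row exists (the alias `Sales_Two_Ahead` is illustrative):

```sql
SELECT
    *,
    -- Offset 2: the value two rows later in the partition.
    -- Default 0: returned when fewer than two rows follow the current one.
    LEAD(Sales, 2, 0) OVER (PARTITION BY Order_Id ORDER BY Date) AS Sales_Two_Ahead
FROM table_name
```

The default should be type-compatible with the column being read, here a numeric `Sales`.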

### 3. Creating rolling averages

**Scenario**: Track moving averages (e.g., a 2-week rolling average of Sales), excluding the current row to gauge past performance.

```sql
SELECT 
    *,
    AVG(Sales) OVER (
        PARTITION BY Order_Id
        ORDER BY Date
        ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING
    ) AS Two_Weeks_Rolling_Average
FROM table_name
```

* **Function**: `AVG(column) OVER(...)` with a **window frame** definition.
* **Explanation**: Looks at the two preceding rows (excluding current row) to compute an average.
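* **Note**: `ROWS` frames count rows, not calendar periods, so this is a 2-week average only if the data has exactly one row per week in each partition. Adjust the frame (e.g., `ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING`) for a three-row window, which corresponds to 3 days, weeks, or months depending on the grain of your data. The first row of each partition has no preceding rows, so its average is `NULL`.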
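
If the current row should be part of the average, end the frame at `CURRENT ROW` instead. A minimal sketch using the same columns as above (the alias `Three_Row_Rolling_Average` is illustrative):

```sql
SELECT
    *,
    AVG(Sales) OVER (
        PARTITION BY Order_Id
        ORDER BY Date
        -- The current row plus the two rows before it
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS Three_Row_Rolling_Average
FROM table_name
```

Near the start of a partition the frame holds fewer than three rows, so the average is taken over whatever rows are available.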

### 4. Creating bins/buckets with NTILE

**Scenario**: Segment numeric values (e.g., `net_sales`) into a fixed number of buckets (like quartiles or deciles) for each product category.

```sql
SELECT 
    product_category_name,
    month,
    net_sales,
    NTILE(4) OVER (
        PARTITION BY product_category_name
        ORDER BY net_sales DESC
    ) AS net_sales_group
FROM table_name
```

* **Function**: `NTILE(n) OVER(...)`
* **Explanation**: Divides the ordered partition into `n` buckets. Here, 4 bins give you quartiles.
* **Example**: Within each product category, rows are assigned to groups 1 (highest `net_sales`) through 4 (lowest `net_sales`).
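
The numeric group can be mapped to a readable label by wrapping the query above in a sub-query. This is a sketch only: the label strings and the `ranked` alias are illustrative choices, not Tellius conventions.

```sql
SELECT
    product_category_name,
    month,
    net_sales,
    -- Translate the bucket number into a descriptive label
    CASE net_sales_group
        WHEN 1 THEN 'Top quartile'
        WHEN 2 THEN 'Upper-middle quartile'
        WHEN 3 THEN 'Lower-middle quartile'
        ELSE 'Bottom quartile'
    END AS net_sales_label
FROM (
    SELECT
        product_category_name,
        month,
        net_sales,
        NTILE(4) OVER (
            PARTITION BY product_category_name
            ORDER BY net_sales DESC
        ) AS net_sales_group
    FROM table_name
) ranked
```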

### 5. Creating an average in the same table

**Scenario**: You want to compare each row’s `Sales` to the **state-wide average**.

```sql
SELECT
    a.*,
    b.State_Sales_Average
FROM table_name a
LEFT JOIN (
    SELECT 
        State, 
        AVG(Sales) AS State_Sales_Average
    FROM table_name 
    GROUP BY State
) b
ON a.State = b.State
```

* **Explanation**: A sub-query computes the average `Sales` per `State`, and the result is joined back to the main table. (The same aggregate could also be written as a common table expression.) Each row in `a` now has a new column, `State_Sales_Average`, for contextual comparison.
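
As an alternative to the join, a window aggregate can attach the group average to every row in a single pass. This is a different technique from the sub-query shown above, offered as a sketch:

```sql
SELECT
    *,
    -- No ORDER BY or frame: the average covers every row that shares the State
    AVG(Sales) OVER (PARTITION BY State) AS State_Sales_Average
FROM table_name
```

One behavioral difference to keep in mind: rows with a `NULL` `State` get a `NULL` average in the join version (because `NULL = NULL` does not match), whereas the window version treats them as one partition and averages them together.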

### 6. Using CASE statements

**Scenario**: Conditionally update column values based on custom logic.

```sql
SELECT 
    Employee_Name,
    CASE 
        WHEN StateCode = 'AR' THEN 'FL'
        WHEN StateCode = 'GE' THEN 'AL'
        ELSE StateCode 
    END AS StateCode
FROM table_name
```

* **Function**: `CASE WHEN ... THEN ... ELSE ... END`
* **Explanation**: If `StateCode` is 'AR', set it to 'FL'. If 'GE', set it to 'AL'. Otherwise, keep existing value.
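
`CASE` can also test ranges rather than exact matches. The sketch below assumes the table has a numeric `Sales` column alongside `Employee_Name`; the thresholds and tier names are hypothetical.

```sql
SELECT
    Employee_Name,
    Sales,
    -- Branches are evaluated top to bottom; the first match wins
    CASE
        WHEN Sales IS NULL THEN 'Unknown'
        WHEN Sales >= 10000 THEN 'High'
        WHEN Sales >= 1000 THEN 'Medium'
        ELSE 'Low'
    END AS Sales_Tier
FROM table_name
```

Because the first matching branch wins, the conditions are ordered from most to least restrictive.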

### 7. Handling NULLs with COALESCE

```sql
SELECT 
    COALESCE(Sales, 0) AS Sales_No_Nulls,
    COALESCE(Comments, 'N/A') AS Comments_Filled
FROM table_name
```

* **Function**: `COALESCE(column, default_value)`
* **Explanation**: Returns the first non-`NULL` argument, so a `NULL` in the column is replaced with the default (numeric or string) value.
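
`COALESCE` accepts more than two arguments, which makes it useful for fallback chains. In this sketch, `Mobile_Phone` and `Home_Phone` are hypothetical columns used only for illustration:

```sql
SELECT
    -- Use the mobile number if present, otherwise the home number,
    -- otherwise the literal 'N/A'
    COALESCE(Mobile_Phone, Home_Phone, 'N/A') AS Contact_Phone
FROM table_name
```

All arguments should share a compatible data type, so pair numeric columns with numeric defaults and string columns with string defaults.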

### 8. Date/Time truncation

```sql
SELECT 
    date_trunc('month', timestamp_col) AS month_start,
    COUNT(*) AS num_events
FROM table_name
GROUP BY date_trunc('month', timestamp_col)
```

* **Function**: `date_trunc('unit', column)`
* **Explanation**: Rounds down the timestamp to a specified boundary (hour, day, week, month, quarter, etc.).
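
To see what each unit does, the function can be run against a single literal timestamp. The value below is illustrative; 17 March 2024 falls on a Sunday.

```sql
SELECT
    date_trunc('hour',    TIMESTAMP '2024-03-17 14:25:30') AS hour_start,    -- 2024-03-17 14:00:00
    date_trunc('day',     TIMESTAMP '2024-03-17 14:25:30') AS day_start,     -- 2024-03-17 00:00:00
    date_trunc('week',    TIMESTAMP '2024-03-17 14:25:30') AS week_start,    -- 2024-03-11 00:00:00 (weeks start on Monday)
    date_trunc('month',   TIMESTAMP '2024-03-17 14:25:30') AS month_start,   -- 2024-03-01 00:00:00
    date_trunc('quarter', TIMESTAMP '2024-03-17 14:25:30') AS quarter_start  -- 2024-01-01 00:00:00
```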

### 9. Calculating time differences

```sql
SELECT
    start_time,
    end_time,
    datediff(end_time, start_time) AS diff_in_days
FROM table_name
```

* **Function**: `datediff(end, start)`
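* **Explanation**: Returns the number of whole days from `start` to `end` for two date/timestamp columns. The time-of-day portion is ignored, and the result is negative when `end` is earlier than `start`.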
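
Since `datediff` works at day granularity, finer or coarser differences need other built-in functions. A sketch assuming `start_time` and `end_time` are timestamp columns (the aliases are illustrative):

```sql
SELECT
    start_time,
    end_time,
    -- unix_timestamp() returns seconds since the epoch; divide to get hours
    (unix_timestamp(end_time) - unix_timestamp(start_time)) / 3600 AS diff_in_hours,
    -- months_between() returns a fractional number of months
    months_between(end_time, start_time) AS diff_in_months
FROM table_name
```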

These SQL snippets illustrate some of the transformation capabilities available in SparkSQL, which are particularly useful for preparing and enriching your data in Tellius before deeper analysis. By combining basic functions (e.g., `CASE`, `COALESCE`) with window operations (`LAG`, `LEAD`, `NTILE`, and aggregates with `OVER` clauses), you can build sophisticated data transformations directly in the Tellius SQL editor.
