# Monitoring usage and system resources

This guide helps Tellius admins track system usage, monitor dataset growth, audit user activity, and optimize platform performance. By running the provided SQL queries via a secure Postgres data source, you can create reusable datasets and dashboards that offer visibility into how Tellius is being used across your organization.

{% hint style="info" %}
These queries require access to the system metadata database. Contact your Tellius admin or support team to set up a secure, validated Postgres connection.
{% endhint %}

## Connecting to the metadata database

To run the queries in this guide, you’ll need to create a Postgres data source that connects to your Tellius instance’s internal metadata store.

1. Under **Data → Connect**, choose **Postgres** as the connection type.
2. Provide the host, port, database name, username, and password (your Tellius admin or support team will supply these).
3. Save and test the connection.
4. Once connected, you can use the **Custom SQL** option in the dataset creation flow to execute the queries below.

## Metadata databases

System-level queries (usage tracking, dataset stats, user activity, and so on) run against Tellius' internal metadata databases.

**Databases you may connect to:**

* **`middleware_prod`** – Stores core platform metadata such as users, datasets, Business Views, and configurations.
* **`middleware_tracking_prod`** – (Optional) May contain event tracking or usage-related metadata depending on deployment.
* **`usage_tracker`** – Logs detailed activity, such as search queries, refresh events, Vizpad usage, and user logins.

## Metadata schema overview

Once connected, you can query internal tables that define user access, roles, and group mappings in your Tellius environment.

| Table          | Description                                                               |
| -------------- | ------------------------------------------------------------------------- |
| `users`        | Stores all registered users and their metadata (e.g., name, email, roles) |
| `groups`       | Represents logical user groups used for access control                    |
| `users_groups` | Maps which users belong to which groups                                   |

## Monitoring datasets

This query lists each dataset in the system with key metadata such as name, owner, number of rows and columns, and size (in kilobytes).

```sql
SELECT 
  dm."datasetName" AS DatasetName,
  us.username AS OwnerUsername,
  (dm.current->>'size')::bigint / 1024 AS SizeKB,
  jsonb_array_length(dm.current->'columns') AS NoOfColumns,
  (dm.current->>'numberOfRows')::bigint AS NoOfRows,
  dm.inserted_at AS Created,
  dm.updated_at AS Updated
FROM dataset_metadata dm
JOIN users us ON dm.owner_id = us.id;
```

Use this to identify:

* Large or old datasets that may be archived or refreshed
* Ownership and accountability of data assets
* Data growth trends over time

## Dataset size estimation in GB

This more advanced query estimates total dataset size by calculating per-row memory cost based on data types, then multiplying by row count.

```sql
-- Estimates dataset size in bytes and gigabytes.
-- Identifies if the dataset is live (in-database) or in-memory.

SELECT *, 
       (row_count * row_size) AS total_size_bytes,
       (row_count * row_size)/1024.0/1024.0/1024.0 AS total_size_gb
FROM (
  SELECT 
    dm."datasetId" AS id,
    dm."datasetName" AS name,
    us.username AS owner,
    SUM(CASE
      WHEN type = 'boolean' THEN 1
      WHEN type = 'short' THEN 1
      WHEN type = 'integer' THEN 2
      WHEN type = 'float' THEN 4
      WHEN type = 'long' THEN 8
      WHEN type = 'double' THEN 8
      WHEN type = 'date' THEN 12
      WHEN type = 'timestamp' THEN 15
      WHEN type = 'string' THEN 15
      ELSE 0
    END) AS row_size,
    ((dm.current->>'numberOfRows')::bigint) AS row_count,
    CASE WHEN "additionalParams"->>'indb' = 'true' THEN 'live' ELSE 'in-memory' END AS dataset_type
  FROM dataset_metadata dm
  JOIN users us ON dm.owner_id = us.id,
       LATERAL jsonb_array_elements_text(current->'columns') AS json_value,
       LATERAL (SELECT json_value::jsonb->'typeStats'->>'mainType' AS type) AS types
  GROUP BY dm."datasetId", dm."datasetName", us.username, dm.current, "additionalParams"
) a;
```

{% hint style="success" %}
You can build dashboards to monitor memory usage trends, identify heavy datasets, and improve performance by targeting optimizations.
{% endhint %}

## User activity monitoring

This query lists Tellius users along with their email addresses and names. Use it to track platform adoption, audit access, or manage licenses.

```sql
SELECT 
  username AS "UserName",
  email AS "Email",
  "firstName" AS "FirstName",
  "lastName" AS "LastName"
FROM users;
```

Monitor usage of natural language search by tracking what queries users submit and when. Combine with user logs to understand adoption of search and identify training opportunities.

```sql
SELECT 
  q.query,
  q.time,
  u.username
FROM query_history q
JOIN users u ON q.userid = u.id;
```

## Daily usage analytics

This query aggregates daily activity across various Tellius features such as Search, Insights, data operations, Vizpads, and user logins. Visualize this in a line chart to observe daily or weekly usage patterns.

```sql
SELECT 
  DATE_TRUNC('day', action_at) AS day,
  COUNT(CASE WHEN resource_type = 'insight' THEN 1 END) AS insight_operations,
  COUNT(CASE WHEN resource_type IN ('dataset', 'datasource', 'businessView') THEN 1 END) AS data_operations,
  COUNT(CASE WHEN resource_type = 'search_query' THEN 1 END) AS search_queries,
  COUNT(CASE WHEN resource_type = 'model' THEN 1 END) AS model_operations,
  COUNT(CASE WHEN resource_type = 'vizpad' AND operation_type = 'created' THEN 1 END) AS vizpad_created,
  COUNT(CASE WHEN resource_type = 'vizpad' AND operation_type = 'viewed' THEN 1 END) AS vizpad_viewed,
  COUNT(CASE WHEN resource_type = 'user' AND operation_type = 'login_success' THEN 1 END) AS user_logins,
  COUNT(DISTINCT user_id) AS unique_users
FROM usage_tracking
WHERE action_at >= '2023-01-01'
GROUP BY 1
ORDER BY 1;
```

## Kaiya conversational AI usage

Track how often users engage with Kaiya, including daily conversation counts and successful bot responses.

```sql
SELECT 
  DATE_TRUNC('day', inserted_at) AS day,
  COUNT(DISTINCT conversation_id) AS conversations,
  COUNT(CASE WHEN sender = 'bot' AND LOWER(status) != 'failure' THEN 1 END) AS kaiya_responses
FROM messages
WHERE inserted_at >= '2023-01-01'
GROUP BY 1
ORDER BY 1;
```

## Monitoring Business Views

Monitor how large each Business View is to optimize performance.

```sql
SELECT 
  id,
  name,
  metadata ->> 'length' AS bv_size,
  us.username AS owner_username
FROM business_views bvs
JOIN users us ON bvs.owner_id = us.id;
```

Understand which Business Views are shared with which groups. Use this to audit access control and sharing practices.

```sql
SELECT 
  business_view_id,
  "sharedWithKind",
  "objectKind",
  gps.name AS group_name
FROM sharings shs
JOIN groups gps ON shs.shared_with_group_id = gps.id
WHERE "objectKind" = 'businessView' AND "sharedWithKind" = 'group';
```

## **Business View ↔ Vizpad mapping**

This query maps each Vizpad to the Business Views it relies on and counts how many charts, KPIs, and other visualizations it contains. Use it to analyze impact before modifying or deleting a Business View, and for:

* Design optimization (e.g., identifying Vizpads using large or outdated views)
* Auditing dependencies for dashboard maintenance

```sql
SELECT 
  bv.name AS businessview_name,
  vp.title AS vizpad_title,
  COALESCE(vc.viz_count, 0) AS viz_count,
  bv.id AS business_view_id,
  vp.id AS vizpad_id
FROM vizpads vp
JOIN business_views bv 
  ON (bv.id = ANY(vp.business_view_ids) OR bv.id = vp.business_view_id)
LEFT JOIN (
  SELECT vt.vizpad_id, COUNT(*) AS viz_count
  FROM viz_tab vt 
  JOIN viz_item vi ON vt.id = vi.tab_id
  GROUP BY vt.vizpad_id
) vc ON vp.id = vc.vizpad_id;
```

{% hint style="info" %}
To merge Business View size and sharing details for offline audit or reporting, use the provided Python script to join exported `.csv` files and calculate sizes in GB. Useful for access control reviews or cleaning up unused assets.
{% endhint %}
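A minimal sketch of such a merge, using only the standard library. The column names (`business_view_id`, `bv_size` holding raw bytes) are assumptions about the exported files and may differ in your exports:

```python
import csv


def merge_bv_reports(sizes_csv, sharing_csv, out_csv):
    """Join Business View size and sharing CSV exports on business_view_id
    and add a size_gb column (raw bytes / 1024^3, rounded to 3 decimals)."""
    # Load sizes keyed by Business View id (column names are assumptions).
    sizes = {}
    with open(sizes_csv, newline="") as f:
        for row in csv.DictReader(f):
            sizes[row["business_view_id"]] = int(row["bv_size"])

    # Stream the sharing export, appending the computed size in GB.
    with open(sharing_csv, newline="") as f_in, \
         open(out_csv, "w", newline="") as f_out:
        reader = csv.DictReader(f_in)
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames + ["size_gb"])
        writer.writeheader()
        for row in reader:
            size_bytes = sizes.get(row["business_view_id"], 0)
            row["size_gb"] = round(size_bytes / 1024**3, 3)
            writer.writerow(row)
```

Business Views present in the sharing export but missing from the size export are written with a size of 0, so the audit output stays complete.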

## **Snowflake data source metadata**

Track how Snowflake is used in your environment, including:

* How many datasets are built on each connection
* Who owns the connections
* Which auth methods are in use

```sql
SELECT 
  ds."datasourceId" AS datasource_id,
  ds.name AS name,
  ds.inserted_at AS creation_time,
  MAX(us.email) AS owner_email,
  COUNT(*) AS no_of_datasets_loaded,
  options->>'sfURL' AS url,
  options->>'sfUser' AS user,
  options->>'sfSchema' AS schema,
  options->>'sfDatabase' AS database,
  options->>'sfRole' AS role,
  options->>'auth' AS auth_method
FROM datasources ds
LEFT JOIN dataset_metadata dm ON ds."datasourceId" = dm.datasource_id
JOIN users us ON ds.owner_id = us.id
WHERE "sourceType" = 'snowflake'
GROUP BY ds."datasourceId";
```

## Scheduled refresh monitoring

Monitor refresh jobs across datasets, models, Vizpads, and Insights.

```sql
SELECT 
  sc.id,
  sc.name,
  sc.frequency,
  CASE sc.frequencyunit
    WHEN 'm' THEN 'Minutes'
    WHEN 'h' THEN 'Hours'
    WHEN 'd' THEN 'Days'
    WHEN 'w' THEN 'Weeks'
    WHEN 'M' THEN 'Months'
    ELSE sc.frequencyunit
  END AS frequency_unit,
  sc.object->>'objectType' AS object_type,
  COALESCE(vp.title, dm."datasetName", mod."modelId", ds.name, ins.name) AS object_name,
  us.username AS owner_name
FROM schedules sc
JOIN users us ON sc.owner_id = us.id
LEFT JOIN vizpads vp ON vp.id = sc.vizpad_id
LEFT JOIN dataset_metadata dm ON dm."datasetId" = sc.dataset_id
LEFT JOIN models mod ON mod."modelId" = sc.model_id
LEFT JOIN datasources ds ON ds."datasourceId" = sc.datasource_id
LEFT JOIN insights ins ON ins."insightId" = sc.insight_id;
```

## Estimating cluster resource handling capacity

The amount of data a Tellius cluster can handle depends on two factors:

1. Whether cluster autoscaling is enabled or disabled
2. The number of allocated CPU cores in the cluster

The following formulas help estimate how much data your cluster can process and store efficiently.

* **numberOfCores**: Total number of CPU cores in your Tellius cluster.
* **3.125**: The baseline GB-per-core multiplier.

### 🚫 Cluster autoscaling disabled

When autoscaling is turned off, the system cannot automatically add or remove resources based on workload. You must size the cluster manually to meet expected demand.

#### Ideal data capacity:

```
Ideal Capacity (GB) = 3.125 × 1.20 × numberOfCores
```

The `1.20` factor accounts for additional memory overhead used by background operations and system processes.

#### For maximum supported data with degradation:

```
Max Capacity (GB) = 3.125 × 1.20 × 1.25 × numberOfCores
```

The `1.25` multiplier allows for up to 25% more data to be handled, though performance (e.g., response time or refresh speed) may slightly degrade.

For a cluster with 64 cores:

* **Ideal capacity** = 3.125 × 1.20 × 64 = 240 GB
* **Max capacity** (with some degradation) = 3.125 × 1.20 × 1.25 × 64 = 300 GB

### ✅ Cluster autoscaling enabled

When autoscaling is enabled, the cluster can dynamically adjust resources based on demand. In this setup, system overhead is already managed, so the formula is simpler.

#### Ideal data capacity:

```
Ideal Capacity (GB) = 3.125 × numberOfCores
```

#### For maximum supported data with degradation:

```
Max Capacity (GB) = 3.125 × 1.25 × numberOfCores
```
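The four formulas above can be wrapped in a small helper for quick what-if sizing. This is a sketch; the multipliers come directly from the formulas in this section:

```python
def cluster_capacity_gb(number_of_cores, autoscaling, degraded=False):
    """Estimate how much data (in GB) a Tellius cluster can handle.

    autoscaling: when True, overhead is managed automatically,
                 so the 1.20 factor is not applied.
    degraded:    when True, allow up to 25% more data at some
                 performance cost (the "max capacity" case).
    """
    capacity = 3.125 * number_of_cores  # baseline GB-per-core multiplier
    if not autoscaling:
        capacity *= 1.20                # overhead for manually sized clusters
    if degraded:
        capacity *= 1.25                # max capacity with some degradation
    return capacity
```

For the 64-core example above, this returns 240 GB (ideal, autoscaling disabled) and 300 GB (max capacity with degradation).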

{% hint style="info" %}
These values are **estimates**, not strict limits. Actual performance may vary based on:

* Data format (CSV vs. Parquet)
* Number of columns
* Query complexity
* Number of concurrent users
  {% endhint %}

{% hint style="warning" %}
Always monitor performance during peak usage and benchmark with real workloads before finalizing cluster sizing. For production workloads with heavy ETL, AutoML, or frequent refreshes, consider staying well below max capacity to maintain responsiveness.
{% endhint %}

## **Cluster sizing reference table**

Here’s a simple comparison to help you estimate how much data your cluster can handle, depending on whether autoscaling is enabled or not:

<table><thead><tr><th># Cores</th><th>Autoscaling</th><th>Ideal capacity </th><th width="260.3822021484375">Max capacity (with 25% buffer)</th></tr></thead><tbody><tr><td>32</td><td>Enabled</td><td>100 GB</td><td>125 GB</td></tr><tr><td>32</td><td>Disabled</td><td>120 GB</td><td>150 GB</td></tr><tr><td>64</td><td>Enabled</td><td>200 GB</td><td>250 GB</td></tr><tr><td>64</td><td>Disabled</td><td>240 GB</td><td>300 GB</td></tr><tr><td>128</td><td>Enabled</td><td>400 GB</td><td>500 GB</td></tr><tr><td>128</td><td>Disabled</td><td>480 GB</td><td>600 GB</td></tr></tbody></table>

{% hint style="success" %}

* These queries are designed for read-only analysis and should not modify any system data.
* The metadata databases are accessible only to users with proper admin privileges.
* For additional visibility, create datasets and Business Views from these queries to power monitoring dashboards in Tellius.
  {% endhint %}

## **Calculated columns**

This query shows all **custom formulas** used in Business Views.

```sql
SELECT DISTINCT "formulaType", formula 
FROM business_views_columns 
WHERE formula IS NOT NULL;
```
