Connecting to Spark SQL
Steps to successfully connect and start browsing your Spark data in Tellius
Spark SQL allows structured queries against data in an Apache Spark environment. By providing the proper host, port, and credentials, Tellius can communicate with your Spark cluster, retrieve schemas, and load data for downstream analytics.
Once you select Spark SQL, you are presented with a form to specify your connection parameters.
Hostname: This is the network location of your Spark SQL system — the hostname or IP address of the Spark SQL server (e.g., 52.71.252.45). Without a correct hostname, Tellius cannot establish a connection.
Port: Databases listen on specific ports. Specify the port number on which your Spark Thrift server listens. Providing the correct port ensures your requests reach the right service.
Database Name: The schema or database (logical grouping of tables) you want Tellius to explore.
User: Provide the username with appropriate permissions (at least read access) to read data.
Password: Provide the corresponding password for the User provided.
Datasource Name: A user-friendly name for this connection in Tellius.
Save and Browse Host: Saves the connection details and attempts to browse the host for available schemas and tables. This initiates the handshake with the Spark server.
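Before clicking Save and Browse Host, it can help to see how these form fields combine into the underlying Thrift/JDBC endpoint that Tellius will contact. A minimal Python sketch (field names and validation rules are illustrative, not Tellius internals):

```python
def build_jdbc_url(hostname: str, port: int, database: str) -> str:
    """Assemble a Spark Thrift (hive2) JDBC URL from the connection form fields."""
    if not hostname:
        raise ValueError("Hostname is required")
    if not (1 <= port <= 65535):
        raise ValueError("Port must be between 1 and 65535")
    # Fall back to the default schema when no database name is given.
    return f"jdbc:hive2://{hostname}:{port}/{database or 'default'}"

print(build_jdbc_url("52.71.252.45", 10000, "default"))
# -> jdbc:hive2://52.71.252.45:10000/default
```

The username and password are passed separately during the handshake rather than embedded in the URL.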
If you've previously validated and saved a Spark SQL connection, you can reuse its details:
Use validated datasource connection details: When enabled, it reveals a dropdown to choose from existing, previously configured Spark connections.
Select datasource: Lists all pre-validated Spark connections. Select the one you want to reuse and all the fields will be filled automatically as configured.
Browse Host: Similar to "Save and Browse Host", but it navigates forward using the chosen existing connection's parameters.
In the Advanced option view, you can override the default host/port fields by directly specifying a JDBC URL. This is especially useful if your Spark setup requires custom parameters or if your environment differs from typical host/port authentication.
JDBC URL: A full JDBC connection string, such as the Spark SQL JDBC URL jdbc:hive2://host:10000/default;transportMode=http;httpPath=cliservice for Spark Thrift.
User: The same username as in the main form, but here it pairs with your custom JDBC string if you're not relying on the standard host/port fields.
Password: Provide the relevant password for the specified user in the advanced JDBC context.
Datasource Name: A user-friendly name for this connection in Tellius.
Save and Browse Host: Saves the connection details and attempts to browse the host for available schemas and tables. This initiates the handshake with the Spark server.
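The advanced JDBC URL packs host, port, database, and session parameters into a single string. A hedged sketch of pulling those pieces back apart from a hive2-style URL (the regex and field names are assumptions for illustration, not a Tellius API):

```python
import re

def parse_hive2_url(url: str) -> dict:
    """Split a hive2 JDBC URL into host, port, database, and session parameters."""
    m = re.match(r"jdbc:hive2://([^:/;]+):(\d+)/([^;]*)(?:;(.*))?$", url)
    if not m:
        raise ValueError(f"Not a hive2 JDBC URL: {url}")
    host, port, db, extras = m.groups()
    # Trailing ';key=value' pairs become session parameters.
    params = dict(kv.split("=", 1) for kv in extras.split(";")) if extras else {}
    return {"host": host, "port": int(port), "database": db or "default", **params}

url = "jdbc:hive2://host:10000/default;transportMode=http;httpPath=cliservice"
print(parse_hive2_url(url))
# -> {'host': 'host', 'port': 10000, 'database': 'default',
#     'transportMode': 'http', 'httpPath': 'cliservice'}
```

Parameters such as transportMode and httpPath are what make the advanced form useful when your Thrift server is not reachable via plain host/port settings.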
After establishing a connection, you will see options to load data from Spark SQL tables.
Select a table: Displays all available tables under the chosen Spark SQL schema. Pick the tables you need for analysis. If there are many tables, you can narrow down your selection.
Search for table: Filters the displayed tables based on your search term.
Import: Imports the selected table(s) into Tellius.
If you prefer more granular control or want to write your own SQL queries to load precisely the data you need, switch to the "Custom SQL" tab.
Table filter: Helps locate a particular table by name before writing your SQL.
Select a table: Choose a table name to use in your custom query.
Query: A field for your custom SQL statement (e.g., SELECT * FROM SYS.WRI$_DBU_CPU_USAGE).
Preview: Executes the SQL query and displays a few sample rows of the data you're about to import in the "Dataset Preview" area. This lets you validate that the query returns the correct data before fully importing it, catching syntax errors or incorrect filters early.
Import: Once satisfied with the preview, click Import to load the data returned by the SQL query into Tellius.
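Tellius's preview internals are not documented here, but a common way to implement a safe preview is to wrap the user's SQL in a subquery with a row cap, so only a sample is fetched. A sketch of that idea (function and alias names are hypothetical):

```python
def preview_query(query: str, max_rows: int = 100) -> str:
    """Wrap a user query as a subquery capped at max_rows, preview-style."""
    q = query.strip().rstrip(";")  # a trailing semicolon would break the subquery
    return f"SELECT * FROM ({q}) preview_sub LIMIT {max_rows}"

print(preview_query("SELECT * FROM SYS.WRI$_DBU_CPU_USAGE;"))
# -> SELECT * FROM (SELECT * FROM SYS.WRI$_DBU_CPU_USAGE) preview_sub LIMIT 100
```

Capping the preview keeps validation fast even when the full query would return millions of rows.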
Once you import, you'll have the option to refine how the dataset is handled:
Dataset name: Assign a valid name to your new dataset (e.g., XYZ_THRESHOLD). Names should follow the allowed naming conventions: letters, numbers, and underscores only; no leading underscores or numbers; no special characters or spaces.
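The stated naming rules translate into a simple pattern check. A sketch of a validator (the exact rules Tellius enforces may differ slightly):

```python
import re

# Letters, digits, and underscores only, and the first character must be a
# letter (ruling out leading underscores and digits), per the stated rules.
_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")

def is_valid_dataset_name(name: str) -> bool:
    """Check a dataset name against the documented naming conventions."""
    return bool(_NAME_RE.match(name))

print(is_valid_dataset_name("XYZ_THRESHOLD"))  # True
print(is_valid_dataset_name("_hidden"))        # False: leading underscore
print(is_valid_dataset_name("2024_sales"))     # False: leading number
print(is_valid_dataset_name("my dataset"))     # False: contains a space
```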
Connection Mode: When the Live checkbox is selected, queries are fetched from the database each time and the data is not copied to Tellius. Live mode ensures the most up-to-date data at the cost of potential query latency.
When Live mode is enabled, only the Create Business View option is displayed.
Copy to system: If enabled, copies the imported data onto Tellius's internal storage for faster performance. Reduces dependency on the source database's speed and network latency. Good for frequently queried datasets.
Cache dataset in memory: If enabled, keeps a cached copy of the dataset in memory (RAM) for even faster query responses. Memory caching dramatically reduces query time, beneficial for dashboards and frequently accessed data.
When only one table is imported, the following options will also be displayed:
Partition column: The column used as a basis for partitioning.
Number of partitions: How many segments to break the data into. (e.g., 5 partitions)
Lower bound/Upper bound: Approximate value range in the partition column to evenly distribute data.
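The bounds and partition count work in the same spirit as Spark's partitioned JDBC reads: the value range of the partition column is split into roughly equal strides, one per partition, so the segments can be loaded in parallel. A simplified sketch of how such ranges could be computed (not Tellius's exact algorithm):

```python
def partition_bounds(lower: int, upper: int, num_partitions: int) -> list:
    """Split [lower, upper) into contiguous ranges, one per partition."""
    stride = (upper - lower) / num_partitions
    bounds = []
    for i in range(num_partitions):
        lo = lower + round(i * stride)
        # The last partition absorbs any rounding remainder up to the upper bound.
        hi = upper if i == num_partitions - 1 else lower + round((i + 1) * stride)
        bounds.append((lo, hi))
    return bounds

print(partition_bounds(0, 1000, 5))
# -> [(0, 200), (200, 400), (400, 600), (600, 800), (800, 1000)]
```

Choosing bounds close to the column's real minimum and maximum keeps the partitions evenly sized; skewed bounds leave some partitions nearly empty while others carry most of the data.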
Create Business View: If enabled, after loading data, you will be guided into the Business View creation stage.
Click on Load to finalize the process. After clicking Load, your dataset appears under Data → Dataset, ready for exploration, preparation, or business view configuration. Otherwise, click Cancel to discard the current import without creating the dataset.
After the dataset is created, you can navigate to "Dataset", where you can review and further refine your newly created dataset. Apply transformations, joins, or filters in the Prepare module.
Partitioning: If enabled, it splits a large dataset into smaller logical chunks (partitions). Improves performance on large tables, enabling parallel processing and faster load times. For more details, check out the dedicated page on Partitioning.