Tellius
Tellius 5.5
© 2025 Tellius

Connecting to HDFS

Step-by-step guide to configure Hadoop data source in Tellius


Last updated 5 months ago


  1. Under Data → Connect → Create New, select Hadoop from the available connectors. The Hadoop connection page opens with the following fields.

  1. Path: Enter the full HDFS (Hadoop Distributed File System) path of the directory or file you want to access. This is the location within your Hadoop environment that Tellius reads from. It can point to a folder, a sub-directory, or a single data file.

Consult your Hadoop administrator or DevOps engineer for the correct HDFS path structure.

Common patterns might look like hdfs://namenode:8020/user/yourusername/dataset/ or simply /user/yourusername/dataset/ if the default HDFS configuration is used.

Ensure you have the correct permissions to read the files at the specified path. If you’re not sure, test the path using an HDFS command line tool (e.g., hdfs dfs -ls /user/yourusername/dataset/) or consult your Hadoop administrator.
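The two path styles above (a fully qualified `hdfs://` URI and a bare absolute path) can be told apart before you paste a value into the Path field. The following is a minimal sketch using Python's standard `urllib.parse`. The function name `split_hdfs_path` is hypothetical, and this is not how Tellius itself validates the field.

```python
from urllib.parse import urlparse


def split_hdfs_path(path):
    """Split an HDFS location into (host, port, directory).

    host and port are None when the path relies on the cluster's
    default file system, e.g. /user/yourusername/dataset/.
    """
    parsed = urlparse(path)
    if parsed.scheme == "hdfs":
        return parsed.hostname, parsed.port, parsed.path
    if parsed.scheme == "" and path.startswith("/"):
        return None, None, path
    # Relative paths and other schemes are rejected in this sketch.
    raise ValueError(f"not an absolute HDFS path: {path!r}")


print(split_hdfs_path("hdfs://namenode:8020/user/yourusername/dataset/"))
print(split_hdfs_path("/user/yourusername/dataset/"))
```

A check like this only confirms the shape of the path. Whether the path exists and is readable still has to be confirmed against the cluster, for example with `hdfs dfs -ls`.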

  1. Type: Select the format of the data stored at the specified HDFS path, for example CSV, Parquet, ORC, JSON, or another supported file type.

Examine the files in your Hadoop directory or speak with the data engineering team to confirm the file type.

If you have direct access, you might use hdfs dfs -ls or hdfs dfs -cat commands to inspect file extensions or headers. For Parquet, no direct text inspection is possible, but the file extension typically indicates the format (e.g., .parquet for Parquet files).
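When the file extension is missing or unreliable, the first few bytes of a file often reveal the format: Parquet files begin with the magic bytes `PAR1` and ORC files begin with `ORC`. The sketch below is a heuristic only. It assumes you have already copied the first bytes of the file into a Python `bytes` value (for example from the output of `hdfs dfs -cat`), and the function name `sniff_format` is hypothetical.

```python
def sniff_format(head):
    """Guess a file format from its first bytes (a heuristic, not a parser)."""
    if head.startswith(b"PAR1"):
        return "parquet"
    if head.startswith(b"ORC"):
        return "orc"
    text = head.lstrip()
    if text.startswith((b"{", b"[")):
        return "json"
    if text.startswith(b"<"):
        return "xml"
    # CSV and TXT cannot be told apart from content alone.
    return "text"


print(sniff_format(b"PAR1\x15\x04"))
print(sniff_format(b'{"id": 1}'))
print(sniff_format(b"id,name\n1,Ann\n"))
```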

  1. Click the Next button to move to the next step in the data configuration process.

  2. If you select the CSV, TXT, or XML format, you’ll see format-specific parsing options that control delimiters, headers, stop words, case normalization, row tags, and flattening. These options ensure that Tellius interprets and structures the data correctly.

Column delimiter: The character that separates individual columns in your CSV file (commonly a comma, but it can also be a tab, a semicolon, or another character). Setting it correctly lets Tellius identify the columns and parse the file structure accurately.

First row is header? Indicates if the first line in the CSV file contains column names rather than data. If enabled, Tellius uses the first row as column headers. If disabled, column names may be generated generically (like col1, col2, etc.).

Quote: The character used to enclose textual fields, often ". Specifying the quote character helps Tellius correctly parse fields that contain delimiter characters inside them, such as commas in a quoted string.

Specify the text to be treated as NULL: A text field where you can define certain strings (e.g., N/A, NULL, NaN) that should be interpreted as missing values. Multiple values can be separated by commas. These will be treated as NULL during import.

Click on Next to proceed further.
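To see how the CSV options above interact, the following sketch reproduces them with Python's standard `csv` module. It illustrates the concepts only and is not Tellius's parser. The function `parse_csv`, its parameter names, and the sample data are hypothetical.

```python
import csv
import io


def parse_csv(text, delimiter=",", quote='"', header=True,
              null_values=("N/A", "NULL", "NaN")):
    """Parse CSV text and return (column_names, rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quote))
    if not rows:
        return [], []
    if header:
        # First row supplies the column names.
        columns, data = rows[0], rows[1:]
    else:
        # No header row: generate generic names col1, col2, ...
        columns, data = [f"col{i + 1}" for i in range(len(rows[0]))], rows
    # Replace the configured strings with None (NULL).
    cleaned = [[None if cell in null_values else cell for cell in row] for row in data]
    return columns, cleaned


sample = 'name;city;score\n"Doe; Jane";Austin;N/A\nLee;"Boston";42\n'
columns, rows = parse_csv(sample, delimiter=";")
print(columns)
print(rows)
```

In the sample, the quote character keeps `Doe; Jane` together as one field even though it contains the delimiter, and `N/A` is read as a missing value.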

Do you want to remove stop words? A toggle to remove common, low-value words (like “the,” “is,” “and”) from the text. Removing stop words can improve text analysis performance and clarity.

Do you want to convert to lowercase? If enabled, this converts all text to lowercase. Normalizing text to lowercase makes searching, filtering, and analyzing text more consistent, ensuring that “Apple” and “apple” are treated the same.

Click on Next to proceed further.
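The effect of the two TXT toggles above can be sketched in a few lines of Python. The stop-word set here is a small illustrative subset, since the actual list Tellius applies is not given on this page, and the function name `normalize` is hypothetical.

```python
# Illustrative subset only; the list Tellius uses is not documented here.
STOP_WORDS = {"the", "is", "and"}


def normalize(text, remove_stop_words=True, lowercase=True):
    """Apply optional lowercasing and stop-word removal to a line of text."""
    words = text.split()
    if lowercase:
        words = [w.lower() for w in words]
    if remove_stop_words:
        # Compare case-insensitively so "The" is removed even without lowercasing.
        words = [w for w in words if w.lower() not in STOP_WORDS]
    return " ".join(words)


print(normalize("The Apple is red and ripe"))
print(normalize("The Apple is red and ripe", lowercase=False))
```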

Row Tag: The name of the XML element that represents a single record or row in your XML file (commonly <ROW> or another tag). This tells Tellius how to identify each data record.

Flatten? If enabled, attempts to convert nested XML structures into a flat, table-like format. Flattening simplifies complex, hierarchical XML data. If disabled, you may need to handle nested elements manually later.

Click on Next to proceed further.
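The Row Tag and Flatten options above can be illustrated with Python's standard `xml.etree.ElementTree`. This is a sketch of the general idea, not Tellius's implementation. The dotted key naming (`address.city`), the helper names, and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def flatten(element, prefix=""):
    """Turn nested child elements into a flat dict with dotted keys."""
    row = {}
    for child in element:
        key = f"{prefix}{child.tag}"
        if len(child):  # child has nested elements of its own
            row.update(flatten(child, prefix=f"{key}."))
        else:
            row[key] = (child.text or "").strip()
    return row


def read_rows(xml_text, row_tag):
    """Return one flat dict per element whose tag equals row_tag."""
    root = ET.fromstring(xml_text)
    return [flatten(record) for record in root.iter(row_tag)]


doc = """<DATA>
  <ROW><id>1</id><address><city>Austin</city><zip>73301</zip></address></ROW>
  <ROW><id>2</id><address><city>Boston</city><zip>02108</zip></address></ROW>
</DATA>"""

for row in read_rows(doc, "ROW"):
    print(row)
```

Each `<ROW>` element becomes one record, and the nested `<address>` element is spread across the columns `address.city` and `address.zip`.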

  1. If you select any other format, such as JSON or Parquet, you’ll skip these format-specific steps and go straight to naming your dataset and selecting advanced loading and caching options.

  • Dataset name: Assign a valid name to your new dataset (e.g., XYZ_THRESHOLD). Names should follow the allowed naming conventions (letters, numbers, underscores, no leading underscores/numbers, no special chars/spaces).

  • Sheet name (only for XLSX and XLS data types): If your Excel workbook contains multiple sheets, specify the sheet you want to import. If left blank, Tellius imports the first sheet by default.

  • First row as columns (only for XLSX and XLS data types): If checked, Tellius interprets the first row of the Excel sheet as column headers.

  • Advanced Settings

    • Cache dataset in memory: If enabled, keeps a cached copy of the dataset in memory (RAM) for even faster query responses. Memory caching dramatically reduces query time, beneficial for dashboards and frequently accessed data.

    • Load sample data for faster transformations: If enabled, only a sample of the data is initially loaded, which speeds up transformations and previews. Provide the number of rows to be used as sample data. You can later decide to use the full dataset for final analysis.

    • Create Business View: Enables you to directly create a Business View after loading the dataset. A Business View provides a semantic layer making the data more accessible and understandable for business users. Streamlines your workflow by letting you move directly from data ingestion to semantic modeling.

    • Switch to Multi-line pipeline? (only for JSON data type): If enabled, Tellius reads a JSON object that spans multiple lines as a single record instead of treating each line as a separate record. Enable this for JSON files that are not strictly one record per line, such as pretty-printed or deeply nested JSON.

  • Click Load to finalize the process. Your dataset then appears under Data → Dataset, ready for exploration, preparation, or Business View configuration. To discard the import without creating the dataset, click Cancel instead.
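The difference the multi-line option for JSON makes can be seen in a small sketch. It contrasts a reader that expects one JSON record per line with one that accepts records spanning several lines, using only Python's standard `json` module. The function names and sample data are hypothetical, and this does not describe how Tellius parses JSON internally.

```python
import json


def read_json_lines(text):
    """Expect exactly one JSON record per line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def read_multiline_json(text):
    """Accept records that span several lines by decoding one value at a time."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        records.append(obj)
    return records


pretty = '{\n  "id": 1,\n  "tags": ["a", "b"]\n}\n{\n  "id": 2,\n  "tags": []\n}\n'
print(read_multiline_json(pretty))

try:
    read_json_lines(pretty)
except json.JSONDecodeError as err:
    print("line-by-line parsing failed:", err.msg)
```

The pretty-printed sample parses cleanly with the multi-line reader, while the line-by-line reader fails on the first line because `{` alone is not a complete record.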

After the dataset is created, navigate to Data → Dataset to review and further refine it. You can apply transformations, joins, or filters in the Prepare module.

Best Practices

  • Ensure that the Tellius environment (i.e., the user or service account used) has the correct read permissions on the specified HDFS path. In a secure Hadoop cluster, you may need Kerberos authentication or other access configurations.

  • Before entering the path into the UI, use Hadoop CLI commands or a Hadoop file browser tool to confirm that the path and files exist.

  • Confirm that all files within a specified directory are of the same format (e.g., all CSVs or all Parquet files) to avoid parsing issues.
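The last check above can be scripted once you have a list of file paths from the directory (for example, collected from a Hadoop file browser or CLI listing). The sketch below counts file extensions and skips marker files such as `_SUCCESS` that Hadoop jobs commonly write. The function name `extension_counts` and the sample paths are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def extension_counts(paths):
    """Count file extensions, ignoring marker and hidden files."""
    counts = Counter()
    for p in paths:
        name = PurePosixPath(p).name
        if name.startswith(("_", ".")):
            continue  # e.g. _SUCCESS or hidden checksum files
        counts[PurePosixPath(name).suffix.lower() or "(none)"] += 1
    return counts


listing = [
    "/user/yourusername/dataset/part-0000.csv",
    "/user/yourusername/dataset/part-0001.CSV",
    "/user/yourusername/dataset/_SUCCESS",
    "/user/yourusername/dataset/old.parquet",
]
counts = extension_counts(listing)
print(counts)
if len(counts) > 1:
    print("Mixed formats found; split the directory before loading.")
```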
