Data - Best Practices

Updated by Ramya Priya

  1. Start with end requirements: Work backward from how the data will ultimately be used. Use cases that require search may call for structuring the data differently than those that involve training machine learning models. Define the scope of how the data will be used in advance.
  2. Answer key questions: At a minimum, answer these questions about the data:
  • Where are the data sources needed for this use case? (e.g., Snowflake, Redshift, S3, etc.)
  • Should data be loaded into Tellius in-memory or via push-down queries? (see list of data sources that have push-down queries enabled here - <insert link>)
  • How often is the data refreshed in the source system?
  • How often should it be refreshed in Tellius?
  • What makes every record unique in each dataset?
  • What are the appropriate data types for each field?
  • How do datasets relate to one another?
  • What is the primary date field? Identify this appropriately at the Business View level.
  • Are there any geographic attributes (longitude / latitude, state, etc.)? Identify these appropriately at the Business View level.
  • Which common synonyms or aliases should be defined in the data preparation layer (if known in advance)?
  • Are there any custom fields that need to be specified?
  • Are there any custom calculations that need to be implemented?
  3. Understand the granularity of each dataset: Know what a single row represents in each dataset so that joins across multiple tables do not duplicate data. Joins that duplicate data can consume data volume capacity, decrease performance, and increase the bandwidth required to manage the system effectively.
  4. Create a column with the value of 1 for every row: Name the column something that represents the data. For example, for a dataset containing orders placed on a website, create a column called Orders so that a user can reference that field naturally in search, such as "show me orders by day," and the count will be pulled from that column.
  5. Governance: Establish data connections with a dedicated set of admin or power users where possible. Minimize the number of unique data sources and datasets. Review usage by dataset and/or Business View and remove unused entities every few months (at least every six months is recommended). Admins should review system capacity in the Usage data with their Customer Support Manager (at least monthly is recommended).
  6. Understand your data sources and their relationships to build an accurate Business View.
  7. Collaborate with technical resources to ensure the Business View is set up to support the types of questions and phrases that end users are likely to ask.
  8. Regularly review and update the Business View as new data sources are added or changes are made to existing ones.
  9. Use clear and consistent naming conventions for datasets and fields to avoid confusion and improve usability.
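
Three of the practices above (knowing what makes each record unique, understanding granularity before joining, and adding a column of 1s) can be checked against the source data before anything is loaded. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real source such as Snowflake or Redshift. The tables (orders, order_items), their columns, the sample rows, and the helper names duplicate_keys and row_count are all hypothetical and exist only for illustration; nothing here is a Tellius API.

```python
import sqlite3


def duplicate_keys(conn, table, key_columns):
    """Return (key..., row_count) for every key that appears more than once.

    Table and column names are interpolated directly, so this sketch is only
    suitable for trusted, hard-coded identifiers.
    """
    cols = ", ".join(key_columns)
    sql = (
        f"SELECT {cols}, COUNT(*) FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    )
    return conn.execute(sql).fetchall()


def row_count(conn, sql):
    """Count the rows produced by an arbitrary SELECT statement."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]


# Hypothetical sample data: one row per order, and one row per item in an order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount REAL);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES
        (1, '2024-01-01', 50.0), (2, '2024-01-01', 20.0), (3, '2024-01-02', 30.0);
    INSERT INTO order_items VALUES (1, 'A'), (1, 'B'), (2, 'C'), (3, 'D');
""")

# What makes every record unique? order_id is unique in orders, not in order_items.
orders_dupes = duplicate_keys(conn, "orders", ["order_id"])      # []
items_dupes = duplicate_keys(conn, "order_items", ["order_id"])  # [(1, 2)]

# Granularity: joining order-level rows to item-level rows repeats order 1,
# so any order-level measure summed after the join is inflated.
joined_sql = (
    "SELECT o.order_id, o.amount FROM orders o "
    "JOIN order_items i ON o.order_id = i.order_id"
)
orders_rows = row_count(conn, "SELECT * FROM orders")  # 3
joined_rows = row_count(conn, joined_sql)              # 4
true_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
joined_total = conn.execute(f"SELECT SUM(amount) FROM ({joined_sql})").fetchone()[0]

# Column of 1s: a field named Orders that sums to the number of orders.
conn.execute("CREATE VIEW orders_with_count AS SELECT *, 1 AS Orders FROM orders")
orders_by_day = conn.execute(
    "SELECT order_date, SUM(Orders) FROM orders_with_count "
    "GROUP BY order_date ORDER BY order_date"
).fetchall()

print(items_dupes, (orders_rows, joined_rows), (true_total, joined_total), orders_by_day)
```

In this sample, the join returns 4 rows for 3 orders and the summed amount rises from 100.0 to 150.0, which is the kind of duplication the granularity guidance warns about. Comparing row counts before and after a join is a cheap way to catch it early.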
