Data
Guidelines to effectively structure and manage data in Tellius
Start with the end requirements: Work backward from how the data is intended to be used. Use cases that require search may call for a different data structure than those that require training machine learning models, so it is essential to scope how the data will be used in advance.
Answer key questions: Answer, at a minimum, these questions about the data:
Where are the data sources needed for this use case? (e.g., Snowflake, Redshift, S3, etc.)
Should data be loaded into Tellius in-memory or via push-down queries?
How often is the data refreshed in the source system?
How often should it be refreshed in Tellius?
What makes every record unique in each dataset?
What are the appropriate data types for each field?
How do datasets relate to one another?
What is the primary date field? Identify this appropriately at the Business View level.
Are there any geographic attributes (longitude, latitude, state, etc.)? Identify these appropriately at the Business View level.
Are there common synonyms or aliases that should be defined in the data preparation layer (if known in advance)?
Are there any custom fields that need to be specified?
Are there any custom calculations that need to be implemented?
Understand the granularity of each dataset: Know what a single row represents in each dataset before joining multiple tables. Avoid joins that duplicate data, which can impact data volume capacity, decrease performance, and increase the bandwidth required to manage the system effectively.
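One way to check granularity before joining is to confirm that the join key is unique on the side expected to hold one row per key. The following is a minimal sketch in plain Python, not a Tellius API; the tables (`orders`, `customers`), the `customer_id` key, and both helper functions are hypothetical names chosen for illustration.

```python
from collections import Counter

def duplicate_keys(rows, key):
    """Return the key values that appear more than once in rows."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

def joined_row_count(left, right, key):
    """Return the number of rows an inner join on key would produce."""
    right_counts = Counter(row[key] for row in right)
    # Each left row is repeated once per matching right row.
    return sum(right_counts[row[key]] for row in left)

# Hypothetical sample data: customer 2 appears twice in customers.
orders = [
    {"order_id": 101, "customer_id": 1},
    {"order_id": 102, "customer_id": 2},
    {"order_id": 103, "customer_id": 2},
]
customers = [
    {"customer_id": 1, "region": "East"},
    {"customer_id": 2, "region": "West"},
    {"customer_id": 2, "region": "West"},  # breaks one-row-per-customer
]

dupes = duplicate_keys(customers, "customer_id")
fanout = joined_row_count(orders, customers, "customer_id")
print(f"duplicate keys: {dupes}, orders: {len(orders)}, joined rows: {fanout}")
```

If the joined row count is larger than the row count of the more granular table, the join is duplicating data and the duplicate keys should be resolved first.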
Create a column with the value of 1 for every row: Name the column after what each row represents. For example, for a dataset containing orders placed on a website, create a column called Orders so that a user can reference the field naturally in search, such as "show me orders by day," and the count will be computed from that column.
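The effect of the constant column can be sketched outside Tellius. The example below is an illustration in plain Python under assumed names (`raw_orders`, `order_date`, and the two helpers are hypothetical); it shows why summing a column of 1s yields a row count per group.

```python
from collections import defaultdict

def add_count_column(rows, name):
    """Return copies of rows with a constant column `name` set to 1."""
    return [{**row, name: 1} for row in rows]

def sum_by(rows, group_field, measure):
    """Sum `measure` for each distinct value of `group_field`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_field]] += row[measure]
    return dict(totals)

# Hypothetical sample data: one row per order.
raw_orders = [
    {"order_id": 101, "order_date": "2024-01-01"},
    {"order_id": 102, "order_date": "2024-01-01"},
    {"order_id": 103, "order_date": "2024-01-02"},
]

orders = add_count_column(raw_orders, "Orders")
orders_by_day = sum_by(orders, "order_date", "Orders")
print(orders_by_day)
```

The sum equals the number of orders only when the dataset has exactly one row per order, so this technique depends on knowing the dataset's granularity.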
Governance: Where possible, have a dedicated set of admin or power users establish data connections. Minimize the number of unique data sources and datasets. Review usage by dataset and/or Business View and remove unused entities every few months (at a minimum, every six months). Admins should review system capacity in the Usage data with the Customer Support Manager (at a minimum, every month).
Understand your data sources and their relationships to build an accurate Business View.
Collaborate with technical resources to ensure the Business View is set up to support the types of questions and phrases that end users are likely to ask.
Regularly review and update the Business View as new data sources are added or changes are made to existing ones.
Use clear and consistent naming conventions for datasets and fields to avoid confusion and improve usability.