Impact Calculation for Top Contributors

Updated by Jorge Z

The impact score provided when generating Insights in Tellius gives you an understanding of how a Change Reason Contributor influences a target variable of interest from your data.

These impact scores are calculated according to the specific Insight type and to the type of target variable of interest. The relevant types of Insights to understand are Trend Analysis and Comparison of Cohorts. The relevant types of target variables are percentages (or “rates”) and non-percentage numerical values.

Numerical, Non-Percentage Values

The most traditional type of target variable is a numerical, non-percentage value. This could be total revenue or sales, where the value is simply a dollar amount (for instance, $100,000 or $2,342.43). Or it could be the total number of “sales orders,” with 10,290, 9, or 124,124,324 all being valid entries.

In this case, the impact score measures how another feature of the data (that is, a variable or column other than the target variable) influences the target variable. For example, suppose there is another variable, STATE, which tells us in which state each order took place. The impact calculation determines, based on all of the data points in the dataset or Business View, how a specific state influences the target value “sales orders.”

Percentages or “Rates”

These are the target variables that are described by a percentage value. For instance, perhaps the target variable describes an “Approval Rate” (say, for auto loans). The rate would be described by a percentage, such as 75%.

As with non-percentage numerical values, the impact calculation determines how variables or features other than the target variable influence the “Approval Rate” (or any percentage-based target variable specified by the user). Note that there is no restriction on the type of the other variables being evaluated. For instance, they can be dimensional, such as STATE, where one state can be determined to have more impact on the target variable than another, even though the target variable is represented by a percentage value.

Trend Drivers

Performing a Trend Insight Analysis in Tellius calculates the top contributors, or features of the data, that “drive” or influence the target variable (or column) over time.

Specifically, this analysis looks at the change between two specific periods of time. The impact calculation therefore produces a score for each variable that describes how that variable influenced the change in the target variable from time T1 to time T2.

For percentage-based target columns, an example is shown below:

This means that the impact of the State California was calculated for each time period, and a change in influence was detected with respect to the target variable.

NOTE: this is NOT simply the difference in the target variable for that feature. In this case, the target variable is Approval Rate and the Contributor is State (a feature or other variable of the data). The change in approval rate is 4.6%, but this is different from the change in impact that the State of California had at each respective point in time. This is because the calculation must take into account the data points themselves in aggregate for a truly holistic view, to avoid an overly simplistic (and misleading) result.

For an extreme example, consider the State of Iowa (in the hypothetical underlying data set corresponding to the image above). Iowa may have 2 data points in time period 1, of which 1 is an approval, and 1 data point in time period 2, which is also an approval. The approval rate would appear to be:

50% ⇒ 100%

And the influence may be naively believed to be: “Iowa had a 2x impact on the overall approval rate”

But this is incorrect, because Iowa accounts for only a handful of data points. Our impact calculation takes this into account and adjusts the ranking of the top contributors accordingly.
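To make the Iowa point concrete, here is a minimal Python sketch. It is not the Tellius impact formula, which this article does not spell out. It assumes one simple volume-weighted alternative: a state's contribution is the change in its approvals divided by all applications in the period. The counts, the state data, and the function names are all hypothetical and chosen only for illustration.

```python
# Hypothetical counts per state and period: (applications, approvals).
# These numbers are invented for illustration, not taken from Tellius.
data = {
    "California": {"t1": (1000, 700), "t2": (1000, 746)},
    "Iowa": {"t1": (2, 1), "t2": (1, 1)},
}


def naive_change(state):
    """Change in the state's own approval rate, ignoring its volume."""
    apps1, ok1 = data[state]["t1"]
    apps2, ok2 = data[state]["t2"]
    return ok2 / apps2 - ok1 / apps1


def total_applications(period):
    """Number of applications across all states in one period."""
    return sum(data[s][period][0] for s in data)


def weighted_contribution(state):
    """Assumed illustrative measure: the state's share of the change in the
    overall approval rate (its approvals divided by ALL applications)."""
    n1 = total_applications("t1")
    n2 = total_applications("t2")
    return data[state]["t2"][1] / n2 - data[state]["t1"][1] / n1


naive = {s: naive_change(s) for s in data}
weighted = {s: weighted_contribution(s) for s in data}

for state in data:
    print(f"{state}: naive={naive[state]:+.4f}, weighted={weighted[state]:+.6f}")
```

Under these assumptions, Iowa shows the largest naive change (50% to 100%), yet its volume-weighted contribution is close to zero, while California dominates. The weighted contributions also add up to the change in the overall approval rate, which is one reason an aggregate view is less misleading than comparing per-state rates.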

Comparison of Cohorts

Similarly, you can generate Comparison insights, which compare two cohorts against a variable of interest. For instance, two cohorts can be defined on a feature "Country" with the values "Germany" and "Portugal."

Below is a sample calculation pertaining to a non-percentage-based numerical target variable.

Note the impact score of Drug=Pepto-Bismol on the overall total sales. Thus, the notion of the impact score calculation is applicable across the various insight type and target-column type scenarios.
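As a rough illustration of a cohort comparison on a non-percentage target, the Python sketch below breaks the gap in total sales between two cohorts down by a contributor column. This is an assumed, simplified decomposition and not the calculation Tellius performs. The rows, the sales figures, the drug "Aspirin", and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (country, drug, sales). Values are invented.
rows = [
    ("Germany", "Pepto-Bismol", 1200.0),
    ("Germany", "Aspirin", 800.0),
    ("Portugal", "Pepto-Bismol", 500.0),
    ("Portugal", "Aspirin", 900.0),
]


def sales_by_drug(country):
    """Total sales per drug within one cohort (country)."""
    totals = defaultdict(float)
    for row_country, drug, sales in rows:
        if row_country == country:
            totals[drug] += sales
    return totals


def cohort_gap_by_drug(cohort_a, cohort_b):
    """Assumed illustrative measure: per-drug difference in total sales
    between cohort_a and cohort_b. The values sum to the overall gap."""
    a = sales_by_drug(cohort_a)
    b = sales_by_drug(cohort_b)
    return {d: a.get(d, 0.0) - b.get(d, 0.0) for d in sorted(set(a) | set(b))}


gap = cohort_gap_by_drug("Germany", "Portugal")
total_gap = sum(gap.values())

for drug, value in gap.items():
    print(f"Drug={drug}: {value:+.1f} of a total gap of {total_gap:+.1f}")
```

With this sample data, Drug=Pepto-Bismol accounts for more than the whole gap between the two cohorts, and the other drug partly offsets it. A ranked list of contributors is meant to surface this kind of pattern.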
