A “How To” article on the use of Causal Impact Analysis for People Analytics Practitioners.

Words immortalised by Tom Cruise in the 1996 move Jerry Maguire, yet at the heart of requests directed toward HR to quantify impact. However, quantifying impact is not always easy in HR—a field characterised by a sparsity of measurable KPIs such as product sales, conversions, defect rates, etc. While HR may not have the same measures, it can nonetheless borrow methods from neighbouring fields that deal in behavioural change and possess greater analytical maturity. To that end, this article is intended to introduce Causal Impact Analysis—a method initially developed in product marketing—to demonstrate a way to quantify impact in HR.

Causal Impact Analysis is an approach to estimate “the causal effect of a designed intervention on a time series” (https://google.github.io/CausalImpact/CausalImpact.html). A HR related example would be the following—does a marketing campaign intended to get employees in the UK to reduce leave balances have an impact? Addressing questions like this can be challenging when controlled experimental conditions are not available. Kay Brodersen and team at Google built this algorithm to address this very challenge, and kindly open-sourced it for the rest of us to benefit from.

Let’s use the HR example above–-does a marketing campaign launched in the UK that is intended to get employees to reduce leave balances have an impact? Causal Impact Analysis works by:

- examining UK employee absences over time (i.e., a time series);
- examining the absences in other, ideally comparable countries (i.e., a control time series); and
- builds a time-series model to try and predict how employee absences in the UK should have evolved without the campaign. The time series model can be built using either: a) only UK absence data, or b), using both UK data and the control time series (i.e., comparable countries).

The following code provides a walkthrough of Causal Impact Analysis using R. The example context described above (i.e., quantifying the impact of a marketing campaign launched in the UK intended to get employees to reduce leave balances), will be played out below using fictitious data. The code plays out in the following way:

- create time series data
- perform causal impact analysis using UK data & review results
- perform causal impact analysis using UK and control data (i.e., employee absence in Italy, Sweden, Spain and Germany) & review results
- quantifying the financial impact of the campaign

The below code stub generates fictitious data for our example. Please note that you can replicate this example, however, the data generated may be different, which in turn may lead to different outcomes (i.e., both visualisations, summary statistics and benefit quantified).

```
# load libraries
library(tidyverse) # workhorse
library(janitor) # better naming conventions
library(lubridate) # working with dates
library(zoo) # time series data format for causal impact
library(CausalImpact) # our method of interest
library(plotly) # making graphs attractive and interactive
set.seed(1)
# create a date sequence
# takes default timezone setting from machine
mydatetime <- as.POSIXct("2018-01-01")
date <- base::seq.POSIXt(from = mydatetime, length.out = 365, by = "day")
# control data - 4 countries (Italy, Sweden, Spain, Germany)
# numbers reflect number of employees on leave each day
it_numbers <- 100 + stats::arima.sim(model = list(ar = 0.999), n = 365)
se_numbers <- 1.5 * it_numbers + rnorm(85)
es_numbers <- 0.8 * it_numbers + rnorm (25)
de_numbers <- 2.5 * it_numbers + rnorm (125)
# case country data - UK
# implementing a gradual increase to reflect realistic behaviour
# (i.e., incremental absence/leave taking increase)
uk_numbers <- 1.2 * it_numbers + rnorm(100)
uk_numbers[275:295] <- uk_numbers[275:365] + 2
uk_numbers[296:316] <- uk_numbers[275:365] + 5
uk_numbers[317:347] <- uk_numbers[275:365] + 12
uk_numbers[348:365] <- uk_numbers[275:365] + 10
# combine the data into one tibble
data_tbl <- bind_cols(date, uk_numbers, it_numbers, se_numbers, es_numbers, de_numbers) %>%
janitor::clean_names() %>%
dplyr::rename(date = x1,
uk_numbers = x2,
it_numbers = x3,
se_numbers = x4,
es_numbers = x5,
de_numbers = x6) %>%
# ONLY INCLUDE WEEKDAY DATA - no absences on weekend!
dplyr::mutate(date2 = date %>% lubridate::as_date() %>% lubridate::wday()) %>%
# 1 = sunday and 7 = saturday, 2 - 6 = weekdays
dplyr::filter(date2 >1 & date2 < 7) %>%
# round numbers to whole numbers (no employee parts!)
dplyr::mutate_at(vars(dplyr::ends_with("_numbers")), ~ base::round(., 0)) %>%
dplyr::select(-date2)
# create UK only data
uk_data <- data_tbl %>%
dplyr::select(1:2) %>%
base::as.data.frame() %>%
zoo::read.zoo()
# create all countries data
all_country_data <- data_tbl %>%
base::as.data.frame() %>%
zoo::read.zoo()
```

We now have our employee absence data for both the UK and our control countries of Italy, Sweden, Spain and Germany. Let’s begin by assessing the impact of the campaign when looking at the UK data in isolation.

We begin by specifying the pre and post campaign periods. With this established we are able to perform the causal impact analysis and then both visualise results and provide a statistical summary of the model.

```
# establish when the intervention occurred
pre.period <- as.POSIXct(c("2018-01-01", "2018-09-30"))
post.period <- as.POSIXct(c("2018-10-01", "2018-12-31"))
# perform the causal impact analysis with UK data only
impact_uk_only <- CausalImpact::CausalImpact(data = uk_data,
pre.period = pre.period,
post.period = post.period)
# get a graphical summary of the UK only-causal impact model
graphics::plot(impact_uk_only) %>%
plotly::plotly_build()
```

The graphical visualisation of the model provides three key perspectives of the model. The top panel of the visualisation shows a blue confidence interval, in which we would expect our time series data to remain in the absence of the campaign. The second panel displays the difference between situation normal (i.e., no campaign), which is depicted by the the zero value line in the graph, and what happened in reality. In our example we see the pointwise values occur both above and below the zero line. The third panel provides a perspective on the cumulative benefit of the campaign. The visualisation adds the pointwise contributions from the second panel, showing the benefit was minimal and only realised at the end of the post intervention period. None of these visualisation panels inspire confidence in our campaign!

```
# get a summary &/or report of the model
base::summary(impact_uk_only) # summary results
```

```
Posterior inference {CausalImpact}
Average Cumulative
Actual 107 7039
Prediction (s.d.) 106 (0.66) 6988 (43.74)
95% CI [104, 107] [6895, 7070]
Absolute effect (s.d.) 0.77 (0.66) 51.07 (43.74)
95% CI [-0.47, 2.2] [-31.26, 144.0]
Relative effect (s.d.) 0.73% (0.63%) 0.73% (0.63%)
95% CI [-0.45%, 2.1%] [-0.45%, 2.1%]
Posterior tail-area probability p: 0.11503
Posterior prob. of a causal effect: 88%
For more details, type: summary(impact, "report")
```

```
# plain text walk through of result - not run
# base::summary(impact_uk_only, "report") # summary report
```

The summary results do not paint a compelling picture for our campaign either. I will call out a few key results and couple this with text lifted directly from the Summary Report (not included in this article due to its length, but a very useful explanation for Causal Impact Analysis users). The summary indicates that during the post-intervention period we had an actual average of 107 employees on leave per day. In the absence of the campaign we would have expected to have an average 106 employees on leave per day. This expectation is based on a time series model generated using pre-campaign UK data.

Our 95% confidence interval tells us we would need the number to fall above 107 (or below 105 in case the campaign had an adverse impact) to be considered to have had an impact. The summary report goes into considerably more detail, and includes the following “although the intervention appears to have caused a positive effect, this effect is not statistically significant when considering the entire post-intervention period as a whole.” This interpretation is confirmed by our Posterior tail-area probability of p = 0.146. This means that the positive effect noted in the results may be by chance and is not considered statistically significant.

Let’s repeat the process, though this time including the data from the control countries (i.e., Italy, Sweden, Spain, and Germany. With our data, and pre and post campaign periods already established we can proceed directly to modelling and examining the output of the model.

```
# perform the causal impact analysis with all data
impact_all_data <- CausalImpact::CausalImpact(data = all_country_data,
pre.period = pre.period,
post.period = post.period)
# get a graphical summary of the campaign impact
graphics::plot(impact_all_data) %>%
plotly::plotly_build()
```

This time the visualisation of the model results are more compelling! The top panel shows a blue line which is our baseline. The baseline indicates where we would expect UK data to be in the absence of the campaign (based on both the control countries and UK data). The actual data appears to exceed the baseline. The second panel, depicting the difference between no campaign (i.e., the zero line) and the actual data indicated the campaign had a positive impact. The third panel suggests that the cumulative benefit of the campaign, based on the pointwise contributions from the second panel, is considerable. The campaign appears to be working, but let’s get a statistical perspective.

```
# get a summary &/or report of the model
base::summary(impact_all_data) # summary results
```

```
Posterior inference {CausalImpact}
Average Cumulative
Actual 107 7039
Prediction (s.d.) 100 (0.36) 6600 (23.90)
95% CI [99, 101] [6555, 6646]
Absolute effect (s.d.) 6.7 (0.36) 439.1 (23.90)
95% CI [6, 7.3] [393, 484.3]
Relative effect (s.d.) 6.7% (0.36%) 6.7% (0.36%)
95% CI [6%, 7.3%] [6%, 7.3%]
Posterior tail-area probability p: 0.001
Posterior prob. of a causal effect: 99.8997%
For more details, type: summary(impact, "report")
```

```
# plain text walkthrough of result - not run
# base::summary(impact_all_data, "report") # summary report
```

The results of the model suggest that the campaign had a positive impact that is statistically significant! Our actual average of 107 employees being on leave each day is above that of the model prediction, which was 100. In addition, our actual average is well above our 95% Confidence Interval of 101. Finally the probability that this result was due to chance alone is 0.001, and we have achieved a 99.8997% probability that the campaign had a causal effect (i.e., worth betting on!).

Comparing the two models (i.e., UK data only vs. UK data and control countries) clearly illustrates the value of including control data in this instance.

Based on the period we analysed (i.e., 1/10/2018 - 31/12/2018) our second model indicates that under normal circumstances we would have expected 100 employees on average to take leave each day, while the campaign seems to have catalysed a daily average of 107 employees. The cumulative benefit of this campaign is an additional **439 UK employees taking leave during the post campaign period** (i.e., subtracting our cumulative prediction (6599) from our cumulative actual (7036)).

If the average daily cost of an UK employee is $300 GBP we could simply determine the financial benefit of the campaign by multiplying 439 by our average daily cost of an employee (i.e., $300 GBP). The estimated **gross financial benefit of the campaign would be $131,700 GBP**, which can be narrowed to a **net financial benefit of $101,700 GBP**, after subtracting our campaign costs of $30,000 GBP. Not bad HR!

In using Causal Impact Analysis, we can both:

**measure the ROI (Return-on-Investment) of behavioural campaigns**(e.g. marketing campaign), especially when we are not sure if the campaign was the only single source of the impact; and- potentially
**use Causal Impact Analysis to get feedback on impact quickly so that adjustments can be made in the moment**, as opposed to when campaign momentum has been lost.

In addition, when tied with outcome metrics (e.g., average cost of unused leave) we can quantify the impact of interventions to demonstrate ROI to internal stakeholders. Causal Impact Analysis can be employed in a variety of settings—HR, Marketing, Communications, Finance, etc. The current article is intended to both provide a simple overview of the process of applying Causal Impact Analysis, and act as catalyst to future innovative applications in the People Analytics domain.

**Happy measuring!**

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/adam-d-mckinnon/distill_blog, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

For attribution, please cite this work as

McKinnon (2021, Jan. 5). ADAM D MCKINNON: Show Me The Money! A People Analytics Guide to Measuring Impact Over Time. Retrieved from https://www.adam-d-mckinnon.com/posts/2020-12-30-causalimpactanalysis/

BibTeX citation

@misc{mckinnon2021show, author = {McKinnon, Adam D}, title = {ADAM D MCKINNON: Show Me The Money! A People Analytics Guide to Measuring Impact Over Time}, url = {https://www.adam-d-mckinnon.com/posts/2020-12-30-causalimpactanalysis/}, year = {2021} }