
Table 1 Primer on Conditional Average Treatment Effect (CATE) estimation

From: Comparison of causal forest and regression-based approaches to evaluate treatment effect heterogeneity: an application for type 2 diabetes precision medicine

Evaluation framework

In a potential outcomes framework, the causal effect of a treatment on a patient is defined as the difference between the outcomes that patient would have under two different treatment assignments. The conditional average treatment effect (CATE) is the average of the individual treatment effects over a subpopulation defined by specific patient characteristics. Subgroup-specific treatment effects have traditionally been estimated by manually comparing pre-defined patient subpopulations. However, this is not always possible for subgroups determined by unknown covariate relationships or in higher-dimensional datasets. We evaluate two methods that can estimate conditional average treatment effects, which represent differential patient responses to a treatment allocation.
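The definitions above can be written compactly as follows. The notation is an assumption introduced here for illustration and does not appear in the source: $Y_i(1)$ and $Y_i(0)$ are the potential outcomes of patient $i$ under the two treatments, $X_i$ the patient characteristics, and $T_i$ the treatment received. The last line holds only under the usual assumptions of no unmeasured confounding and overlap.

```latex
\begin{align*}
  \tau_i  &= Y_i(1) - Y_i(0)
          && \text{(individual treatment effect)} \\
  \tau(x) &= \mathbb{E}\left[\, Y_i(1) - Y_i(0) \mid X_i = x \,\right]
          && \text{(CATE)} \\
  \tau(x) &= \mathbb{E}\left[\, Y_i \mid T_i = 1, X_i = x \,\right]
           - \mathbb{E}\left[\, Y_i \mid T_i = 0, X_i = x \,\right]
          && \text{(identified from observed data)}
\end{align*}
```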

Penalized regression

Standard maximum likelihood regression models can estimate CATE by including treatment-by-covariate interaction terms. For each covariate, the interaction term coefficient(s) represent the estimated differential treatment effect associated with that covariate. The model can then be used to predict the counterfactual outcome on each therapy, conditional on the features included as interaction terms. The difference between the predicted outcomes on the two therapies provides an estimate of the patient-level treatment effect. Penalized regression can be used to reduce overfitting and potentially improve prediction in new data.
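The interaction-term approach can be sketched as below. This is a minimal illustration, not the specification used in the paper: it assumes a continuous outcome, a binary treatment, and a ridge penalty (chosen only because it has a closed-form solution). The function names, the simulated data, and the penalty value `lam=1.0` are all hypothetical.

```python
import numpy as np

def design_matrix(X, t):
    """Columns: intercept, covariates, treatment, treatment-by-covariate interactions."""
    t = t.reshape(-1, 1)
    return np.hstack([np.ones((X.shape[0], 1)), X, t, X * t])

def fit_ridge(D, y, lam):
    """Closed-form ridge fit; the intercept (column 0) is left unpenalized."""
    penalty = lam * np.eye(D.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(D.T @ D + penalty, D.T @ y)

def predict_cate(X, beta):
    """Predicted outcome on treatment minus predicted outcome on control."""
    n = X.shape[0]
    y1 = design_matrix(X, np.ones(n)) @ beta   # counterfactual: everyone treated
    y0 = design_matrix(X, np.zeros(n)) @ beta  # counterfactual: no one treated
    return y1 - y0

# Simulated data (hypothetical): the true effect is 1 + 2 * x0.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
t = rng.integers(0, 2, size=n).astype(float)  # randomized treatment
true_cate = 1.0 + 2.0 * X[:, 0]
y = X[:, 1] + t * true_cate + rng.normal(scale=0.5, size=n)

beta = fit_ridge(design_matrix(X, t), y, lam=1.0)
cate_hat = predict_cate(X, beta)
```

Here `cate_hat` holds one estimated treatment effect per patient. In practice the penalty strength would be chosen by cross-validation rather than fixed in advance.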

Causal forest

Causal forest is a data-driven ensemble method built over many individual causal trees to estimate the CATE [6]. A causal tree [5] modifies the traditional CART structure [25] to explicitly optimise for treatment effect heterogeneity and generates estimates at the leaves of the trees. Causal trees use one sample to determine the tree structure and a separate sample to estimate the treatment effects. This double-sample approach (also referred to as honest) helps to overcome the problem of over-fitting. Similar to the random forest for outcome prediction, each causal tree within the causal forest is built over a bootstrap sample from the training data, and the forest averages over the tree-generated treatment effects. In general, a forest built over a large number of individual trees has been shown to be more stable and to produce more accurate results than an individual tree [21].
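The honest double-sample idea and the averaging over trees can be illustrated with a deliberately simplified sketch. It is not the algorithm of [5] or [6]: each "tree" is a single split on one covariate, the splitting criterion is a plain squared difference in leaf effects, and each tree is grown on a subsample drawn without replacement (rather than a bootstrap sample) so that the structure and estimation halves never share an observation. All function names and the simulated data are hypothetical.

```python
import numpy as np

def diff_in_means(y, t):
    """Treated-minus-control mean outcome; NaN if either arm is empty."""
    if t.sum() == 0 or (1 - t).sum() == 0:
        return np.nan
    return y[t == 1].mean() - y[t == 0].mean()

def best_split(x, y, t, min_leaf=20):
    """Threshold on x that maximises the weighted squared gap between leaf effects."""
    best_thr, best_gain = None, -np.inf
    for thr in np.quantile(x, np.linspace(0.1, 0.9, 17)):
        left = x <= thr
        if left.sum() < min_leaf or (~left).sum() < min_leaf:
            continue
        tau_l = diff_in_means(y[left], t[left])
        tau_r = diff_in_means(y[~left], t[~left])
        if np.isnan(tau_l) or np.isnan(tau_r):
            continue
        gain = left.mean() * (~left).mean() * (tau_l - tau_r) ** 2
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr

def honest_stump(x, y, t, rng):
    """One half of the data picks the split; the other half estimates leaf effects."""
    idx = rng.permutation(len(x))
    half = len(x) // 2
    s, e = idx[:half], idx[half:]
    thr = best_split(x[s], y[s], t[s])       # structure sample
    xe, ye, te = x[e], y[e], t[e]            # estimation sample
    if thr is None:
        tau = diff_in_means(ye, te)
        return thr, tau, tau
    left = xe <= thr
    return thr, diff_in_means(ye[left], te[left]), diff_in_means(ye[~left], te[~left])

def forest_predict(x, y, t, x_new, n_trees=100, seed=0):
    """Average the leaf-level effect estimates of many honest stumps."""
    rng = np.random.default_rng(seed)
    n = len(x)
    preds = np.full((n_trees, len(x_new)), np.nan)
    for b in range(n_trees):
        sub = rng.choice(n, size=n // 2, replace=False)  # subsample, no replacement
        thr, tau_l, tau_r = honest_stump(x[sub], y[sub], t[sub], rng)
        preds[b] = tau_l if thr is None else np.where(x_new <= thr, tau_l, tau_r)
    return np.nanmean(preds, axis=0)

# Simulated data (hypothetical): the effect is 2 when x > 0 and 0 otherwise.
rng = np.random.default_rng(1)
n = 4000
x = rng.uniform(-1, 1, size=n)
t = rng.integers(0, 2, size=n)               # randomized treatment
y = x + t * np.where(x > 0, 2.0, 0.0) + rng.normal(scale=0.5, size=n)

x_new = np.array([-0.8, 0.8])
cate_forest = forest_predict(x, y, t, x_new)
```

With this setup, `cate_forest` should be close to 0 for the first query point and close to 2 for the second. A full causal forest additionally grows deep trees over many covariates and weights observations adaptively.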