--- title: "model agnostic breakDown plots for ranger" author: "Przemyslaw Biecek" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{model agnostic breakDown plots for ranger} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` Here we will use the HR churn data (https://www.kaggle.com/) to present the `breakDown` package for `ranger` models. The data is in the `breakDown` package ```{r} library(breakDown) head(HR_data, 3) ``` Now let's create a `ranger` classification forest for churn, the `left` variable. ```{r} library(ranger) HR_data$left <- factor(HR_data$left) model <- ranger(left ~ ., data = HR_data, importance = 'impurity', probability=TRUE, min.node.size = 2000) predict.function <- function(model, new_observation) predict(model, new_observation, type = "response")$predictions[,2] predict.function(model, HR_data[11,]) ``` But how to understand which factors drive predictions for a single observation? With the `breakDown` package! Explanations for the trees votings. ```{r, fig.width=7} library(ggplot2) explain_1 <- broken(model, HR_data[11,-7], data = HR_data[,-7], predict.function = predict.function, direction = "down") explain_1 plot(explain_1) + ggtitle("breakDown plot (direction=down) for ranger model") explain_2 <- broken(model, HR_data[11,-7], data = HR_data[,-7], predict.function = predict.function, direction = "up") explain_2 plot(explain_2) + ggtitle("breakDown plot (direction=up) for ranger model") ```