{ "cells": [ { "cell_type": "markdown", "id": "83b05792", "metadata": {}, "source": [ "# Shifted Beta Geometric model with individual customer data" ] }, { "cell_type": "markdown", "id": "3913b9b2", "metadata": {}, "source": [ "In this notebook we replicate the main results and figures from \n", "\n", "Fader, P. S., & Hardie, B. G. (2007). How to project customer retention. Journal of Interactive Marketing, 21(1), 76-90. https://journals.sagepub.com/doi/pdf/10.1002/dir.20074\n", "\n", "The authors describe the Shifted Beta Geometric (sBG) model for customer behavior in a discrete contractual setting. It assumes that:\n", " * At the end of each period, a customer has a probability `theta` of cancelling the contract\n", " and `1-theta` of renewing it\n", " * The probability `theta` does not change over time for a given customer\n", " * The probability `theta` varies across customers according to a Beta prior distribution\n", " with hyperparameters `alpha` and `beta`." ] }, { "cell_type": "code", "execution_count": null, "id": "5a4844d3", "metadata": {}, "outputs": [], "source": [ "import arviz as az\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import xarray as xr\n", "\n", "from pymc_marketing import clv\n", "\n", "# Plotting configuration\n", "az.style.use(\"arviz-darkgrid\")\n", "plt.rcParams[\"figure.figsize\"] = [12, 7]\n", "plt.rcParams[\"figure.dpi\"] = 100\n", "plt.rcParams[\"figure.facecolor\"] = \"white\"\n", "\n", "%load_ext autoreload\n", "%autoreload 2\n", "%config InlineBackend.figure_format = \"retina\"" ] }, { "cell_type": "code", "execution_count": 5, "id": "256cdb7c-2b20-47bd-88f4-31151753abea", "metadata": {}, "outputs": [], "source": [ "seed = sum(map(ord, \"Individual sBG Model\"))\n", "rng = np.random.default_rng(seed)" ] }, { "cell_type": "markdown", "id": "928b7701", "metadata": {}, "source": [ "## Recreating the dataset" ] }, { "cell_type": "markdown", "id": "8b54f141", "metadata": {}, "source": [
"The dataset contains the percentage of customers still enrolled in the service after each time period. There are two distinct groups of users: regular and high-end, believed to have different attrition rates. Each group had 1000 customers at the start.\n", "\n", "As in the original paper, we will fit the data from the first 7 time periods only, and use the following periods for validation." ] }, { "cell_type": "code", "execution_count": 6, "id": "fa6b1dae", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | regular | highend |
|---|---|---|
| 0 | 100.0 | 100.0 |
| 1 | 63.1 | 86.9 |
| 2 | 46.8 | 74.3 |
| 3 | 38.2 | 65.3 |
| 4 | 32.6 | 59.3 |
| 5 | 28.9 | 55.1 |
| 6 | 26.2 | 51.7 |
| 7 | 24.1 | 49.1 |
| 8 | 22.3 | 46.8 |
| 9 | 20.7 | 44.5 |
| 10 | 19.4 | 42.7 |
| 11 | 18.3 | 40.9 |
| 12 | 17.3 | 39.4 |
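Because this notebook works with individual customer data, the aggregate percentages above have to be expanded into one churn period per customer. Below is a minimal NumPy sketch of one way to do that. It assumes 1000 starting customers per group and a 7-period training window, as described above. The helper `survival_to_individual` is hypothetical, and so is its censoring encoding (the window length plus a boolean flag). Neither is necessarily what this notebook or `pymc_marketing.clv` uses.

```python
import numpy as np

# Percentage of regular customers still active after periods 0..12,
# copied from the table above.
regular_pct = np.array(
    [100.0, 63.1, 46.8, 38.2, 32.6, 28.9, 26.2, 24.1, 22.3, 20.7, 19.4, 18.3, 17.3]
)


def survival_to_individual(pct, n_customers=1000, n_train_periods=7):
    """Hypothetical helper: expand aggregate survival percentages into one
    churn period per customer, right-censored after `n_train_periods`.

    Returns (t_churn, censored). For censored customers, t_churn holds the
    window length and only means "still active after this many periods".
    """
    # Head counts at the start (period 0) and after each training period.
    alive = np.round(pct[: n_train_periods + 1] / 100 * n_customers).astype(int)
    # Customers lost between consecutive counts churned in period t = 1, 2, ...
    lost = -np.diff(alive)
    observed = np.repeat(np.arange(1, n_train_periods + 1), lost)
    # Everyone still active at the end of the window is right-censored.
    n_censored = alive[-1]
    t_churn = np.concatenate([observed, np.full(n_censored, n_train_periods)])
    censored = np.concatenate(
        [np.zeros(observed.size, dtype=bool), np.ones(n_censored, dtype=bool)]
    )
    return t_churn, censored


t_churn, censored = survival_to_individual(regular_pct)
print(t_churn.size, censored.sum())
```

For the regular group, this yields 759 observed churn periods and 241 customers who are censored after period 7.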
| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| alpha | 0.742 | 0.139 | 0.495 | 0.985 | 0.017 | 0.011 | 71.0 | 152.0 | 1.06 |
| beta | 4.400 | 1.087 | 2.530 | 6.304 | 0.128 | 0.087 | 73.0 | 141.0 | 1.06 |
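The sBG assumptions translate directly into a generative process. Following Fader & Hardie (2007), `theta` is each customer's per-period churn probability. The process first draws `theta` from a Beta(alpha, beta) distribution and then draws the churn period from a geometric distribution. The sketch below simulates this process with the posterior means from the table above plugged in as point estimates, which ignores posterior uncertainty. It then checks the simulated survival curve against the closed-form sBG survival function S(t) = B(alpha, beta + t) / B(alpha, beta). The seed and the number of simulated customers are arbitrary choices for illustration.

```python
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(2007)  # arbitrary seed for this sketch

# Posterior means from the summary table above, used as point estimates.
alpha, beta = 0.742, 4.400
n_customers = 100_000

# Each customer has a fixed per-period churn probability theta ~ Beta(alpha, beta) ...
theta = rng.beta(alpha, beta, size=n_customers)
# ... and churns in the first period whose Bernoulli(theta) trial succeeds,
# i.e. the churn period is Geometric(theta) with support {1, 2, ...}.
t_churn = rng.geometric(theta)

# Share of simulated customers still active after t periods, t = 0..12.
periods = np.arange(13)
simulated_survival = np.array([(t_churn > t).mean() for t in periods])

# Closed-form sBG survival: S(t) = B(alpha, beta + t) / B(alpha, beta).
closed_form_survival = np.exp(betaln(alpha, beta + periods) - betaln(alpha, beta))

for t, sim, exact in zip(periods, simulated_survival, closed_form_survival):
    print(f"t={t:2d}  simulated={sim:.3f}  closed form={exact:.3f}")
```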

| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| alpha | 0.723 | 0.067 | 0.608 | 0.854 | 0.004 | 0.002 | 225.0 | 589.0 | 1.02 |
| beta | 1.228 | 0.157 | 0.945 | 1.507 | 0.011 | 0.007 | 214.0 | 577.0 | 1.02 |
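The sBG model also implies a closed-form retention rate for period t, r(t) = (beta + t - 1) / (alpha + beta + t - 1) (Fader & Hardie, 2007). This rate rises over time because the customers most likely to churn leave first. The sketch below evaluates r(t) at the posterior means from the table above and multiplies the rates together to obtain the implied survival curve. The summary does not state which group it belongs to. Comparing it against the `regular` column is our assumption, based on the implied first-period survival beta / (alpha + beta) ≈ 0.63 being close to the observed 63.1%.

```python
import numpy as np

# Posterior means from the summary table above, used as point estimates.
alpha, beta = 0.723, 1.228

# sBG retention rate for periods t = 1..12 (Fader & Hardie, 2007).
t = np.arange(1, 13)
retention_rate = (beta + t - 1) / (alpha + beta + t - 1)

# Survival after t periods is the running product of the retention rates; S(0) = 1.
survival = np.concatenate([[1.0], np.cumprod(retention_rate)])

# Observed share of regular customers still active, periods 0..12 (assumed match).
observed = np.array(
    [100.0, 63.1, 46.8, 38.2, 32.6, 28.9, 26.2, 24.1, 22.3, 20.7, 19.4, 18.3, 17.3]
) / 100

for period, (model, obs) in enumerate(zip(survival, observed)):
    print(f"period {period:2d}: model {model:.3f}  observed {obs:.3f}")
```

Through period 12, the implied curve stays within about one percentage point of the observed values. This includes the validation periods after period 7.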