{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "(mmm_time_slice_cross_validation)=\n", "# Time-Slice Cross-Validation and Parameter Stability\n", "\n", "In this notebook we illustrate how to perform time-slice cross-validation for a media mix model (MMM). This is an important step in evaluating the quality and stability of the model: we look not only at out-of-sample predictions but also at the stability of the model parameters.\n", "\n", "We assume that you already know how to train a model with the functionality available in PyMC-Marketing, so the data generation and training processes are covered in a separate notebook rather than repeated here. If you are unfamiliar with these procedures, refer to the [\"MMM Example Notebook\"](https://www.pymc-marketing.io/en/stable/notebooks/mmm/mmm_example.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prepare Notebook\n", "\n", "These imports and configurations form the setup used throughout this notebook." ] }, { "cell_type": "code", "execution_count": 268, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The autoreload extension is already loaded. 
To reload it, use:\n", " %reload_ext autoreload\n" ] } ], "source": [ "import warnings\n", "\n", "import arviz as az\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "\n", "from pymc_marketing.mmm.time_slice_cross_validation import TimeSliceCrossValidator\n", "from pymc_marketing.paths import data_dir\n", "\n", "warnings.simplefilter(action=\"ignore\", category=FutureWarning)\n", "\n", "az.style.use(\"arviz-darkgrid\")\n", "plt.rcParams[\"figure.figsize\"] = [12, 7]\n", "plt.rcParams[\"figure.dpi\"] = 100\n", "plt.rcParams[\"figure.facecolor\"] = \"white\"\n", "\n", "\n", "%load_ext autoreload\n", "%autoreload 2\n", "%config InlineBackend.figure_format = \"retina\"" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [], "source": [ "seed: int = sum(map(ord, \"mmm\"))\n", "rng: np.random.Generator = np.random.default_rng(seed=seed)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading Data\n", "Here we load our geo-level dataset, which we then use in the time-slice cross-validation steps." ] }, { "cell_type": "code", "execution_count": 207, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | date | y | x1 | x2 | event_1 | event_2 | dayofyear | t | geo | \n", "
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-04-02 | 3984.662237 | 159.290009 | 0.0 | 0.0 | 0.0 | 92 | 0 | geo_a | \n", "
| 1 | 2018-04-09 | 3762.871794 | 56.194238 | 0.0 | 0.0 | 0.0 | 99 | 1 | geo_a | \n", "
| 2 | 2018-04-16 | 4466.967388 | 146.200133 | 0.0 | 0.0 | 0.0 | 106 | 2 | geo_a | \n", "
| 3 | 2018-04-23 | 3864.219373 | 35.699276 | 0.0 | 0.0 | 0.0 | 113 | 3 | geo_a | \n", "
| 4 | 2018-04-30 | 4441.625278 | 193.372577 | 0.0 | 0.0 | 0.0 | 120 | 4 | geo_a | \n", "