BuildModelFromDAG
- class pymc_marketing.mmm.causal.BuildModelFromDAG(*, dag, df, target, dims, coords, model_config=None)
Build a PyMC probabilistic model directly from a Causal DAG and a tabular dataset.
The class interprets a Directed Acyclic Graph (DAG) where each node is a column in the provided df. For every edge A -> B it creates a slope prior for the contribution of A into the mean of B. Each node receives a likelihood prior. Dims and coords are used to align and index observed data via pm.Data and xarray.

Parameters:
- dag (str): DAG in DOT format (e.g. digraph { A -> B; B -> C; }) or as a simple comma/newline separated list of edges (e.g. "A->B, B->C").
- df (pandas.DataFrame): DataFrame that contains a column for every node present in the DAG and all columns named by the provided dims.
- target (str): Name of the target node present in both the DAG and df. This is not used to restrict modeling but is validated to exist in the DAG.
- dims (tuple[str, …]): Dims for the observed variables and likelihoods (e.g. ("date", "channel")).
- coords (dict): Mapping from dim names to coordinate values. All coord keys must exist as columns in df and will be used to pivot the data to match dims.
- model_config (dict, optional): Optional configuration with priors for keys "intercept", "slope" and "likelihood". Values should be pymc_extras.prior.Prior instances. Missing keys fall back to default_model_config.
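The checks implied by the dag, df and target parameters can be illustrated with a small standard-library sketch. It is not the library's parser: the helper names parse_edge_list and check_dag are hypothetical, only the edge-list form is handled (not DOT), and the exact errors raised by BuildModelFromDAG may differ.

```python
import re
from graphlib import CycleError, TopologicalSorter


def parse_edge_list(dag):
    """Parse 'A->B, B->C' (comma or newline separated) into (parent, child) pairs."""
    edges = []
    for chunk in re.split(r"[,\n]", dag):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate trailing separators and blank lines
        parent, sep, child = (part.strip() for part in chunk.partition("->"))
        # This sketch accepts exactly one edge per item, so 'A->B->C' is rejected.
        if not sep or not parent or not child or "->" in child:
            raise ValueError(f"Not a single edge: {chunk!r}")
        edges.append((parent, child))
    return edges


def check_dag(edges, columns, target):
    """Return the nodes in topological order, or raise ValueError on bad input."""
    predecessors = {}
    for parent, child in edges:
        predecessors.setdefault(parent, set())
        predecessors.setdefault(child, set()).add(parent)
    missing = set(predecessors) - set(columns)
    if missing:
        raise ValueError(f"DAG nodes missing from df columns: {sorted(missing)}")
    if target not in predecessors:
        raise ValueError(f"Target {target!r} is not a node of the DAG")
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as err:
        raise ValueError("Graph contains a cycle") from err


edges = parse_edge_list("A->B, B->C")
order = check_dag(edges, columns=["date", "A", "B", "C"], target="C")
print(edges)
print(order)
```

The topological order is only used here to prove that the graph is acyclic; a chain such as A->B->C has to be written as two separate edges in this sketch.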
Examples
Minimal example using DOT format:
    import numpy as np
    import pandas as pd

    from pymc_marketing.mmm.causal import BuildModelFromDAG

    dates = pd.date_range("2024-01-01", periods=5, freq="D")
    df = pd.DataFrame(
        {
            "date": dates,
            "X": np.random.normal(size=5),
            "Y": np.random.normal(size=5),
        }
    )

    dag = "digraph { X -> Y; }"
    dims = ("date",)
    coords = {"date": dates}

    builder = BuildModelFromDAG(
        dag=dag, df=df, target="Y", dims=dims, coords=coords
    )
    model = builder.build()
Edge-list format and custom likelihood prior:
    from pymc_extras.prior import Prior

    dag = "X->Y"  # equivalent to the DOT example above

    model_config = {
        "likelihood": Prior(
            "StudentT", nu=5, sigma=Prior("HalfNormal", sigma=1), dims=("date",)
        ),
    }

    builder = BuildModelFromDAG(
        dag=dag,
        df=df,
        target="Y",
        dims=("date",),
        coords={"date": dates},
        model_config=model_config,
    )
    model = builder.build()
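To make the role of the slope terms concrete, the following plain-Python sketch computes the mean of each node for a single row of data, using fixed numbers in place of priors. It assumes a linear form with one intercept per node and one slope per edge; the function node_means and all values are hypothetical, and the model that build() actually constructs may be parameterized differently.

```python
def node_means(edges, row, intercepts, slopes):
    """Assumed form: mu[child] = intercept[child] + sum(slope[parent, child] * row[parent])."""
    nodes = {node for edge in edges for node in edge}
    mu = {node: intercepts[node] for node in nodes}
    for parent, child in edges:
        # Each edge adds the parent's observed value, scaled by that edge's slope.
        mu[child] += slopes[(parent, child)] * row[parent]
    return mu


edges = [("X", "Y"), ("Y", "Z")]
row = {"X": 2.0, "Y": 3.0, "Z": 0.0}            # one observed row
intercepts = {"X": 0.0, "Y": 1.0, "Z": -1.0}    # stand-ins for intercept priors
slopes = {("X", "Y"): 0.5, ("Y", "Z"): 2.0}     # stand-ins for slope priors

mu = node_means(edges, row, intercepts, slopes)
print(mu["X"], mu["Y"], mu["Z"])  # 0.0 2.0 5.0
```

A node with no parents, such as X here, keeps only its intercept as its mean.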
Methods
BuildModelFromDAG.__init__(*[, dag, df, ...])
BuildModelFromDAG.build(): Construct and return the PyMC model implied by the DAG and data.
Return a copy of the parsed DAG as a NetworkX directed graph.
Return a Graphviz visualization of the built PyMC model.
Attributes
default_model_config: Default priors for intercepts, slopes and likelihood using pymc_extras.prior.Prior.
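The fallback from model_config to default_model_config can be pictured as a dictionary merge in which user-supplied keys win. The sketch below is an assumption about the behavior, not the library's code: resolve_config is a hypothetical helper, and the string values are placeholders, not the real default priors.

```python
def resolve_config(defaults, overrides=None):
    """Return a new dict where keys in overrides replace the matching defaults."""
    return {**defaults, **(overrides or {})}


# Placeholder strings stand in for Prior instances; these are not the real defaults.
defaults = {
    "intercept": "default intercept prior",
    "slope": "default slope prior",
    "likelihood": "default likelihood prior",
}
user_config = {"likelihood": "custom StudentT prior"}

resolved = resolve_config(defaults, user_config)
print(resolved["likelihood"])  # custom StudentT prior
print(resolved["slope"])       # default slope prior
```

Because the merge builds a new dictionary, the defaults themselves are left unchanged.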