From messy questions to what is knowable, identifiable, and actionable
Businesses need simple answers to complex questions, and increasingly they have neither the time, the expertise, nor the appetite to translate those questions into formal decision-theoretic problems. If statisticians insist on full procedural rigor before giving an answer, they will be bypassed.
A business manager can now upload two datasets to an LLM, ask “Is the new version better? Yes or no answers only, please.”, and receive a confident answer in 30 seconds, with no caveats, no discussion of assumptions or their validity, and no clarifying questions.
This is profoundly attractive. It reduces cognitive load and accelerates decisions.
The danger is not that these systems are foolish. The danger is that they compress ambiguity into certainty without enforcing design discipline. They give single-number answers to multi-dimensional problems. If statisticians do not provide an alternative that is equally frictionless, the market will select for speed over correctness, and we will drift into automated self-deception: organizations will still automate decision-making, but without statistical discipline, producing confident but false conclusions.
Here, we propose such an alternative.
Modern business analytics fails in a structural way:
This leads to a systemic failure mode:
Automated Self-Deception — confident answers produced without explicit accounting for identifiability, uncertainty, or causal structure.
Design a system that transforms:
messy data + vague business question
into:
structured causal inference + uncertainty + experiment design + human-readable decision support
with one key constraint:
The user never needs to see statistical complexity unless they choose to.
The system is built on four principles:
A global default assumption:
\[\theta \sim N(0, \tau^2), \quad \tau \text{ small}\]
Meaning:
Most interventions have negligible effect unless strong evidence suggests otherwise.
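As a minimal sketch of what this soft-null prior does in practice, the conjugate normal-normal update below (plain Python, illustrative numbers) shows how a tight \(\tau\) shrinks a noisy estimate toward zero while a diffuse prior leaves it nearly untouched:

```python
def shrunk_effect(effect_hat: float, se: float, tau: float) -> float:
    """Posterior mean of theta under the soft-null prior theta ~ N(0, tau^2),
    given a noisy estimate effect_hat ~ N(theta, se^2) (conjugate update)."""
    w = tau ** 2 / (tau ** 2 + se ** 2)  # weight placed on the data
    return w * effect_hat

# A tight prior (tau = 0.02) pulls a +0.12 estimate strongly toward zero:
print(round(shrunk_effect(0.12, se=0.04, tau=0.02), 4))  # 0.024
# A diffuse prior (tau = 1.0) leaves it almost unchanged:
print(round(shrunk_effect(0.12, se=0.04, tau=1.0), 4))
```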
Every result must be classified as:
This replaces:
All analysis is grounded in:
explicit causal DAGs constructed from domain knowledge
not just correlations or regression formulas.
End users see:
They do NOT see:
User Question
↓
Natural Language Interpreter (LLM)
↓
Causal DAG Builder (domain + prior knowledge)
↓
Data Profiler + Metric Constructor
↓
Model Registry (plugin statistical methods)
↓
Model Ladder (Bayesian inference engine)
↓
Causal Estimation Layer (marginaleffects-style)
↓
Diagnostics + Identifiability Engine
↓
Experiment Design Engine
↓
Report Generator (business-facing language)
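One way to read the diagram above is as a linear pipeline of stages sharing a context object. The sketch below shows only the shape, not an implementation: the stage names, context fields, and stub bodies are all illustrative assumptions.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Stage = Callable[[Context], Context]

def interpret_question(ctx: Context) -> Context:
    # Stub: a real system would call an LLM here.
    ctx["estimand"] = f"effect of treatment on {ctx['question_topic']}"
    return ctx

def build_dag(ctx: Context) -> Context:
    # Stub: a real system would combine domain and prior knowledge.
    ctx["dag"] = [("Treatment", "Engagement"), ("Engagement", "Revenue")]
    return ctx

PIPELINE: List[Stage] = [interpret_question, build_dag]  # ...remaining stages

def run(ctx: Context) -> Context:
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx

result = run({"question_topic": "revenue"})
print(result["estimand"])  # effect of treatment on revenue
```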
The system constructs a DAG using:
Example:
Treatment → Engagement → Revenue
User Type → Engagement
User Type → Retention
Seasonality → Engagement
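The example DAG above can be encoded directly. As an illustrative sketch in plain Python (no graph library assumed), computing the common ancestors of treatment and outcome gives a crude backdoor check:

```python
from typing import Dict, Set

# Edges of the example DAG above (child sets keyed by parent).
DAG: Dict[str, Set[str]] = {
    "Treatment": {"Engagement"},
    "Engagement": {"Revenue"},
    "User Type": {"Engagement", "Retention"},
    "Seasonality": {"Engagement"},
}

def ancestors(node: str) -> Set[str]:
    """All nodes with a directed path into `node`."""
    found: Set[str] = set()
    frontier = [node]
    while frontier:
        cur = frontier.pop()
        for parent, children in DAG.items():
            if cur in children and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found

# Backdoor candidates: common causes of Treatment and Revenue.
confounders = ancestors("Treatment") & ancestors("Revenue")
print(confounders)  # set() — Treatment is exogenous in this DAG, so no
                    # adjustment is needed; Engagement is a mediator, not a confounder.
```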
The DAG is used to:
The DAG determines:
Each statistical method is a declared plugin:
Required interface:
This allows:
logistic regression, Bayesian hierarchical models, survival models, etc. to coexist safely
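A hedged sketch of what such a plugin declaration might look like; the interface names (`fit`, `diagnostics`, `assumptions`) and the registry are illustrative assumptions, not a fixed API:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

class ModelPlugin(ABC):
    """Hypothetical plugin interface; names are illustrative."""

    name: str
    assumptions: Dict[str, str]  # e.g. {"outcome": "binary", "link": "logit"}

    @abstractmethod
    def fit(self, data: Any) -> Any: ...

    @abstractmethod
    def diagnostics(self, fitted: Any) -> Dict[str, float]: ...

REGISTRY: Dict[str, ModelPlugin] = {}

def register(plugin: ModelPlugin) -> None:
    REGISTRY[plugin.name] = plugin

class LogisticRegression(ModelPlugin):
    name = "logistic_regression"
    assumptions = {"outcome": "binary", "link": "logit"}
    def fit(self, data): return {"coef": 0.0}                   # stub
    def diagnostics(self, fitted): return {"separation": 0.0}   # stub

register(LogisticRegression())
print(sorted(REGISTRY))  # ['logistic_regression']
```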
Instead of one model, the system runs a sequence:
y ~ treatment + (1 | user)
y ~ treatment + (1 | user) + treatment:user
y ~ treatment + s(time) + (1 | user)
Key rule:
Models are not selected — they are tested for:
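The ladder can be sketched as a loop over the formulas above, with fits stubbed out (real fits would come from the model registry). The point the sketch makes is that all rungs are retained and compared for robustness rather than one being selected:

```python
from typing import Callable, Dict, List, Tuple

# Stub fit functions returning (effect_estimate, diagnostics).
# The effect values here are made up for illustration.
def fit_baseline(data):      return 0.12, {"converged": True}
def fit_heterogeneous(data): return 0.11, {"converged": True}
def fit_time_varying(data):  return 0.13, {"converged": True}

LADDER: List[Tuple[str, Callable]] = [
    ("y ~ treatment + (1 | user)",                  fit_baseline),
    ("y ~ treatment + (1 | user) + treatment:user", fit_heterogeneous),
    ("y ~ treatment + s(time) + (1 | user)",        fit_time_varying),
]

def run_ladder(data) -> Dict[str, float]:
    """Fit every rung; keep estimates only from rungs whose diagnostics pass."""
    results: Dict[str, float] = {}
    for formula, fit in LADDER:
        effect, diag = fit(data)
        if diag["converged"]:
            results[formula] = effect
    return results

estimates = run_ladder(data=None)
spread = max(estimates.values()) - min(estimates.values())
print(f"effect range across rungs: {spread:.2f}")  # a robustness check, not selection
```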
To make outputs usable for business users:
The system uses a post-model causal translation layer, inspired by tools such as marginaleffects.
Purpose:
Convert model outputs into:
interpretable causal effects under DAG-defined interventions
Example outputs:
Instead of:
β = 0.12 ± 0.04
The system outputs:
Key feature:
This layer:
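As one hedged sketch of such a translation, the snippet below converts hypothetical logistic-regression coefficients into an average marginal effect on the probability scale, the kind of quantity a business reader can act on. All coefficients and data are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fitted logistic model: logit(P(convert)) = b0 + b1*treatment + b2*x
b0, b1, b2 = -1.0, 0.5, 0.8
x = rng.normal(size=10_000)  # observed covariate values

def p(treat: int) -> np.ndarray:
    z = b0 + b1 * treat + b2 * x
    return 1 / (1 + np.exp(-z))

# Average marginal effect: mean change in conversion probability under
# do(treatment=1) vs do(treatment=0), averaged over the covariate distribution.
ame = float(np.mean(p(1) - p(0)))
print(f"Treatment raises conversion probability by {ame:.1%} on average")
```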
After each model:
Explicit failure reporting:
Example:
“User-level time-varying effects are not identifiable due to confounding between seasonality and treatment exposure.”
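A failure like the one just quoted can be detected mechanically: under perfect confounding, the treatment and seasonality columns of the design matrix are collinear, which shows up as rank deficiency. An illustrative numpy sketch with made-up data:

```python
import numpy as np

# Hypothetical design: treatment was rolled out exactly at the season change,
# so the treatment column equals the high-season indicator.
n = 8
season = np.array([0, 0, 0, 0, 1, 1, 1, 1])
treatment = season.copy()  # perfectly confounded with seasonality
X = np.column_stack([np.ones(n), treatment, season])

rank = np.linalg.matrix_rank(X)
if rank < X.shape[1]:
    print("Not identifiable: treatment and seasonality are perfectly collinear")
```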
The system can answer:
“Can we even learn this from data?”
Outputs:
Key output format:
Any statistical method can be added if it declares:
Example:
This makes the system:
a statistical operating system, not a fixed model pipeline
Every run produces a structured report:
The end user NEVER sees:
They see:
clear, decision-oriented statements with uncertainty and caveats
Built with:
MVP model:
log(y + 1) ~ treatment + (1 | user)
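To sanity-check the MVP model, the numpy sketch below simulates data from log(y + 1) ~ treatment + (1 | user) and recovers the effect with a within-user (demeaned) estimator, a cheap stand-in for the full mixed model. All numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data matching log(y + 1) ~ treatment + (1 | user):
n_users, obs_per_user, true_effect = 200, 10, 0.3
user = np.repeat(np.arange(n_users), obs_per_user)
intercepts = rng.normal(0.0, 1.0, n_users)[user]          # (1 | user)
treatment = rng.integers(0, 2, size=user.size)
z = intercepts + true_effect * treatment + rng.normal(0.0, 0.5, user.size)
y = np.expm1(z)                                           # so log(y + 1) == z

# Within-user estimator: demean treatment and outcome per user, then regress.
# Consistent here because treatment varies within user.
logy = np.log1p(y)
t_dm = treatment - np.bincount(user, treatment)[user] / obs_per_user
z_dm = logy - np.bincount(user, logy)[user] / obs_per_user
effect_hat = float(t_dm @ z_dm / (t_dm @ t_dm))
print(f"estimated treatment effect: {effect_hat:.2f}")  # close to 0.3
```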
MVP output:
This system is:
A modular causal + Bayesian + plugin-based inference engine that translates messy business questions into formally grounded statements about what is identifiable, what is uncertain, and what actions are justified.
Not:
The system fundamentally shifts the question:
From:
“What is the answer?”
To:
“What is knowable from this data, under what assumptions, and what would we need in order to know more?”
This section illustrates the system in practice. The goal is to make clear how a vague business question from a user is transformed into:
“We launched a new version of our website (B). Is it better than the old version (A)?”
Data provided:
User intent is vague:
The system translates the question into:
Implicit estimands:
The system builds a causal model such as:
Treatment → Engagement → Conversion → Revenue
User Type → Engagement
User Type → Conversion
Time → Engagement
Time → Conversion
Key insight:
The system evaluates:
Model 1 (baseline)
conversion ~ treatment + (1 | user)
Result:
Model 2 (heterogeneity)
conversion ~ treatment + (1 | user) + treatment:user
Result:
Model 3 (time effects)
conversion ~ treatment + s(time) + (1 | user)
Result:
Using DAG-consistent adjustment + marginaleffects-style summaries:
The system flags:
The system provides:
Detectability analysis:
Recommendations:
Required additional data:
✔ What is supported by data
⚠ What is model-dependent
❌ What is not identifiable
📌 What would make this knowable
Version B is likely to improve short-term conversion and engagement. The probability that B outperforms A on conversion is 0.91. However, long-term retention effects cannot be determined from the current experiment duration. Additional data or extended observation would be required to evaluate long-term impact.
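A probability statement like the one in the report can come from a simple Bayesian computation. The sketch below uses made-up conversion counts (not the source's data) with Beta(1, 1) priors and Monte Carlo sampling:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical conversion counts, for illustration only:
conv_a, n_a = 480, 5000
conv_b, n_b = 540, 5000

# Beta(1, 1) priors give Beta posteriors; estimate P(p_B > p_A) by sampling.
p_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=100_000)
p_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=100_000)
prob_b_better = float(np.mean(p_b > p_a))
print(f"P(B outperforms A on conversion) = {prob_b_better:.2f}")
```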
This system does NOT answer:
“Is B better than A?”
It answers:
“What effects can be reliably inferred from the data, under a causal model, and what additional information would resolve remaining uncertainty?”
A central component of the proposed system is a probabilistic transition graph over statistical models and their failure modes. This structure formalizes expert statistical practice—typically implicit, experience-based, and distributed across textbooks—into a machine-usable representation that supports automated model escalation, diagnosis, and robustness checking.
In this graph, each node represents a statistical model (e.g., Poisson regression, logistic regression, linear regression, negative binomial regression), and directed edges represent diagnosis-driven transitions between models. For example, an edge from Poisson regression to negative binomial regression is activated when diagnostics indicate overdispersion. Similarly, logistic regression may transition to penalized or Bayesian logistic regression under conditions of separation or instability. Unlike deterministic rule systems, these transitions are probabilistic, meaning each edge is associated with a weight representing the strength of evidence that the transition is appropriate given the detected failure mode.
These weights are not assumed to be fixed or universally correct. Instead, they are treated as learned or calibrated quantities, derived from a combination of sources: statistical literature (via structured extraction from textbooks and papers), expert heuristics, and—critically—empirical validation through simulation. In this sense, the graph is not merely a reflection of statistical theory, but an evolving object that integrates theory with observed model behaviour under controlled data-generating processes. This allows the system to refine its understanding of when particular statistical methods fail and which alternatives are most robust under specific conditions.
Operationally, the transition graph functions as the backbone of the system’s model laddering mechanism. When a model is fit to data, diagnostic checks are computed (e.g., dispersion statistics, convergence metrics, residual structure). If a failure mode is detected, the graph is queried to propose one or more alternative models, which are then evaluated in turn. This creates a closed-loop system of fit → diagnose → transition → refit, allowing the system to automatically adapt model complexity to the structure of the data without requiring explicit user intervention.
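The fit → diagnose → transition → refit loop can be sketched as a weighted edge table plus a query function. The models, failure modes, and weights below are illustrative placeholders, not calibrated values, and the diagnostics are stubbed out:

```python
from typing import Dict, List, Tuple

# Edges: (model, failure_mode) -> list of (alternative_model, weight).
TRANSITIONS: Dict[Tuple[str, str], List[Tuple[str, float]]] = {
    ("poisson", "overdispersion"): [
        ("negative_binomial", 0.8),
        ("quasi_poisson", 0.2),
    ],
    ("logistic", "separation"): [
        ("bayesian_logistic", 0.6),
        ("penalized_logistic", 0.4),
    ],
}

def propose(model: str, failure: str) -> List[Tuple[str, float]]:
    """Alternatives ranked by transition weight for a detected failure mode."""
    return sorted(TRANSITIONS.get((model, failure), []),
                  key=lambda mw: mw[1], reverse=True)

# Closed loop: fit -> diagnose -> transition -> refit (diagnosis stubbed).
def diagnose(model: str) -> str:
    return "overdispersion" if model == "poisson" else "ok"

model = "poisson"
while (failure := diagnose(model)) != "ok":
    model = propose(model, failure)[0][0]  # take the highest-weight edge
print(model)  # negative_binomial
```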
Overall, the probabilistic transition graph provides a principled way to encode statistical “know-how” into a computational structure. It bridges the gap between informal statistical reasoning in practice and formal automated inference systems, enabling robust, adaptive model selection that is both data-driven and informed by statistical theory.
This post was developed through iterative discussion with ChatGPT, which was used to help explore ideas, structure arguments, and draft and organize content. To paraphrase Borges: “I do not know which one of us has written this post”. But any flaws are the human’s responsibility.