An Interactive Browser Tool for Causal DAG Construction and Analysis
DAG Studio is an open-access browser tool for constructing and analyzing directed acyclic graphs (DAGs) in causal inference research. Users draw causal structures interactively, and the tool automatically identifies variable roles, enumerates backdoor paths, and returns valid minimal adjustment sets based on Pearl's backdoor criterion. It targets researchers in population health and social science who need to reason carefully about confounding without specialized software.
Causal inference in observational research depends critically on correctly specifying the causal structure of a problem before choosing an estimator. A directed acyclic graph encodes assumptions about which variables confound, mediate, or act as colliders on the path between a treatment and outcome, and those assumptions directly determine what must be conditioned on for a valid causal estimate. In practice, researchers often reason about these structures informally or sketch them by hand, with no immediate feedback on whether the resulting adjustment strategy is valid.
Existing tools for DAG analysis, such as DAGitty, are functional but dated in interface design and limited in interactivity. DAG Studio was built to fill that gap: a fast, modern, browser-native tool that lets researchers build causal diagrams and receive real-time identification analysis without any installation or account. The target audience is researchers in population health and social science, fields where DAG-based reasoning has become standard practice but accessible tooling has lagged.
DAG Studio is a client-side web application with no backend. All graph construction, role classification, path enumeration, and adjustment set search run locally in the browser using custom TypeScript implementations of standard causal graph algorithms.
The unit of analysis is an arbitrary user-defined DAG. Users designate a treatment node and an outcome node, and all analysis is computed for that pair. Graphs can be of arbitrary size, though the adjustment set search is bounded in practice by a configurable candidate limit (six variables by default). Nodes can be marked as unobserved (latent), in which case they are excluded from adjustment candidates. Five built-in templates cover the canonical structural cases: confounding, mediation, instrumental variable, collider, and front-door.
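As an illustration of this graph model, the sketch below shows one way such a DAG and its adjustment candidates could be represented. The type and function names (DagNode, Dag, descendants, adjustmentCandidates) are hypothetical and are not taken from the DAG Studio source. The candidate filter combines the latent-node rule with the non-descendant restriction used by the adjustment set search.

```typescript
// Hypothetical data model for illustration; not DAG Studio's actual types.
interface DagNode {
  id: string;
  latent: boolean; // true = unobserved, never offered as an adjustment candidate
}

interface Dag {
  nodes: DagNode[];
  edges: Array<[string, string]>; // [from, to] means from -> to
  treatment: string;
  outcome: string;
}

// Collect every node reachable from `start` along directed edges.
function descendants(dag: Dag, start: string): Set<string> {
  const seen = new Set<string>();
  const stack = [start];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    for (const [from, to] of dag.edges) {
      if (from === current && !seen.has(to)) {
        seen.add(to);
        stack.push(to);
      }
    }
  }
  return seen;
}

// Observed nodes that are not the treatment, the outcome,
// or a descendant of the treatment.
function adjustmentCandidates(dag: Dag): string[] {
  const desc = descendants(dag, dag.treatment);
  return dag.nodes
    .filter(n => !n.latent && n.id !== dag.treatment && n.id !== dag.outcome && !desc.has(n.id))
    .map(n => n.id);
}

// A confounding structure (Z) extended with a latent common cause (U)
// and a mediator (M).
const example: Dag = {
  nodes: [
    { id: "X", latent: false },
    { id: "Y", latent: false },
    { id: "Z", latent: false },
    { id: "U", latent: true },
    { id: "M", latent: false },
  ],
  edges: [["Z", "X"], ["Z", "Y"], ["U", "X"], ["U", "Y"], ["X", "M"], ["M", "Y"]],
  treatment: "X",
  outcome: "Y",
};

const candidates = adjustmentCandidates(example); // ["Z"]
```

Here U is dropped because it is latent and M because it descends from the treatment, so Z is the only candidate left.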
The identification engine implements two core algorithms. First, variable role classification traverses the ancestors and descendants of the treatment and outcome nodes to assign each node a structural role: confounder, mediator, collider, or instrument. Second, adjustment set search enumerates, by brute force, subsets of the observed candidates that are not descendants of the treatment, testing whether conditioning on each subset d-separates the treatment from the outcome along every backdoor path. D-separation is tested using the Bayes Ball algorithm.
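The adjustment set search can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: d-separation is checked with a reachability pass in the spirit of Bayes Ball, the test is run on the graph with the treatment's outgoing edges removed (the usual backdoor formulation, which the description above does not spell out), and the limit of six is read as a cap on subset size. All function names are hypothetical.

```typescript
type Edge = [string, string]; // [parent, child]

function parentsOf(edges: Edge[], node: string): string[] {
  return edges.filter(e => e[1] === node).map(e => e[0]);
}

function childrenOf(edges: Edge[], node: string): string[] {
  return edges.filter(e => e[0] === node).map(e => e[1]);
}

// Returns true if x and y are d-separated given the conditioning set.
function dSeparated(edges: Edge[], x: string, y: string, given: string[]): boolean {
  const z = new Set<string>(given);
  // Nodes that are conditioned on or have a conditioned descendant;
  // a collider on a trail is open exactly when it belongs to this set.
  const opensCollider = new Set<string>();
  const stack = given.slice();
  while (stack.length > 0) {
    const n = stack.pop() as string;
    if (opensCollider.has(n)) continue;
    opensCollider.add(n);
    for (const p of parentsOf(edges, n)) stack.push(p);
  }
  // Search over (node, direction) states: "up" means the node was entered
  // from one of its children, "down" means it was entered from a parent.
  const visited = new Set<string>();
  const queue: Array<[string, "up" | "down"]> = [[x, "up"]];
  while (queue.length > 0) {
    const [node, dir] = queue.pop() as [string, "up" | "down"];
    const key = node + "|" + dir;
    if (visited.has(key)) continue;
    visited.add(key);
    if (node === y && !z.has(node)) return false; // an active trail reaches y
    if (dir === "up" && !z.has(node)) {
      for (const p of parentsOf(edges, node)) queue.push([p, "up"]);
      for (const c of childrenOf(edges, node)) queue.push([c, "down"]);
    } else if (dir === "down") {
      if (!z.has(node)) for (const c of childrenOf(edges, node)) queue.push([c, "down"]);
      if (opensCollider.has(node)) for (const p of parentsOf(edges, node)) queue.push([p, "up"]);
    }
  }
  return true;
}

// All subsets of `items` with at most `maxSize` elements, smallest first.
function subsetsUpTo(items: string[], maxSize: number): string[][] {
  const out: string[][] = [];
  const grow = (start: number, current: string[]): void => {
    out.push(current);
    if (current.length === maxSize) return;
    for (let i = start; i < items.length; i++) grow(i + 1, current.concat(items[i]));
  };
  grow(0, []);
  return out.sort((a, b) => a.length - b.length);
}

// Brute-force search for minimal valid adjustment sets.
function minimalAdjustmentSets(
  edges: Edge[], treatment: string, outcome: string, candidates: string[], maxSize = 6
): string[][] {
  // Assumption: drop edges leaving the treatment so only backdoor paths remain.
  const backdoorEdges = edges.filter(e => e[0] !== treatment);
  const minimal: string[][] = [];
  for (const s of subsetsUpTo(candidates, maxSize)) {
    // Skip supersets of an already accepted set; they cannot be minimal.
    const hasSmaller = minimal.some(m => m.every(v => s.indexOf(v) >= 0));
    if (!hasSmaller && dSeparated(backdoorEdges, treatment, outcome, s)) minimal.push(s);
  }
  return minimal;
}

const dagEdges: Edge[] = [["W", "Z"], ["Z", "X"], ["Z", "Y"], ["W", "Y"], ["X", "Y"]];
const minimalSets = minimalAdjustmentSets(dagEdges, "X", "Y", ["W", "Z"]); // [["Z"]]
```

In this example, conditioning on Z alone closes both backdoor paths (X ← Z → Y and X ← Z ← W → Y), while W alone leaves the first one open, so {Z} is the only minimal set. Because subsets are visited in order of size, a valid set is minimal exactly when no previously accepted set is contained in it.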
Backdoor path detection enumerates every path between treatment and outcome, ignoring edge direction, classifies each as causal or non-causal, and flags whether each backdoor path is open or blocked under the proposed adjustment set. The instrumental variable heuristic excludes instrument-to-treatment edges from backdoor classification: a modified reachability check tests whether the candidate instrument can reach the outcome without passing through the treatment.
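The path analysis can be illustrated with a short sketch. It assumes a simple recursive enumeration of paths and the textbook blocking rule (a path is blocked by a conditioned non-collider, or by a collider that is neither conditioned on nor has a conditioned descendant). The names PathReport, simplePaths, isBlocked, and analyzePaths are hypothetical, and the instrumental variable heuristic is not covered here.

```typescript
type Edge = [string, string]; // [parent, child]

interface PathReport {
  nodes: string[];
  kind: "causal" | "backdoor" | "other";
  blocked: boolean;
}

// True if the graph contains the directed edge a -> b.
function hasEdge(edges: Edge[], a: string, b: string): boolean {
  return edges.some(e => e[0] === a && e[1] === b);
}

// All nodes reachable from `node` along directed edges.
function descendantsOf(edges: Edge[], node: string): Set<string> {
  const seen = new Set<string>();
  const stack = [node];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    for (const e of edges) {
      if (e[0] === current && !seen.has(e[1])) {
        seen.add(e[1]);
        stack.push(e[1]);
      }
    }
  }
  return seen;
}

// Every simple path from `start` to `end`, ignoring edge direction.
function simplePaths(edges: Edge[], start: string, end: string): string[][] {
  const out: string[][] = [];
  const walk = (path: string[]): void => {
    const last = path[path.length - 1];
    if (last === end) {
      out.push(path);
      return;
    }
    for (const e of edges) {
      const next = e[0] === last ? e[1] : e[1] === last ? e[0] : null;
      if (next !== null && path.indexOf(next) < 0) walk(path.concat(next));
    }
  };
  walk([start]);
  return out;
}

// Apply the blocking rule to each interior node of the path.
function isBlocked(edges: Edge[], path: string[], given: string[]): boolean {
  for (let i = 1; i < path.length - 1; i++) {
    const collider = hasEdge(edges, path[i - 1], path[i]) && hasEdge(edges, path[i + 1], path[i]);
    const conditioned = given.indexOf(path[i]) >= 0;
    if (!collider && conditioned) return true;
    if (collider) {
      const desc = descendantsOf(edges, path[i]);
      if (!conditioned && !given.some(g => desc.has(g))) return true;
    }
  }
  return false;
}

function analyzePaths(edges: Edge[], treatment: string, outcome: string, given: string[]): PathReport[] {
  return simplePaths(edges, treatment, outcome).map((nodes): PathReport => {
    let allForward = true;
    for (let i = 0; i + 1 < nodes.length; i++) {
      if (!hasEdge(edges, nodes[i], nodes[i + 1])) allForward = false;
    }
    // Causal: fully directed from treatment to outcome.
    // Backdoor: the first edge points into the treatment.
    const kind: PathReport["kind"] =
      allForward ? "causal" : hasEdge(edges, nodes[1], nodes[0]) ? "backdoor" : "other";
    return { nodes, kind, blocked: isBlocked(edges, nodes, given) };
  });
}

const pathEdges: Edge[] = [["Z", "X"], ["Z", "Y"], ["X", "M"], ["M", "Y"]];
const reports = analyzePaths(pathEdges, "X", "Y", ["Z"]);
// reports[0]: X, Z, Y  kind "backdoor", blocked: true
// reports[1]: X, M, Y  kind "causal",   blocked: false
```

Full path enumeration grows quickly with graph density, so a sketch like this is only practical for the small graphs typical of applied work.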
Edge geometry is fully custom: attachment points are computed from node centers using circle-intersection math, with angle spreading to prevent overlapping edges at shared nodes and obstacle-routing for curved paths around intervening nodes.
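A minimal sketch of the first two ingredients, attachment points on a circular node and angle spreading, is given below. It assumes circular nodes of a known radius and a fixed angular step between edges that share a node. The function names and the spreading rule are illustrative assumptions rather than the tool's actual geometry code, and obstacle routing is omitted.

```typescript
interface Point {
  x: number;
  y: number;
}

// Point where a line from `center` toward `target` leaves a circle of
// radius `r`, optionally rotated by `offset` radians around the center.
function attachmentPoint(center: Point, r: number, target: Point, offset = 0): Point {
  const angle = Math.atan2(target.y - center.y, target.x - center.x) + offset;
  return { x: center.x + r * Math.cos(angle), y: center.y + r * Math.sin(angle) };
}

// Symmetric angular offsets for `count` edges that would otherwise attach
// at the same point, spaced `step` radians apart and centered on zero.
function spreadOffsets(count: number, step: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < count; i++) out.push((i - (count - 1) / 2) * step);
  return out;
}

// An edge between two nodes of radius 20 whose centers are 100 units apart.
const nodeA: Point = { x: 0, y: 0 };
const nodeB: Point = { x: 100, y: 0 };
const edgeStart = attachmentPoint(nodeA, 20, nodeB); // on A's circle, facing B
const edgeEnd = attachmentPoint(nodeB, 20, nodeA);   // on B's circle, facing A

// Three edges sharing an attachment direction, fanned out by 0.2 rad each.
const fanned = spreadOffsets(3, 0.2).map(o => attachmentPoint(nodeA, 20, nodeB, o));
```

With these inputs the edge runs from approximately (20, 0) to approximately (80, 0), and the three fanned attachment points sit on node A's circle at angles of about -0.2, 0, and 0.2 radians.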
The tool provides real-time causal analysis as the graph is edited: variable roles, backdoor paths, and valid adjustment sets are recomputed after each change.
The adjustment set search is exhaustive over subsets up to size six, which is sufficient for most applied DAGs but does not scale to large graphs. The tool does not implement the front-door criterion or instrumental variable identification formally; it identifies the structural role of potential instruments but does not compute IV estimates or test exclusion restrictions.
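To make the scaling concern concrete, read the limit as a cap of six on subset size and write n for the number of observed candidate variables (both are assumptions about the implementation). The number of subsets an exhaustive search must then test is

```latex
N(n) = \sum_{k=0}^{\min(n,\,6)} \binom{n}{k}
```

For n = 10 this gives 848 subsets, and for n = 20 it gives 60,460, each requiring its own d-separation test.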
All identification results assume the DAG is correctly specified by the user, which is the fundamental limitation of any DAG-based analysis. The tool does not support cyclic graphs, time-varying treatments, or continuous-time causal structures.
DAG Studio provides a fast, modern interface for the kind of causal reasoning that population health researchers do routinely but often without computational support. By surfacing adjustment sets, backdoor paths, and variable roles in real time, it lowers the barrier to careful structural thinking earlier in the research design process.
Development is ongoing. Planned additions include formal front-door identification and export of adjustment sets directly to analysis code in R and Python.