get_derivation {dosearch}    R Documentation

Identify a causal effect from arbitrary experiments and observations

Description

Identify a causal query from available data in a causal model described by a semi-Markovian graph. Special mechanisms related to transportability of causal effects, recoverability from selection bias and identifiability under missing data can also be included in the model.

Usage

get_derivation(data, query, graph, transportability = NULL, selection_bias = NULL,
  missing_data = NULL, control = list())

Arguments

data

a character string describing the available distributions in the package syntax. See ‘Details’.

query

a character string describing the target distribution in the package syntax. See ‘Details’.

graph

a character string describing the graph in the package syntax. See ‘Details’.

transportability

a character string describing the transportability nodes of the model in the package syntax. See ‘Details’.

selection_bias

a character string describing the selection bias nodes of the model in the package syntax. See ‘Details’.

missing_data

a character string describing the missing data mechanisms of the model in the package syntax. See ‘Details’.

control

a list of control parameters. See ‘Details’.

Details

data is used to list the available input distributions of the form

P(Ai|do(Bi),Ci)

Individual variables within sets should be separated by a comma. For example, three input distributions

P(Z|do(X)), P(W,Y|do(Z,X)), P(W,Y,X|Z)

should be given as follows:

> data <- "
+  P(Z|do(X))
+  P(W,Y|do(Z,X))
+  P(W,Y,X|Z)
+"

The use of multiple do-operators is not permitted. Furthermore, when both conditioning variables and a do-operator are present, every conditioning variable must either precede the do-operator or follow it.

query is the target distribution of the search. It has the same syntax as data, but only a single distribution should be given.
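As a minimal sketch of the syntax described above, the input distributions can also be assembled from a character vector. This assumes, as in the example above, that placing one distribution per line is sufficient; the variable names `dists`, `data` and `query` are illustrative only, and lowercase `p` is used as in the ‘Examples’ section.

```r
# Illustrative only: build the `data` string from individual distributions,
# one per line, and give a single distribution as the query.
dists <- c(
  "p(z|do(x))",
  "p(w,y|do(z,x))",
  "p(w,y,x|z)"
)
data <- paste(dists, collapse = "\n")

# The query uses the same syntax but holds exactly one distribution.
query <- "p(y|do(x))"

cat(data, "\n")
```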

graph is a description of a directed acyclic graph where directed edges are denoted by -> and bidirected arcs corresponding to unobserved confounders are denoted by --. As an example, a graph with two directed edges and one bidirected arc is constructed as follows:

> graph <- "
+  X -> Z
+  Z -> Y
+  X -- Y
+"
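The three string arguments can be combined into one call. The sketch below reuses the graph above (written in lowercase, as in the ‘Examples’ section) and assumes that a single observational distribution over all three variables is available; the choice of `data` and `query` is an illustration, not part of the original documentation. The call is guarded so that the script also runs where dosearch is not installed.

```r
# Graph from the example above: x -> z -> y with an unobserved
# confounder between x and y.
graph <- "
  x -> z
  z -> y
  x -- y
"

# Assumed inputs for illustration: the joint observational distribution
# and the causal effect of x on y as the target.
data <- "p(x,y,z)"
query <- "p(y|do(x))"

if (requireNamespace("dosearch", quietly = TRUE)) {
  result <- dosearch::get_derivation(data, query, graph)
  print(result$identifiable)
  print(result$formula)
}
```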

transportability enumerates the nodes that should be understood as transportability nodes responsible for discrepancies between domains. Individual variables should be separated by a comma. See, e.g., Bareinboim and Pearl (2014) for details on transportability.

selection_bias enumerates the nodes that should be understood as selection bias nodes responsible for bias in the input data sets. Individual variables should be separated by a comma. See, e.g., Bareinboim and Pearl (2014) for details on selection bias recoverability.

missing_data enumerates the missingness mechanisms of the model. The syntax for a single mechanism is M_X : X where M_X is the mechanism for X. Individual mechanisms should be separated by a comma. Note that both M_X and X must be present in the graph if the corresponding mechanism is given as input. Proxy variables should not be included in the graph, since they are automatically generated based on missing_data. By default, a warning is issued if a proxy variable is present in an input distribution but its corresponding mechanism is not present in any input. See, e.g., Mohan, Pearl and Tian (2013) for details on missing data as a causal inference problem.
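When there are many partially observed variables, the missing_data string can be generated rather than typed by hand. The following sketch uses only base R; the `m_` prefix for mechanism names is a naming choice borrowed from the ‘Examples’ section, not a requirement stated in this documentation.

```r
# Variables with missing values (illustrative).
vars <- c("x", "y", "z")

# Build "mechanism : variable" pairs separated by commas.
missing_data <- paste0("m_", vars, " : ", vars, collapse = ", ")

print(missing_data)
# [1] "m_x : x, m_y : y, m_z : z"
```

The resulting string has the same form as the missing_data argument used in the missing data example in ‘Examples’.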

The control argument is a list that can supply any of the following components:

benchmark

A logical value. If TRUE, the search time is recorded and returned (in milliseconds). Defaults to FALSE.

draw_derivation

A logical value. If TRUE, a string representing the derivation steps as a DOT graph is returned. The graph can be exported as an image, for example by using dot. Defaults to FALSE.

draw_all

A logical value. If TRUE and draw_derivation = TRUE, the derivation will contain every step taken by the search. If FALSE, only the steps that led to the identification of the target are included. Defaults to FALSE.

formula

A logical value. If TRUE, a string representing the identifiable query is returned. If FALSE, only a logical value is returned that takes the value TRUE for an identifiable target and FALSE otherwise. Defaults to TRUE.

heuristic

A logical value. If TRUE, new distributions are expanded according to a search heuristic (see the vignette for details). Otherwise, distributions are expanded in the order in which they were identified. Defaults to TRUE unless missing data mechanisms are provided in missing_data.

improve

A logical value. If TRUE, various improvements are applied to the search to make it more efficient (see the vignette for details). Defaults to TRUE.

md_sym

A single character describing the symbol to use for active missing data mechanisms. Defaults to "1".

replace

A logical value. If TRUE, the search will continue deriving new distributions after the target has been reached in order to possibly find a shorter search path. Defaults to FALSE.

rules

A numeric vector describing a subset of the search rules to be used. Must be a subset of c(-1,-2,-3,1,2,3,4,5,-6,6,-7,7,8,9) (see the vignette for details).

verbose

A logical value. If TRUE, diagnostic information is printed to the console during the search. Defaults to FALSE.

warn

A logical value. If TRUE, a warning is issued for possibly unintentionally misspecified but syntactically correct input distributions.
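A control list can be sketched as follows. The name check is an addition of this example, meant to catch misspelled component names before a search is started; how the package itself treats unknown components is not stated here. The graph, data and query are illustrative assumptions, and the result is inspected with `str()` rather than by assuming the names of any additional components.

```r
# Component names documented above for the control list.
documented <- c("benchmark", "draw_derivation", "draw_all", "formula",
                "heuristic", "improve", "md_sym", "replace", "rules",
                "verbose", "warn")

# Record the search time and keep searching after the target is reached.
ctrl <- list(benchmark = TRUE, replace = TRUE, verbose = FALSE)

# Illustrative safeguard: stop if a component name is not documented.
unknown <- setdiff(names(ctrl), documented)
stopifnot(length(unknown) == 0)

if (requireNamespace("dosearch", quietly = TRUE)) {
  graph <- "
    x -> z
    z -> y
    x -- y
  "
  res <- dosearch::get_derivation("p(x,y,z)", "p(y|do(x))", graph,
                                  control = ctrl)
  str(res)  # inspect which components were returned
}
```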

Value

A list with the following components by default. See the options of control for how to obtain a graphical representation of the derivation or how to benchmark the search.

identifiable

A logical value that attains the value TRUE if the target quantity is identifiable and FALSE otherwise.

formula

A character string describing a formula for an identifiable query or an empty character vector for an unidentifiable effect.
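The two default components can be handled as in the sketch below. `describe_result` is a hypothetical helper, and the two lists are hand-written stand-ins with the documented structure, not actual output of get_derivation; in particular, "<formula string>" is a placeholder.

```r
# Hypothetical helper: summarise a result list with the default components.
describe_result <- function(res) {
  if (isTRUE(res$identifiable)) {
    paste("Identifiable:", res$formula)
  } else {
    "Not identifiable from the given inputs"
  }
}

# Stand-ins mimicking the documented return value (not real output).
mock_yes <- list(identifiable = TRUE, formula = "<formula string>")
mock_no <- list(identifiable = FALSE, formula = character(0))

print(describe_result(mock_yes))
print(describe_result(mock_no))
```

Testing `identifiable` first avoids working with the empty character vector that is returned as `formula` for an unidentifiable query.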

Author(s)

Santtu Tikka

Examples


# Multiple input distributions (both observational and interventional)

data1 <- "
  p(z_2,x_2|do(x_1))
  p(z_1|x_2,do(x_1,y))
  p(x_1|w_1,do(x_2))
  p(y|z_1,z_2,x_1,do(x_2))
  p(w|y,x_1,do(x_2))
"

query1 <- "p(y,x_1|w,do(x_2))"

graph1 <- "
  x_1 -> z_2
  x_1 -> z_1
  x_2 -> z_1
  x_2 -> z_2
  z_1 -> y
  z_2 -> y
  x_1 -> w
  x_2 -> w
  z_1 -> w
  z_2 -> w
"

get_derivation(data1, query1, graph1)

# Selection bias

data2 <- "
  p(x,y,z_1,z_2|s)
  p(z_1,z_2)
"

query2 <- "p(y|do(x))"

graph2 <- "
  x   -> z_1
  z_1 -> z_2
  x   -> y
  y   -- z_2
  z_2 -> s
"

get_derivation(data2, query2, graph2, selection_bias = "s")

# Transportability

data3 <- "
  p(x,y,z_1,z_2)
  p(x,y,z_1|s_1,s_2,do(z_2))
  p(x,y,z_2|s_3,do(z_1))
"

query3 <- "p(y|do(x))"

graph3 <- "
  z_1 -> x
  x   -> z_2
  z_2 -> y
  z_1 -- x
  z_1 -- z_2
  z_1 -- y
  t_1 -> z_1
  t_2 -> z_2
  t_3 -> y
"

get_derivation(data3, query3, graph3, transportability = "t_1, t_2, t_3")

# Missing data

data4 <- "
  p(x*,y*,z*,m_x,m_y,m_z)
"

query4 <- "p(x,y,z)"

graph4 <- "
  z -> x
  x -> y
  x -> m_z
  y -> m_z
  y -> m_x
  z -- y
"

get_derivation(data4, query4, graph4, missing_data = "m_x : x, m_y : y, m_z : z")

# Export the DOT diagram of the derivation as an SVG file
# to the working directory via the DOT package
# PostScript format is also supported

## Not run: 
d <- get_derivation(data1, query1, graph1, control = list(draw_derivation = TRUE))
DOT::dot(d$derivation, "derivation.svg")

## End(Not run)


[Package dosearch version 1.0.2 Index]