The R package BDgraph provides statistical tools for Bayesian structure learning in undirected graphical models with continuous, count, binary, and mixed data. The package implements recent improvements from the Bayesian graphical models literature, including Mohammadi and Wit (2015), Mohammadi et al. (2021), Mohammadi et al. (2017), and Dobra and Mohammadi (2018). In addition, the package contains several functions for simulation and visualization, as well as several multivariate datasets taken from the literature.
Install BDgraph using
install.packages( "BDgraph" )
First, we load the BDgraph package, as well as pROC and ggplot2
library( BDgraph )
library( pROC )
library( ggplot2 )
Here are two simple examples to show how to use the functionality of the package.
Here is a simple example to see the performance of the package for Gaussian graphical models. First, using the function bdgraph.sim(), we simulate 200 observations (n = 200) from a multivariate Gaussian distribution with 15 variables (p = 15) and a “scale-free” graph structure, as follows
set.seed( 20 )
data.sim = bdgraph.sim( n = 200, p = 15, graph = "scale-free", vis = TRUE )
Since the generated data are Gaussian, we run the bdgraph() function by choosing method = "ggm", as follows
bdgraph.obj = bdgraph( data = data.sim, method = "ggm", iter = 5000 )
> This OS does not support multi-threading for the BDgraph package
> 5000 MCMC sampling ... in progress:
> 5%->10%->15%->20%->25%->30%->35%->40%->45%->50%->55%->60%->65%->70%->75%->80%->85%->90%->95%-> done
To report the confusion matrix with a cutoff point of 0.5:
conf.mat( actual = data.sim, pred = bdgraph.obj, cutoff = 0.5 )
>           Actual
> Prediction  0  1
>          0 89  4
>          1  2 10
conf.mat.plot( actual = data.sim, pred = bdgraph.obj, cutoff = 0.5 )
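To make the role of the cutoff concrete, the following base R sketch thresholds a matrix of edge probabilities and tabulates the result against a known graph. It assumes that an edge is selected when its estimated posterior inclusion probability exceeds the cutoff; how the boundary case is handled is not specified here. The 4-node matrices `edge.prob` and `true.graph` are hypothetical values made up for illustration and are not output from BDgraph.

```r
# Hypothetical 4-node example; all numbers are made up for illustration
p <- 4
edge.prob <- matrix( 0, p, p )     # assumed posterior edge inclusion probabilities
edge.prob[ upper.tri( edge.prob ) ] <- c( 0.92, 0.10, 0.65, 0.48, 0.05, 0.81 )

true.graph <- matrix( 0, p, p )    # assumed true adjacency matrix (upper triangle only)
true.graph[ upper.tri( true.graph ) ] <- c( 1, 0, 0, 1, 0, 1 )

# Keep an edge when its probability exceeds the cutoff
cutoff <- 0.5
est.graph <- ( edge.prob > cutoff ) * 1

# Compare only the upper triangle, since the graph is undirected
idx <- upper.tri( true.graph )
conf <- table( Prediction = factor( est.graph[ idx ],  levels = c( 0, 1 ) ),
               Actual     = factor( true.graph[ idx ], levels = c( 0, 1 ) ) )
print( conf )
```

Rows correspond to the thresholded estimate and columns to the true graph, which matches the layout of the confusion matrix reported above.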
To compare the result with the true graph
compare( data.sim, bdgraph.obj, main = c( "Target", "BDgraph" ), vis = TRUE )
>                Target BDgraph
> true positive      14  10.000
> true negative      91  89.000
> false positive      0   2.000
> false negative      0   4.000
> F1-score            1   0.769
> specificity         1   0.978
> sensitivity         1   0.714
> MCC                 1   0.740
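The summary measures in the output above follow from the four counts by the standard textbook definitions. The base R sketch below recomputes them from the counts in the "BDgraph" column; it is a check of the arithmetic, not a description of how compare() is implemented internally.

```r
# Counts taken from the "BDgraph" column of the compare() output above
tp <- 10; tn <- 89; fp <- 2; fn <- 4

f1          <- 2 * tp / ( 2 * tp + fp + fn )   # harmonic mean of precision and recall
specificity <- tn / ( tn + fp )                # true negative rate
sensitivity <- tp / ( tp + fn )                # true positive rate
mcc         <- ( tp * tn - fp * fn ) /         # Matthews correlation coefficient
               sqrt( ( tp + fp ) * ( tp + fn ) * ( tn + fp ) * ( tn + fn ) )

metrics <- round( c( F1 = f1, specificity = specificity,
                     sensitivity = sensitivity, MCC = mcc ), 3 )
print( metrics )
```

The rounded values agree with the reported F1-score, specificity, sensitivity, and MCC.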
Now, as an alternative, we run the bdgraph.mpl() function, which is based on GGMs and the marginal pseudo-likelihood, as follows
bdgraph.mpl.obj = bdgraph.mpl( data = data.sim, method = "ggm", iter = 5000 )
> This OS does not support multi-threading for the BDgraph package
> 5000 MCMC sampling ... in progress:
> 5%->10%->15%->20%->25%->30%->35%->40%->45%->50%->55%->60%->65%->70%->75%->80%->85%->90%->95%-> done
conf.mat( actual = data.sim, pred = bdgraph.mpl.obj )
>           Actual
> Prediction  0  1
>          0 89  4
>          1  2 10
conf.mat.plot( actual = data.sim, pred = bdgraph.mpl.obj )
We could compare the results of both algorithms with the true graph as follows
compare( data.sim, bdgraph.obj, bdgraph.mpl.obj,
main = c( "Target", "BDgraph", "BDgraph.mpl" ), vis = TRUE )
>                Target BDgraph BDgraph.mpl
> true positive      14  10.000      10.000
> true negative      91  89.000      89.000
> false positive      0   2.000       2.000
> false negative      0   4.000       4.000
> F1-score            1   0.769       0.769
> specificity         1   0.978       0.978
> sensitivity         1   0.714       0.714
> MCC                 1   0.740       0.740
To see the performance of the BDMCMC algorithm we could plot the ROC curve as follows
roc.bdgraph = BDgraph::roc( pred = bdgraph.obj, actual = data.sim )
roc.bdgraph.mpl = BDgraph::roc( pred = bdgraph.mpl.obj, actual = data.sim )
pROC::ggroc( list( BDgraph = roc.bdgraph, BDgraph.mpl = roc.bdgraph.mpl ), size = 0.8 ) +
    theme_minimal() + ggtitle( "ROC plots with AUC" ) +
    scale_color_manual( values = c( "red", "blue" ),
        labels = c( paste( "AUC=", round( auc( roc.bdgraph ), 3 ), "; BDgraph; " ),
                    paste( "AUC=", round( auc( roc.bdgraph.mpl ), 3 ), "; BDgraph.mpl" ) ) ) +
    theme( legend.title = element_blank() ) +
    theme( legend.position = c( .7, .3 ), text = element_text( size = 17 ) ) +
    geom_segment( aes( x = 1, xend = 0, y = 0, yend = 1 ), color = "grey", linetype = "dashed" )
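As a reminder of what the AUC in the legend measures, the base R sketch below computes it for a small set of edge scores using the rank-sum (Mann-Whitney) identity: the AUC equals the proportion of (true edge, non-edge) pairs in which the true edge receives the higher score. The vectors `score` and `truth` are hypothetical values made up for illustration, and this is not necessarily how pROC computes the quantity internally.

```r
# Hypothetical edge scores and true edge labels (made-up values, not BDgraph output)
score <- c( 0.92, 0.10, 0.65, 0.48, 0.05, 0.81 )
truth <- c( 1,    0,    0,    1,    0,    1 )

n.pos <- sum( truth == 1 )    # number of true edges
n.neg <- sum( truth == 0 )    # number of non-edges

# Rank-sum identity: average ranks make tied scores count as half
r   <- rank( score )
auc <- ( sum( r[ truth == 1 ] ) - n.pos * ( n.pos + 1 ) / 2 ) / ( n.pos * n.neg )
print( auc )
```

Here 8 of the 9 (true edge, non-edge) pairs are ordered correctly, so the AUC is 8/9. A value of 0.5 corresponds to the dashed diagonal in the plot.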
Here is a simple example to see the performance of the package for mixed data using Gaussian copula graphical models. First, using the function bdgraph.sim(), we simulate 300 observations (n = 300) from mixed data (type = "mixed") with 10 variables (p = 10) and a “random” graph structure, as follows
set.seed( 2 )
data.sim = bdgraph.sim( n = 300, p = 10, type = "mixed", graph = "random", vis = TRUE )
Since the generated data are mixed, we run the bdgraph() function by choosing method = "gcgm", as follows:
bdgraph.obj = bdgraph( data = data.sim, method = "gcgm", iter = 5000 )
> This OS does not support multi-threading for the BDgraph package
> 5000 MCMC sampling ... in progress:
> 5%->10%->15%->20%->25%->30%->35%->40%->45%->50%->55%->60%->65%->70%->75%->80%->85%->90%->95%-> done
To compare the result with the true graph, we could run
compare( data.sim, bdgraph.obj, main = c( "Target", "BDgraph" ), vis = TRUE )
>                Target BDgraph
> true positive      12   9.000
> true negative      33  29.000
> false positive      0   4.000
> false negative      0   3.000
> F1-score            1   0.720
> specificity         1   0.879
> sensitivity         1   0.750
> MCC                 1   0.613
For more examples see Mohammadi and Wit (2019).