rlda.binomial {Rlda} | R Documentation |
This method implements Latent Dirichlet Allocation with a
Stick-Breaking prior for binomial data.
rlda.binomial works with a frequency data.frame and a
population data.frame.
rlda.binomial(data, pop, n_community, alpha0, alpha1, gamma, n_gibbs, ll_prior = TRUE, display_progress = TRUE)
data |
An abundance data.frame where each row is a sampling unit (e.g. plots, locations, times) and each column is a categorical type of element (e.g. species, firms, issues). |
pop |
A population data.frame where each row is a sampling unit
(e.g. plots, locations, times) and each column is a categorical
type of element (e.g. species, firms, issues). The elements inside
this data.frame must all be greater than the elements inside the data data.frame. |
n_community |
Total number of communities to return. It must be less than
the total number of columns inside the |
alpha0 |
Hyperparameter associated with the Beta prior Beta(alpha0, alpha1). |
alpha1 |
Hyperparameter associated with the Beta prior Beta(alpha0, alpha1). |
gamma |
Hyperparameter associated with the Stick-Breaking prior. |
n_gibbs |
Total number of Gibbs Samples. |
ll_prior |
boolean scalar, |
display_progress |
boolean scalar, |
rlda.binomial
uses a modified Latent Dirichlet Allocation method
to construct Mixed-Membership Clusters using Bayesian Inference.
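To give some intuition for the Stick-Breaking prior mentioned above, here is a minimal sketch of a truncated stick-breaking draw. It is not taken from the Rlda source. The function name stick_breaking is hypothetical, and the form V_k ~ Beta(1, gamma) with the last break fixed at 1 is the usual textbook construction, assumed here rather than confirmed for this package.

```r
# Illustrative truncated stick-breaking draw (hypothetical helper, not part of Rlda).
# Assumption: V_k ~ Beta(1, gamma) for k < K, and V_K = 1 so the weights sum to 1.
stick_breaking <- function(n_community, gamma) {
  v <- c(rbeta(n_community - 1, 1, gamma), 1)
  # Fraction of the stick still available before each break
  remaining <- c(1, cumprod(1 - v[-n_community]))
  v * remaining
}

set.seed(1)
theta <- stick_breaking(n_community = 10, gamma = 0.01)
round(theta, 3)
sum(theta)
```

Under this assumed form, a small gamma pushes each break towards 1, so most of the weight falls on the first few communities.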
The data argument
must be a non-empty data.frame with the frequencies for each variable
(column) in each observation (row). The pop argument
must be a non-empty data.frame with one entry for each variable (column)
in each observation (row), each greater than the corresponding
entry in the data
data.frame.
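The constraints above can be checked before calling the sampler. The sketch below uses a hypothetical helper, check_binomial_inputs, which is not part of Rlda. It assumes that data and pop share the same dimensions and that equality between a count and its population size is acceptable; the text above says "greater than", so tighten the comparison if the stricter reading applies.

```r
# Hypothetical pre-flight check, not part of Rlda.
check_binomial_inputs <- function(data, pop) {
  stopifnot(is.data.frame(data), is.data.frame(pop))
  stopifnot(nrow(data) > 0, ncol(data) > 0)
  # Assumption: both data.frames have the same number of rows and columns
  stopifnot(identical(dim(data), dim(pop)))
  # Flag every count that exceeds its population size
  too_large <- as.matrix(data) > as.matrix(pop)
  if (any(too_large, na.rm = TRUE)) {
    stop(sum(too_large, na.rm = TRUE),
         " entries of 'data' exceed the matching entries of 'pop'")
  }
  invisible(TRUE)
}

# Toy inputs: 3 sampling units, 2 element types, 10 trials per cell
counts <- data.frame(sp1 = c(3, 0, 7), sp2 = c(1, 5, 2))
sizes <- as.data.frame(matrix(10, nrow = nrow(counts), ncol = ncol(counts)))
check_binomial_inputs(counts, sizes)
```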
An R list with three elements:
Theta |
The probability that each observation
(e.g. location) belongs to each cluster (e.g. community). It is a matrix
with dimension equal |
Phi |
The probability that each variable
(e.g. species) belongs to each cluster (e.g. community). It is a matrix
with dimension equal |
LogLikelihood |
The vector of log-likelihoods computed for each Gibbs sample. |
The Theta and Phi matrices for the i-th Gibbs sample can be obtained using
matrix(Theta[i,], nrow = nrow(data), ncol = n_community)
and
matrix(Phi[i,], nrow = n_community, ncol = ncol(data)),
respectively.
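A small sketch of the reshaping step follows. It uses a stand-in Theta filled with random numbers instead of real sampler output, and assumes only what the indexing Theta[i,] implies: one row per Gibbs sample. The sizes n_obs, n_community and n_gibbs are arbitrary, and averaging over the later samples is a common post-processing choice, not something this page prescribes.

```r
# Stand-in for the Theta element: one row per Gibbs sample (assumed layout).
n_obs <- 4
n_community <- 3
n_gibbs <- 5
set.seed(42)
Theta <- matrix(runif(n_gibbs * n_obs * n_community), nrow = n_gibbs)

# Reshape the i-th sample into an observations-by-communities matrix
i <- 5
theta_i <- matrix(Theta[i, ], nrow = n_obs, ncol = n_community)
dim(theta_i)

# Average the later samples, treating the first two as burn-in (arbitrary choice)
keep <- 3:n_gibbs
theta_mean <- matrix(colMeans(Theta[keep, , drop = FALSE]),
                     nrow = n_obs, ncol = n_community)
```

The same pattern applies to Phi with nrow = n_community and ncol = ncol(data).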
Pedro Albuquerque.
pedroa@unb.br
http://pedrounb.blogspot.com/
Denis Valle.
drvalle@ufl.edu
http://denisvalle.weebly.com/
Daijiang Li.
daijianglee@gmail.com
http://daijiang.name
Blei, David M., Andrew Y. Ng, and Michael I. Jordan.
"Latent Dirichlet allocation." Journal of Machine Learning Research
3.Jan (2003): 993-1022.
http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf
Valle, Denis, et al.
"Decomposing biodiversity data using the Latent Dirichlet
Allocation model, a probabilistic multivariate statistical
method." Ecology letters 17.12 (2014): 1591-1601.
rlda.multinomial, rlda.bernoulli
## Not run:
library(Rlda)
# Read the SP500 data
data(sp500)
# Create size
spSize <- as.data.frame(matrix(100, ncol = ncol(sp500), nrow = nrow(sp500)))
# Set seed
set.seed(5874)
# Hyperparameters for each prior distribution
gamma <- 0.01
alpha0 <- 0.01
alpha1 <- 0.01
# Execute the LDA for the Binomial entry
res <- rlda.binomial(data = sp500, pop = spSize, n_community = 10,
                     alpha0 = alpha0, alpha1 = alpha1, gamma = gamma,
                     n_gibbs = 500, ll_prior = TRUE, display_progress = TRUE)
## End(Not run)