rlda.multinomial {Rlda} | R Documentation |
This method implements Latent Dirichlet Allocation with a
Stick-Breaking prior for multinomial data.
rlda.multinomial works with a frequency data.frame.
rlda.multinomial(data, n_community, beta, gamma, n_gibbs, ll_prior = TRUE, display_progress = TRUE)
data |
An abundance data.frame where each row is a sampling unit (e.g., plots, locations, times) and each column is a categorical type of element (e.g., species, firms, issues). |
n_community |
Total number of communities to return. It must be less than
the total number of columns inside the |
beta |
Hyperparameter associated with the Dirichlet |
gamma |
Hyperparameter associated with the Stick-Breaking prior. |
n_gibbs |
Total number of Gibbs Samples. |
ll_prior |
boolean scalar, |
display_progress |
boolean scalar, |
rlda.multinomial uses a modified Latent Dirichlet Allocation method
to construct mixed-membership clusters using Bayesian inference.
The data must be a non-empty data.frame with the frequencies for each
variable (column) in each observation (row).
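The input layout described above can be sketched in base R. This is only an illustration: the toy records, the helper name check_abundance, and the non-negativity check are assumptions and are not part of Rlda; the remaining checks restate the requirements given in this page (non-empty data.frame, n_community less than the number of columns).

```r
# Hypothetical long-format records: one row per observed (plot, species) event.
records <- data.frame(
  plot    = c("P1", "P1", "P2", "P2", "P2", "P3"),
  species = c("A",  "B",  "A",  "C",  "C",  "B"),
  stringsAsFactors = FALSE
)

# Cross-tabulate into counts: rows are sampling units, columns are types.
abund <- as.data.frame.matrix(table(records$plot, records$species))

# Hypothetical helper (NOT part of Rlda) that checks the documented
# requirements, plus an assumed check that counts are non-negative.
check_abundance <- function(data, n_community) {
  stopifnot(
    is.data.frame(data),
    nrow(data) > 0, ncol(data) > 0,
    all(vapply(data, is.numeric, logical(1))),
    all(as.matrix(data) >= 0),
    n_community < ncol(data)
  )
  invisible(TRUE)
}

check_abundance(abund, n_community = 2)
```

A data.frame that passes these checks has the shape the data argument expects; whether the function itself performs the same validation is not stated here.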
An R list with three elements:
Theta |
The probability that each observation
(e.g., location) belongs to each cluster (e.g., community). It is a matrix
with dimension equal |
Phi |
The probability that each variable
(e.g., species) belongs to each cluster (e.g., community). It is a matrix
with dimension equal |
LogLikelihood |
The vector of log-likelihoods computed for each Gibbs sample. |
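One common use of a per-sample log-likelihood vector is a trace plot to judge whether the sampler has stabilised. The sketch below uses a simulated vector as a stand-in for the LogLikelihood element, and the burn-in of half the chain is an arbitrary assumption, not a recommendation from the package.

```r
# Mock log-likelihood vector standing in for res$LogLikelihood
# (one value per Gibbs sample); simulated, not produced by Rlda.
set.seed(42)
n_gibbs <- 200
loglik <- -500 - 300 * exp(-(1:n_gibbs) / 20) + rnorm(n_gibbs, sd = 2)

# Assumed burn-in: discard the first half of the chain (arbitrary choice).
burn_in <- n_gibbs %/% 2
kept <- loglik[(burn_in + 1):n_gibbs]

# Trace plot; the dashed line marks the assumed burn-in cutoff.
plot(loglik, type = "l", xlab = "Gibbs sample", ylab = "Log-likelihood")
abline(v = burn_in, lty = 2)
```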
The Theta and Phi matrices for the i-th Gibbs sample can be obtained using
matrix(Theta[i,], nrow = nrow(data), ncol = n_community)
and
matrix(Phi[i,], nrow = n_community, ncol = ncol(data)),
respectively.
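The reshaping above can be wrapped in a small helper. In this sketch, extract_sample is a hypothetical name (not an Rlda function), and the mock list only imitates the documented layout (one row per Gibbs sample) so the code runs without fitting a model.

```r
# Hypothetical helper (NOT part of Rlda): reshape the i-th Gibbs sample
# using the matrix() calls given in this page.
extract_sample <- function(res, i, n_obs, n_var, n_community) {
  list(
    Theta = matrix(res$Theta[i, ], nrow = n_obs, ncol = n_community),
    Phi   = matrix(res$Phi[i, ],   nrow = n_community, ncol = n_var)
  )
}

# Mock result with the documented layout:
# 4 Gibbs samples, 3 observations, 5 variables, 2 communities.
n_gibbs <- 4; n_obs <- 3; n_var <- 5; n_community <- 2
set.seed(1)
mock <- list(
  Theta = matrix(runif(n_gibbs * n_obs * n_community), nrow = n_gibbs),
  Phi   = matrix(runif(n_gibbs * n_community * n_var), nrow = n_gibbs)
)

# Reshape the last Gibbs sample.
last <- extract_sample(mock, i = n_gibbs, n_obs = n_obs,
                       n_var = n_var, n_community = n_community)
dim(last$Theta)  # 3 2
dim(last$Phi)    # 2 5
```

With a real fit, n_obs and n_var would be nrow(data) and ncol(data), as in the expressions above.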
Pedro Albuquerque.
pedroa@unb.br
http://pedrounb.blogspot.com/
Denis Valle.
drvalle@ufl.edu
http://denisvalle.weebly.com/
Daijiang Li.
daijianglee@gmail.com
Blei, David M., Andrew Y. Ng, and Michael I. Jordan.
"Latent Dirichlet allocation." Journal of Machine Learning Research
3.Jan (2003): 993-1022.
http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf
Valle, Denis, et al.
"Decomposing biodiversity data using the Latent Dirichlet
Allocation model, a probabilistic multivariate statistical
method." Ecology Letters 17.12 (2014): 1591-1601.
## Not run:
# Invoke the library
library(Rlda)
# Read the Complaints data
data(complaints)
# Create the abundance matrix
library(reshape2)
mat1 <- dcast(complaints[, c("Company", "Issue")], Company ~ Issue,
              fun.aggregate = length, value.var = "Issue")
# Create the row names
rownames(mat1) <- mat1[, 1]
# Remove the ID variable
mat1 <- mat1[, -1]
# Set seed
set.seed(9292)
# Hyperparameters for each prior distribution
beta <- rep(1, ncol(mat1))
gamma <- 0.01
# Execute the LDA for the Multinomial entry
res <- rlda.multinomial(data = mat1, n_community = 30, beta = beta,
                        gamma = gamma, n_gibbs = 1000, ll_prior = TRUE,
                        display_progress = TRUE)
## End(Not run)