information_gain {FSelectorRcpp} | R Documentation |
Algorithms that rank the importance of discrete attributes based on their entropy with a continuous class attribute. This function is a reimplementation of FSelector's information.gain, gain.ratio and symmetrical.uncertainty.
information_gain(formula, data, x, y, type = c("infogain", "gainratio", "symuncert"), equal = FALSE, discIntegers = TRUE, threads = 1)
formula |
An object of class formula with model description. |
data |
A data.frame accompanying formula. |
x |
A data.frame or sparse matrix with attributes. |
y |
A vector with response variable. |
type |
Method name. |
equal |
A logical. Whether to discretize dependent variable with the
|
discIntegers |
A logical. If TRUE (default), integers are treated as numeric vectors and are discretized. If FALSE, integers are treated as factors and are left as is. |
threads |
Number of threads for parallel backend. |
type = "infogain" is
H(Class) + H(Attribute) - H(Class, Attribute)
type = "gainratio" is
(H(Class) + H(Attribute) - H(Class, Attribute)) / H(Attribute)
type = "symuncert" is
2 * (H(Class) + H(Attribute) - H(Class, Attribute)) / (H(Attribute) + H(Class))
where H(X) is Shannon's entropy for a variable X and H(X, Y) is the joint Shannon's entropy of the variables X and Y.
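The three formulas can be checked by hand on a small discrete example. The sketch below uses base R only and is not the package's implementation: the helpers `entropy` and `scores` are hypothetical names introduced for illustration, H(Class, Attribute) is taken to be the entropy of the joint distribution, and the natural logarithm is an assumption about the log base.

```r
# Shannon entropy (natural log, an assumption) of a table of counts.
entropy <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]                 # drop empty cells so 0 * log(0) counts as 0
  -sum(p * log(p))
}

# Hypothetical helper: the three scores for one discrete attribute.
scores <- function(attribute, class) {
  h_class <- entropy(table(class))
  h_attr  <- entropy(table(attribute))
  h_joint <- entropy(table(attribute, class))   # entropy of the joint distribution
  gain <- h_class + h_attr - h_joint
  c(infogain  = gain,
    gainratio = gain / h_attr,
    symuncert = 2 * gain / (h_attr + h_class))
}

# The attribute determines the class exactly, so the gain equals H(Class).
attribute <- c("a", "a", "b", "b")
class     <- c("yes", "yes", "no", "no")
perfect <- scores(attribute, class)

# The attribute is independent of the class, so the gain is zero.
independent <- scores(c("a", "b", "a", "b"), class)
```

In the first case the gain ratio and the symmetrical uncertainty both reach 1, their maximum; in the second all three scores are 0.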
data.frame with the following columns:
attributes - variable names.
importance - worth of the attributes.
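Because the result is an ordinary data.frame, it can be sorted with base R. The following is a minimal sketch that assumes FSelectorRcpp is installed and that the columns are named as listed above; the object names `res` and `ranked` are illustrative only.

```r
# Runs only when the package is available; skipped otherwise.
if (requireNamespace("FSelectorRcpp", quietly = TRUE)) {
  res <- FSelectorRcpp::information_gain(Species ~ ., data = iris)

  # Order attributes from most to least important.
  ranked <- res[order(res$importance, decreasing = TRUE), ]
  print(ranked)
}
```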
Zygmunt Zawadzki zygmunt@zstat.pl
irisX <- iris[-5]
y <- iris$Species

## data.frame interface
information_gain(x = irisX, y = y)

# formula interface
information_gain(formula = Species ~ ., data = iris)
information_gain(formula = Species ~ ., data = iris, type = "gainratio")
information_gain(formula = Species ~ ., data = iris, type = "symuncert")

# sparse matrix interface
library(Matrix)
i <- c(1, 3:8); j <- c(2, 9, 6:10); x <- 7 * (1:7)
x <- sparseMatrix(i, j, x = x)
y <- c(1, 1, 1, 1, 2, 2, 2, 2)

information_gain(x = x, y = y)
information_gain(x = x, y = y, type = "gainratio")
information_gain(x = x, y = y, type = "symuncert")