discretize {FSelectorRcpp} | R Documentation |
Discretize a range of numeric attributes in the dataset into nominal
attributes. The Minimum Description Length (MDL) method is the default
control. An equalsizeControl method is also available.
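To give a feel for what the MDL method decides, here is a minimal base R sketch of the acceptance test for a single candidate cut point, following the criterion in the Fayyad and Irani paper cited in the references. The helpers `entropy` and `mdl_accepts_cut` are hypothetical names introduced for illustration; they are not part of FSelectorRcpp, and the package's internal implementation may differ in its details.

```r
# Shannon entropy (in bits) of a vector of class labels.
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]              # drop empty classes so log2(0) never occurs
  -sum(p * log2(p))
}

# Fayyad-Irani MDL test for one candidate cut point.
# Hypothetical helper for illustration only, not a FSelectorRcpp function.
# Assumes cut_point lies strictly inside the range of x, so both sides
# of the split are non-empty.
mdl_accepts_cut <- function(x, y, cut_point) {
  left  <- y[x <= cut_point]
  right <- y[x > cut_point]
  n <- length(y)

  # Information gain of splitting y at cut_point.
  gain <- entropy(y) -
    length(left) / n * entropy(left) -
    length(right) / n * entropy(right)

  # Number of distinct classes overall and on each side.
  k  <- length(unique(y))
  k1 <- length(unique(left))
  k2 <- length(unique(right))

  # MDL penalty: the cut is kept only if the gain exceeds this threshold.
  delta <- log2(3^k - 2) -
    (k * entropy(y) - k1 * entropy(left) - k2 * entropy(right))
  threshold <- (log2(n - 1) + delta) / n

  gain > threshold
}

# Is a cut at Sepal.Length = 5.5 worth keeping for predicting Species?
mdl_accepts_cut(iris$Sepal.Length, iris$Species, cut_point = 5.5)
```

In the full method this test is applied recursively: the best cut is searched for in each resulting interval until no candidate passes. The sketch covers only the single-cut decision.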
discretize(x, y, control = list(mdlControl(), equalsizeControl()),
  all = TRUE, discIntegers = TRUE, call = NULL)

mdlControl()

equalsizeControl(k = 10)

customBreaksControl(breaks)
x |
Explanatory continuous variables to be discretized or a formula. |
y |
Dependent variable for supervised discretization or a data.frame when x is a formula. |
control |
|
all |
Logical indicating if a returned data.frame should contain other features that were not discretized.
(Example: should |
discIntegers |
Logical value. If TRUE (default), integers are treated as numeric vectors and are discretized. If FALSE, integers are treated as factors and are left as is. |
call |
Keep as NULL (the default). |
k |
Number of partitions. |
breaks |
Custom breaks used for partitioning. |
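The k and breaks arguments can be pictured with a short base R sketch. It uses `quantile()` and `cut()` rather than the package itself, and it assumes that "equal size" means bins holding roughly the same number of observations. That reading, and the right-closed intervals produced by `cut()`, are assumptions for illustration; the intervals returned by equalsizeControl() and customBreaksControl() may be defined differently.

```r
x <- iris$Sepal.Length

# k partitions: split x into k bins with roughly equal counts.
# Assumption: "equal size" is read as equal frequency here.
k <- 3
edges <- quantile(x, probs = seq(0, 1, length.out = k + 1))
equal_bins <- cut(x, breaks = edges, include.lowest = TRUE)
table(equal_bins)

# Custom breaks: the cut points are supplied directly.
# Values outside the outermost breaks would become NA with cut().
custom_bins <- cut(x, breaks = c(0, 2, 5, 7.5, 10))
table(custom_bins)
```

With quantile-based edges, heavily tied data can produce duplicate break points, which `cut()` rejects; in that case a smaller k or explicit custom breaks are the simpler choice.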
Zygmunt Zawadzki zygmunt@zstat.pl
U. M. Fayyad and K. B. Irani. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In 13th International Joint Conference on Artificial Intelligence (IJCAI-93), pages 1022-1029, 1993.
# vectors
discretize(x = iris[[1]], y = iris[[5]])

# list and vector
head(discretize(x = list(iris[[1]], iris$Sepal.Width), y = iris$Species))

# formula input
head(discretize(x = Species ~ ., y = iris))
head(discretize(Species ~ ., iris))

# use different methods for specific columns
ir1 <- discretize(Species ~ Sepal.Length, iris)
ir2 <- discretize(Species ~ Sepal.Width, ir1, control = equalsizeControl(3))
ir3 <- discretize(Species ~ Petal.Length, ir2, control = equalsizeControl(5))
head(ir3)

# custom breaks
ir <- discretize(Species ~ Sepal.Length, iris,
  control = customBreaksControl(breaks = c(0, 2, 5, 7.5, 10)))
head(ir)

## Not run:
# Same results
library(RWeka)
Rweka_disc_out <- RWeka::Discretize(Species ~ Sepal.Length, iris)[, 1]
FSelectorRcpp_disc_out <- FSelectorRcpp::discretize(Species ~ Sepal.Length, iris)[, 1]
table(Rweka_disc_out, FSelectorRcpp_disc_out)

# But faster method
library(microbenchmark)
microbenchmark(FSelectorRcpp::discretize(Species ~ Sepal.Length, iris),
  RWeka::Discretize(Species ~ Sepal.Length, iris))
## End(Not run)