discretize {recipes} | R Documentation |
discretize converts a numeric vector into a factor with bins having
approximately the same number of data points (based on a training set).
discretize(x, ...)

## Default S3 method:
discretize(x, ...)

## S3 method for class 'numeric'
discretize(x, cuts = 4, labels = NULL, prefix = "bin", keep_na = TRUE,
  infs = TRUE, min_unique = 10, ...)

## S3 method for class 'discretize'
predict(object, newdata, ...)

step_discretize(recipe, ..., role = NA, trained = FALSE, objects = NULL,
  options = list())
x |
A numeric vector |
... |
For |
cuts |
An integer defining how many cuts to make of the data. |
labels |
A character vector defining the factor levels that will be in
the new factor (from smallest to largest). This should have length
|
prefix |
A single parameter value to be used as a prefix for the factor
levels (e.g. |
keep_na |
A logical for whether a factor level should be created to
identify missing values in x. |
infs |
A logical indicating whether the smallest and largest cut point should be infinite. |
min_unique |
An integer defining a minimum sample size threshold for the
binning. If (the number of unique values) |
object |
An object of class discretize. |
newdata |
A new numeric object to be binned. |
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
role |
Not used by this step since no new variables are created. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
objects |
The |
options |
A list of options to |
discretize estimates the cut points from x using percentiles. For example, if cuts = 4, the function estimates the quartiles of x and uses these as the cut points. If cuts = 2, the bins are defined as being above or below the median of x.
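The percentile idea can be sketched in base R. This is only an illustration of the technique, not the source code of discretize; the name n_bins and the simulated data are assumptions made for the example.

```r
# Base R sketch of percentile-based binning (not the recipes implementation).
# `n_bins` is a hypothetical name for the number of bins wanted.
set.seed(1)
x <- rnorm(200)

n_bins <- 4
probs <- seq(0, 1, length.out = n_bins + 1)  # 0, 0.25, 0.5, 0.75, 1
breaks <- quantile(x, probs = probs)

# With four bins, the interior break points are the three quartiles of x
interior <- breaks[-c(1, length(breaks))]

# include.lowest = TRUE keeps the minimum of x inside the first bin
binned <- cut(x, breaks = breaks, include.lowest = TRUE)
table(binned)
```

With continuous data and no ties, each bin ends up with the same number of training points, which is the property described above.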
The predict method can then be used to turn numeric vectors into factor vectors.
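The estimate-then-apply pattern behind the predict method might look like the following in base R. The helpers fit_bins and apply_bins are hypothetical and are not part of recipes; the labels are illustrative only.

```r
# Hypothetical helpers (not part of recipes) illustrating estimate-then-apply.
fit_bins <- function(x, n_bins = 4) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1), na.rm = TRUE)
  breaks <- unname(breaks)
  # Open-ended outer bins so that any new value falls into some bin
  breaks[1] <- -Inf
  breaks[length(breaks)] <- Inf
  list(breaks = breaks, labels = paste0("bin", seq_len(n_bins)))
}

apply_bins <- function(fit, newdata) {
  # Reuse the break points estimated on the training data
  cut(newdata, breaks = fit$breaks, labels = fit$labels)
}

train <- 1:100
fit <- fit_bins(train)
new_bins <- apply_bins(fit, c(-5, 30, 60, 1000))
new_bins
```

Keeping the break points in a separate object means new data are binned with the training set's cut points rather than with percentiles of the new data.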
If keep_na = TRUE, a suffix of "_missing" is used as a factor level (see the examples below).
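As a rough base R sketch of what a dedicated missing-value level looks like: the label "bin_missing" (prefix plus the "_missing" suffix) and the position of that level are assumptions for illustration, not a statement of what discretize produces.

```r
# Base R sketch: give missing values their own factor level.
x <- c(1.2, NA, 3.4, 5.6, NA)
binned <- cut(x, breaks = c(-Inf, 3, Inf), labels = c("bin1", "bin2"))

# Assumed label: the prefix "bin" combined with the "_missing" suffix
missing_label <- "bin_missing"

with_na_level <- factor(
  ifelse(is.na(binned), missing_label, as.character(binned)),
  levels = c(missing_label, levels(binned))
)
table(with_na_level)
```

Without this step, the two NA values would stay NA in the factor and be dropped silently by functions such as table.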
If infs = FALSE and a new value is greater than the largest value of x, a missing value will result.
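The effect of finite versus infinite outer cut points can be reproduced with base R's cut. This is a sketch of the behavior only, using made-up values; in this base R version, values below the smallest break also become missing.

```r
# Base R sketch: finite vs. infinite outer break points.
train <- c(10, 20, 30, 40, 50)
new_values <- c(5, 35, 100)

# Finite outer breaks taken from the training range (minimum, median, maximum)
finite_breaks <- unname(quantile(train, probs = c(0, 0.5, 1)))

# Same interior break, but with open-ended outer bins
infinite_breaks <- c(-Inf, finite_breaks[2], Inf)

# Values outside the training range cannot be assigned to a bin
with_finite <- cut(new_values, breaks = finite_breaks, include.lowest = TRUE)

# Every value falls into one of the two bins
with_infinite <- cut(new_values, breaks = infinite_breaks)

with_finite
with_infinite
```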
discretize returns an object of class discretize.
predict.discretize returns a factor vector.
library(recipes)

data(biomass)

# Split into training and test sets
biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

# With cuts = 2, the bins lie above or below the median
median(biomass_tr$carbon)
discretize(biomass_tr$carbon, cuts = 2)
discretize(biomass_tr$carbon, cuts = 2, infs = FALSE)
discretize(biomass_tr$carbon, cuts = 2, infs = FALSE, keep_na = FALSE)
discretize(biomass_tr$carbon, cuts = 2, prefix = "maybe a bad idea to bin")

# Estimate the cut points, then bin the training values
carbon_binned <- discretize(biomass_tr$carbon)
table(predict(carbon_binned, biomass_tr$carbon))

# Finite outer cut points: values outside the training range can become missing
carbon_no_infs <- discretize(biomass_tr$carbon, infs = FALSE)
predict(carbon_no_infs, c(50, 100))

# The same operation as a recipe step
rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)
rec <- rec %>% step_discretize(carbon, hydrogen)
rec <- prep(rec, biomass_tr)
binned_te <- bake(rec, biomass_te)
table(binned_te$carbon)