ft_quantile_discretizer {sparklyr} | R Documentation |
Takes a column with continuous features and outputs a column with binned categorical features. The bin ranges are chosen by taking a sample of the data and dividing it into roughly equal parts. The lowest and highest bins are bounded by -Infinity and +Infinity, so together the bins cover all real values. The function attempts to find n.buckets partitions (Spark's numBuckets parameter) based on a sample of the input data, but it may find fewer depending on the sampled values.
ft_quantile_discretizer(x, input.col = NULL, output.col = NULL, n.buckets = 5L, ...)
x | An object (usually a
input.col | The name of the input column(s).
output.col | The name of the output column.
n.buckets | The number of buckets to use.
... | Optional arguments; currently unused.
Note that the result may differ from run to run, because the underlying sampling strategy is non-deterministic.
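The binning rule described above can be approximated locally in base R. The following is a minimal sketch, not sparklyr's or Spark's implementation: quantile_bins is a hypothetical helper, it computes exact quantiles over the full vector rather than approximate quantiles over a sample, and it assumes left-closed buckets numbered from zero. It is deterministic, unlike the Spark transformer.

```r
# Hypothetical helper (not part of sparklyr): a local, deterministic
# approximation of quantile-based binning using base R only.
quantile_bins <- function(values, n_buckets = 5L) {
  # Evenly spaced probabilities: 0, 1/n, 2/n, ..., 1.
  probs <- seq(0, 1, length.out = n_buckets + 1L)

  # Keep only the interior split points; the outer bounds become -Inf / +Inf
  # so that every real value falls into some bucket.
  inner <- quantile(values, probs = probs[-c(1L, length(probs))], names = FALSE)

  # Duplicate split points collapse, which is why fewer than n_buckets
  # buckets can result when the data contain many repeated values.
  splits <- c(-Inf, unique(inner), Inf)

  # Zero-based bucket index; intervals are closed on the left (assumption).
  cut(values, breaks = splits, labels = FALSE, right = FALSE) - 1L
}

# Ten evenly spread values split into five buckets of two values each.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
buckets <- quantile_bins(x, n_buckets = 5L)

# Heavily repeated values yield fewer distinct buckets than requested.
dup <- c(rep(1, 8), 2, 3)
dup_buckets <- quantile_bins(dup, n_buckets = 5L)
```

Replacing the outer quantiles with infinite bounds means values outside the range seen when the splits were computed still receive a bucket, which matches the behaviour described for the lower and upper bin bounds.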
See http://spark.apache.org/docs/latest/ml-features.html for more information on the set of transformations available for DataFrame columns in Spark.
Other feature transformation routines: ft_binarizer, ft_bucketizer, ft_discrete_cosine_transform, ft_elementwise_product, ft_index_to_string, ft_one_hot_encoder, ft_regex_tokenizer, ft_sql_transformer, ft_string_indexer, ft_tokenizer, ft_vector_assembler, sdf_mutate