rmr.sample {rmr2}    R Documentation

Sample large data sets

Description

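Draw a sample from a large data set, either quickly with no statistical guarantees or by independent Bernoulli sampling of each record.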

Usage

rmr.sample(input, output = NULL, method = c("any", "Bernoulli"), ...)

Arguments

input

The data set to be sampled, given either as a file path or as the return value of a call to mapreduce.

output

Where to store the result. See the output argument of mapreduce for details.

method

One of "any" or "Bernoulli". "any" returns some of the records, selected for speed, with no statistical guarantees about which ones. "Bernoulli" selects each record independently with the same probability, i.e. according to the Bernoulli distribution.
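To make the difference between the two methods concrete, here is a small in-memory sketch in base R. The helpers bernoulli_sample and any_sample are hypothetical and not part of rmr2; in particular, taking the first n records is only one possible "fast" choice and is an assumption, not a description of how rmr2 picks records for "any".

```r
# Hypothetical helpers, for illustration only: they mimic the two sampling
# semantics on an in-memory vector and are not part of rmr2.
bernoulli_sample <- function(records, p) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 0, p <= 1)
  # One independent Bernoulli(p) trial per record; keep the successes.
  keep <- runif(length(records)) < p
  records[keep]
}

any_sample <- function(records, n) {
  # "Some" records: here simply the first n (an assumed rule), with no
  # randomness and no guarantee that they represent the whole data set.
  head(records, n)
}

set.seed(1)
records <- 1:10000
b <- bernoulli_sample(records, p = 0.1)
a <- any_sample(records, n = 100)
length(a)  # always 100
length(b)  # random, with expected value 10000 * 0.1 = 1000
```

With "Bernoulli" the sample size itself is random (binomially distributed), which is the price paid for every record having the same, independent chance of selection.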

...

Additional arguments that fully specify the sample; which ones are needed depends on the selected method. For "any", pass the desired sample size as the argument n. For "Bernoulli", pass the probability of selecting each record as the argument p.
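A minimal usage sketch, assuming rmr2 is installed and run on its local backend (selected with rmr.options) so that no Hadoop cluster is needed. The input data (the integers 1 to 1000 written with to.dfs) and the values n = 100 and p = 0.05 are made up for illustration; n and p are passed through ... as described above.

```r
library(rmr2)

# Use the local backend so the example runs without a Hadoop cluster.
rmr.options(backend = "local")

# Write a small example data set; the returned object can be used as `input`.
input <- to.dfs(1:1000)

# "any": ask for a sample of size 100, optimized for speed (argument n).
some <- rmr.sample(input, method = "any", n = 100)

# "Bernoulli": keep each record independently with probability 0.05 (argument p).
bern <- rmr.sample(input, method = "Bernoulli", p = 0.05)

# Read the Bernoulli sample back into memory; its size is random,
# about 1000 * 0.05 = 50 on average.
sampled <- values(from.dfs(bern))
length(sampled)
```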

Value

The sampled data. See mapreduce for details.


[Package rmr2 version 3.3.1]