equijoin {rmr2}    R Documentation

Equijoins using map reduce

Description

A generalized form of equijoin, a hybrid between its SQL counterparts and mapreduce.

Usage

equijoin(
  left.input = NULL, 
  right.input = NULL, 
  input = NULL, 
  output = NULL, 
  input.format = "native",
  output.format = "native",
  outer = c("", "left", "right", "full"), 
  map.left = to.map(identity), 
  map.right = to.map(identity), 
  reduce  = reduce.default)

Arguments

left.input

The left side input to the join.

right.input

The right side input to the join.

input

The only input in the case of a self join. Mutually exclusive with left.input and right.input.

output

Where to write the output.

input.format

Input format specification; see make.input.format.

output.format

Output format specification; see make.output.format.

outer

Whether to perform an outer join and, if so, which of the usual three types: "left", "right" or "full". The default, "", performs no outer join.
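The join types can be illustrated in memory with base R's merge. This is only an analogy under stated assumptions: the data frames left and right and the key column k are made up for illustration, and the code does not call equijoin or touch the DFS.

```r
# Hypothetical inputs sharing the key column k; keys 2 and 3 match.
left  <- data.frame(k = c(1, 2, 3), l = c("a", "b", "c"))
right <- data.frame(k = c(2, 3, 4), r = c("x", "y", "z"))

# No outer join: only keys present on both sides survive.
inner <- merge(left, right, by = "k")
# Left outer join: every left key is kept, unmatched right columns are NA.
left.outer <- merge(left, right, by = "k", all.x = TRUE)
# Right outer join: every right key is kept.
right.outer <- merge(left, right, by = "k", all.y = TRUE)
# Full outer join: keys from either side are kept.
full.outer <- merge(left, right, by = "k", all = TRUE)

print(full.outer)
```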

map.left

Function to apply to each record from the left input; it follows the same conventions as any map function. The returned keys become the join keys.

map.right

Function to apply to each record from the right input; it follows the same conventions as any map function. The returned keys become the join keys.
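To make the role of the returned keys concrete, here is a minimal in-memory sketch in base R. Everything in it is an assumption made for illustration: map.orders is a hypothetical map function, a plain list stands in for a key-value pair, and split stands in for the grouping by key that precedes the reduce step. It is not how rmr2 implements the join.

```r
# Hypothetical map function: the first argument (the incoming key) is
# ignored, and the customer column is returned as the join key
# alongside the unchanged records.
map.orders <- function(k, v) list(key = v$customer, val = v)

# Made-up records for one side of the join.
orders <- data.frame(customer = c(1, 2, 1), amount = c(10, 20, 30))
mapped <- map.orders(NULL, orders)

# Group the records by the returned key; records that share a key
# would later be handed to the reduce function together.
by.key <- split(mapped$val, mapped$key)
print(by.key)
```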

reduce

Function to be applied, key by key, to the values associated with that key. The values are passed in the arguments vl (left side) and vr (right side); their types are determined by what the map functions return, separately for each side. The allowable return values are the same as for any reduce function; see mapreduce. The default performs a merge with by = NULL, which yields a cartesian product, unless lists are involved, in which case the arguments are simply returned in a list.
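The cartesian product produced by merge with by = NULL can be checked directly in base R. The sketch below assumes that the values for a single key arrive as data frames; the contents of vl and vr are made up, and the code only shows the merge call, not the full default reduce function.

```r
# Hypothetical values sharing one join key: two rows on the left,
# three rows on the right.
vl <- data.frame(left = c("a", "b"))
vr <- data.frame(right = c(1, 2, 3))

# With by = NULL, merge pairs every left row with every right row.
joined <- merge(vl, vr, by = NULL)
print(joined)
```

Every combination of a left row and a right row appears once, so the result has nrow(vl) * nrow(vr) rows.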

Value

If output is specified, returns output itself. Otherwise, returns a big.data.object.

Warning

Unlike mapreduce, equijoin does not work with multiple inputs.

Examples

## Join two small key-value data sets, both keyed by 1:10,
## then read the joined result back from the DFS.
from.dfs(
  equijoin(
    left.input = to.dfs(keyval(1:10, 1:10^2)),
    right.input = to.dfs(keyval(1:10, 1:10^3))))

[Package rmr2 version 3.3.1 Index]