rhive-apply {RHive} | R Documentation |
R Distributed apply function using HQL
rhive.napply(tableName, FUN, ..., forcedRef=TRUE)
rhive.sapply(tableName, FUN, ..., forcedRef=TRUE)
rhive.mrapply(tableName, mapperFUN, reducerFUN, mapInput=NULL, mapOutput=NULL,
              by=NULL, reduceInput=NULL, reduceOutput=NULL, mapperArgs=NULL,
              reducerArgs=NULL, bufferSize=-1L, verbose=FALSE, forcedRef=TRUE)
rhive.mapapply(tableName, mapperFUN, mapInput=NULL, mapOutput=NULL, by=NULL,
               args=NULL, bufferSize=-1L, verbose=FALSE, forcedRef=TRUE)
rhive.reduceapply(tableName, reducerFUN, reduceInput=NULL, reduceOutput=NULL,
                  args=NULL, bufferSize=-1L, verbose=FALSE, forcedRef=TRUE)
tableName |
name of the Hive table. |
FUN |
the function to be applied. |
... |
optional arguments to 'FUN'. |
mapperFUN |
a function executed on each worker node. The mapper typically maps input key/value pairs to a set of intermediate key/value pairs. |
reducerFUN |
a function executed on each worker node. The reducer combines the intermediate values that share a key into a smaller set of values. Leave as NULL if no reducer is used. |
mapInput |
map-input column list. |
mapOutput |
map-output column list. |
by |
cluster key column. |
reduceInput |
reduce-input column list. |
reduceOutput |
reduce-output column list. |
bufferSize |
streaming buffer size. |
verbose |
if TRUE, print the generated HQL. |
args |
custom environment. |
mapperArgs |
mapper custom environment. |
reducerArgs |
reducer custom environment. |
forcedRef |
option that forces creation of a temporary table for the result. |
## connect to the Hive server
## Not run: rhive.connect("hive-server-ip")

## invoke napply for numeric return type
## Not run: rhive.napply('emp', function(item) { item * 10 }, 'sal')
## End(Not run)

## invoke sapply for string return type
## Not run: rhive.sapply('emp', function(item) { paste('NAME : ', item, sep='') }, 'ename')
## End(Not run)

## custom map/reduce script
## Not run: 
map <- function(k, v) {
  if (is.null(v)) {
    put(NA, 1)
  }
  lapply(v, function(vv) {
    lapply(strsplit(x = vv, split = "\t")[[1]],
           function(w) put(paste(args, w, sep = ""), 1))
  })
}

reduce <- function(k, vv) {
  put(k, sum(as.numeric(vv)))
}

rhive.mrapply("emp", map, reduce,
              c("ename", "position"), c("position", "one"),
              by = "position",
              c("position", "one"), c("position", "count"))
## End(Not run)

## close connection
## Not run: rhive.close()
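
## local illustration of the mapper/reducer contract (no Hive required)
The mapper and reducer roles described under the arguments can be tried out locally before writing functions for rhive.mrapply. The following is a minimal sketch in base R only: it does not call RHive and is not how RHive executes jobs. The names records, map_one and reduce_one and the sample data are hypothetical, and grouping with split only approximates what a cluster key such as 'by' is assumed to do.

```r
# Hive-free sketch of the map/reduce contract; not RHive's implementation.
# Hypothetical input: the 'position' value of five employee records.
records <- c("clerk", "manager", "clerk", "analyst", "clerk")

# Mapper: turn one input record into an intermediate (key, value) pair.
map_one <- function(position) list(key = position, value = 1)

# Reducer: collapse all values that share a key into a single count.
reduce_one <- function(key, values) sum(as.numeric(values))

# Map phase: apply the mapper to every record.
pairs  <- lapply(records, map_one)
keys   <- vapply(pairs, function(p) p$key, character(1))
values <- vapply(pairs, function(p) p$value, numeric(1))

# Shuffle phase: group the intermediate values by key.
grouped <- split(values, keys)

# Reduce phase: apply the reducer once per key.
counts <- vapply(names(grouped),
                 function(k) reduce_one(k, grouped[[k]]),
                 numeric(1))
print(counts)
```

The result is a named numeric vector with one count per distinct position, which mirrors the ("position", "count") reduce output requested in the rhive.mrapply example.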