rhive.basic {RHive}    R Documentation

Distributed Basic Statistical Functions in R Using Hive

Description

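Basic statistical functions (mode, range, merge, cross-tabulation, cut, grouped aggregation, scaling, t-test, and block sampling) that operate on Hive tables and are computed by Hive rather than in local R memory.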

Usage

rhive.basic.mode(tableName, col, forcedRef=TRUE)
rhive.basic.range(tableName, col)
rhive.basic.merge(x, y, by.x, by.y, forcedRef=TRUE)
rhive.basic.xtabs(formula, tableName)
rhive.basic.cut(tableName, col, breaks, right=TRUE, summary=FALSE, 
  forcedRef=TRUE)
rhive.basic.cut2(tableName, col1, col2, breaks1, breaks2, right=TRUE,
  keepCol=FALSE, forcedRef=TRUE)
rhive.basic.by(tableName, INDICES, fun, arguments, forcedRef=TRUE)
rhive.basic.scale(tableName, col)
rhive.basic.t.test(x, col1, y, col2)
rhive.block.sample(tableName, percent=0.01, seed=0, subset) 
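
For orientation, the following minimal sketch computes locally what three of these functions are assumed to return, using base R on a small in-memory data frame instead of a Hive table. It assumes that rhive.basic.mode, rhive.basic.range, and rhive.basic.scale follow the semantics of "most frequent value", range(), and scale(); the emp data frame and the local_mode() helper are hypothetical and exist only for illustration.

```r
# Hypothetical local stand-in for the Hive table 'emp'
emp <- data.frame(
  deptno = c(10, 20, 20, 30, 20),
  sal    = c(1300, 800, 3000, 2850, 1100)
)

# Hypothetical helper: the most frequent value of a vector (returned as a label)
local_mode <- function(x) {
  counts <- table(x)               # frequency of each distinct value
  names(counts)[which.max(counts)] # first value with the highest count
}

mode_dept  <- local_mode(emp$deptno) # "20"
range_sal  <- range(emp$sal)         # c(800, 3000): minimum and maximum
scaled_sal <- scale(emp$sal)         # (sal - mean) / sd, as a one-column matrix
```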

Arguments

tableName

Name of the Hive table.

x, y

Names of two Hive tables: the tables to be joined (rhive.basic.merge), or the tables holding the two samples (rhive.basic.t.test).

by.x, by.y

Names of the key columns in x and y, respectively, on which the two tables are joined.
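
The join described by by.x and by.y can be illustrated with base R's merge() on small in-memory data frames. This is a local sketch that assumes rhive.basic.merge mirrors merge()'s default inner-join behaviour; the emp and dept contents are invented for the example.

```r
# Hypothetical local stand-ins for the Hive tables 'emp' and 'dept'
emp  <- data.frame(ename = c("SMITH", "ALLEN", "KING"), deptno = c(20, 30, 10))
dept <- data.frame(id = c(10, 20, 30), dname = c("ACCOUNTING", "RESEARCH", "SALES"))

# Keep rows where emp$deptno equals dept$id (inner join, sorted by the key);
# the key column appears first, under the name given in by.x
merged <- merge(emp, dept, by.x = "deptno", by.y = "id")
print(merged)
```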

col

Name of the column to operate on.

col1

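Name of the first column: the column cut by breaks1 (rhive.basic.cut2), or the column of table x (rhive.basic.t.test).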

col2

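Name of the second column: the column cut by breaks2 (rhive.basic.cut2), or the column of table y (rhive.basic.t.test).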

formula

a formula object with the cross-classifying variables (separated by '+') on the right-hand side (or an object which can be coerced to a formula).
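
Base R's xtabs() takes the same kind of formula, with an optional count variable on the left and the cross-classifying variables on the right. The sketch runs it locally on UCBAdmissions, the dataset used in the Examples section; whether rhive.basic.xtabs accepts exactly this formula shape, including the count variable on the left, is an assumption.

```r
# Same data as in the Examples section, but kept in local memory
DF <- as.data.frame(UCBAdmissions)

# Freq holds the counts; Gender and Admit are the cross-classifying factors
tab <- xtabs(Freq ~ Gender + Admit, data = DF)
print(tab)
```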

breaks

Either a string of the form 'min:max:step' (where 'step' is optional), a numeric vector of two or more cut points, or a single number (greater than or equal to 2) giving the number of intervals into which col is to be cut.
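
To make the 'min:max:step' form concrete, the sketch below expands such a string into cut points and bins a few values locally with base cut(). The parse_breaks() helper is hypothetical and not part of RHive; in particular, the default step of 1 when step is omitted is an assumption, not documented behaviour.

```r
# Hypothetical helper (not part of RHive): expand a 'min:max:step' string
# into a vector of cut points. Assumption: an omitted step defaults to 1.
parse_breaks <- function(spec) {
  parts <- as.numeric(strsplit(spec, ":", fixed = TRUE)[[1]])
  step <- if (length(parts) >= 3) parts[3] else 1
  seq(parts[1], parts[2], by = step)
}

cuts <- parse_breaks("0:5000:100") # 0, 100, ..., 5000: 51 points, 50 intervals

# Bin some local salaries with base cut(); right = TRUE yields (a,b] intervals
sal    <- c(800, 1100, 1300, 2850, 3000)
bins   <- cut(sal, breaks = cuts, right = TRUE, dig.lab = 4)
counts <- table(bins)
print(counts[counts > 0])          # only the non-empty intervals
```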

breaks1

Breaks for col1, in any of the forms accepted by breaks.

breaks2

Breaks for col2, in any of the forms accepted by breaks.

summary

Logical, indicating whether to summarize the result of the cut.

INDICES

Name(s) of the column(s) to group by.

fun

Name of the Hive function to apply to each group.

arguments

Arguments passed to fun: column names, numeric constants, or quoted string literals. For example, arguments = c("sal", "deptno", 3.2, "'NexR'").
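
One way to picture how INDICES, fun, and arguments fit together is as the pieces of a HiveQL GROUP BY statement. The sketch below assembles such a statement as text; build_by_query() and my_udf are hypothetical names, and the SQL that RHive actually generates may differ.

```r
# Hypothetical helper: render fun(arguments...) inside a GROUP BY query.
# Assumption: RHive builds a statement of this general shape.
build_by_query <- function(tableName, INDICES, fun, arguments) {
  call <- sprintf("%s(%s)", fun, paste(arguments, collapse = ", "))
  keys <- paste(INDICES, collapse = ", ")
  sprintf("SELECT %s, %s FROM %s GROUP BY %s", keys, call, tableName, keys)
}

q <- build_by_query("emp", "deptno", "sum", c("sal"))
# "SELECT deptno, sum(sal) FROM emp GROUP BY deptno"

# c() coerces 3.2 to the string "3.2"; "'NexR'" keeps its single quotes,
# so it is emitted as a string literal rather than a column name
args <- c("sal", "deptno", 3.2, "'NexR'")
udf_call <- sprintf("%s(%s)", "my_udf", paste(args, collapse = ", "))
# "my_udf(sal, deptno, 3.2, 'NexR')"
```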

right

logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa.

keepCol

Logical, indicating whether to keep the original columns in the result.

forcedRef

Logical; if TRUE, a temporary table is always created to hold the result.

percent

Percentage of the table's data size to sample.

seed

Index of the first block to be selected.

subset

An optional condition (for example, "id < 100") restricting the rows to be sampled.
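
Hive itself supports block sampling through TABLESAMPLE(n PERCENT) and the hive.sample.seednumber setting. The sketch below composes such statements as text to show how percent, seed, and subset might map onto them. The mapping, including passing percent through unchanged and treating seed as hive.sample.seednumber, is an assumption, and block_sample_sql() is a hypothetical helper, not RHive's implementation.

```r
# Hypothetical helper: compose HiveQL statements for a block sample.
# Assumptions: 'percent' is passed to TABLESAMPLE(... PERCENT) unchanged,
# 'seed' maps to hive.sample.seednumber, and 'subset' becomes a WHERE clause.
block_sample_sql <- function(tableName, percent = 0.01, seed = 0, subset = NULL) {
  stmts <- c(
    sprintf("SET hive.sample.seednumber=%d", as.integer(seed)),
    sprintf("SELECT * FROM %s TABLESAMPLE(%s PERCENT) s", tableName, format(percent))
  )
  if (!is.null(subset)) {
    stmts[2] <- paste(stmts[2], "WHERE", subset) # filter the sampled rows
  }
  stmts
}

sql <- block_sample_sql("emp", subset = "id < 100")
print(sql)
```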

Author(s)

rhive@nexr.com

Examples

## connect to the Hive server
## Not run: rhive.connect("hive-server-ip")

## find the most frequent value (mode) of a column
## Not run: rhive.basic.mode('emp','deptno')

## calculate the minimum and maximum of a column
## Not run: rhive.basic.range('emp','sal')

## merge two tables on a shared key column
## Not run: rhive.basic.merge('emp','dept', by.x = 'deptno', by.y = 'id')

DF <- as.data.frame(UCBAdmissions)

## Not run: rhive.write.table(DF)

## cross-tabulate admission counts by gender and admission status
## Not run: rhive.basic.xtabs(freq ~ gender + admit, 'df')

## divides the range of a column into intervals
## Not run: rhive.basic.cut('emp', 'sal', breaks='0:5000:100')

## divides the ranges of two columns into intervals
## Not run: rhive.basic.cut2('emp', 'sal', 'deptno', breaks1='0:5000:100',
  breaks2='0:100:10')
## End(Not run)

## compute the sum of salaries for each department
## Not run: rhive.basic.by('emp', 'deptno', 'sum', c("sal"))

## center and/or scale a column of a table
## Not run: rhive.basic.scale('emp', 'sal')

## compare the means of two columns with a t-test
## Not run: rhive.basic.t.test('emp', 'sal', 'emp', 'age')

## draw a block sample from a table, restricted to rows with id < 100
## Not run: rhive.block.sample("emp", subset="id < 100")

## close connection
## Not run: rhive.close()

[Package RHive version 2.0-0.10 Index]