nroPair {Numero}    R Documentation

Match similar rows

Description

Pair up the closest matching rows from two datasets.

Usage

nroPair(data.x, data.y)

Arguments

data.x

A matrix or a data frame with column names.

data.y

A matrix or a data frame with column names.

Details

The function detects columns that are shared between the two datasets by their names. Pairs of rows across datasets are then compared using Euclidean distance to determine the best matches.
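
The idea can be illustrated with a minimal base R sketch. This is not the implementation used by nroPair; it assumes a simple nearest-neighbour rule in which each row of x is paired with its closest row in y, and the package may resolve pairings differently (for example, by enforcing one-to-one matches). The data frames x and y and their columns are hypothetical.

```r
# Two small hypothetical datasets; only columns A and B are shared.
x <- data.frame(A = c(0, 5), B = c(0, 5), C = c(1, 2))
y <- data.frame(A = c(4, 1, 9), B = c(4, 0, 9), D = c(7, 8, 9))

# Detect shared columns by name.
shared <- intersect(colnames(x), colnames(y))
mx <- as.matrix(x[, shared, drop = FALSE])
my <- as.matrix(y[, shared, drop = FALSE])
nx <- nrow(mx)
ny <- nrow(my)

# Euclidean distances between every row of x and every row of y.
dall <- as.matrix(dist(rbind(mx, my), method = "euclidean"))
dmat <- dall[seq_len(nx), nx + seq_len(ny), drop = FALSE]

# Assumed rule: pair each row of x with its nearest row of y.
best <- apply(dmat, 1, which.min)
result <- data.frame(ROW.x = seq_len(nx),
                     ROW.y = as.integer(best),
                     DISTANCE = dmat[cbind(seq_len(nx), best)])

# Sort by matching distance, mirroring the documented output order.
result <- result[order(result$DISTANCE), ]
print(result)
```

Because Euclidean distance is sensitive to the units of each column, it is usually sensible to standardize the variables first, as the example at the end of this page does with scale.default.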

Value

A data frame that has up to five columns: ROW.x and ROW.y contain the pairings using row indices and DISTANCE contains the distances in data space. If row names are available, the columns ROWNAME.x and ROWNAME.y are added.

The output is sorted according to the matching distance.

Author(s)

Ville-Petteri Makinen

Examples

# Import data.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- read.delim(file = fname)

# Set row names.
rownames(dataset) <- paste0("r", seq_len(nrow(dataset)))

# Prepare training data.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- scale.default(dataset[,trvars])

# Split by sex.
women <- which(dataset$MALE == 0)
men <- which(dataset$MALE == 1)

# Find best matches.
pairs <- nroPair(data.x = trdata[women,], data.y = trdata[men,])
print(head(pairs))

[Package Numero version 1.2.0 Index]