duplex {prospectr} | R Documentation |
Select calibration samples from a large multivariate dataset using the DUPLEX algorithm
duplex(X,k,metric,pc,group,.center = TRUE,.scale = FALSE)
X |
a |
k |
number of calibration/validation samples |
metric |
distance metric to be used: 'euclid' (Euclidean distance) or 'mahal' (Mahalanobis distance, default). |
pc |
optional. If not specified, distances are computed in the Euclidean space. Alternatively, distances are computed in the principal component score space and
|
group |
An optional |
.center |
logical value indicating whether the input matrix should be centered before Principal Component Analysis. Default set to TRUE. |
.scale |
logical value indicating whether the input matrix should be scaled before Principal Component Analysis. Default set to FALSE. |
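The 'mahal' metric and the principal component score space are closely related: a Euclidean distance computed on PC scores that have been scaled to unit variance equals the Mahalanobis distance in the original space, provided all components are kept. The base-R sketch below illustrates this. It uses prcomp as a stand-in for the PCA step and is an assumption for illustration only, not a description of how duplex is implemented; the data are simulated.

```r
# Illustration only: simulated data, prcomp used as a stand-in PCA routine.
set.seed(1)
Z <- matrix(rnorm(200), ncol = 4)
X <- Z %*% (diag(4) + 0.5)                 # four mildly correlated variables

# PCA with centering and without scaling (mirrors .center = TRUE, .scale = FALSE)
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Scale each score column to unit variance
scores <- sweep(pca$x, 2, pca$sdev, "/")

# Euclidean distances in the scaled score space
d_pc <- as.matrix(dist(scores))

# Mahalanobis distance between rows 1 and 2, computed from the covariance of X
# (mahalanobis() returns the squared distance, hence the sqrt)
d_mahal <- sqrt(mahalanobis(X[1, ], center = X[2, ], cov = cov(X)))

all.equal(unname(d_pc[1, 2]), d_mahal)
```

Keeping fewer components than variables gives an approximation of the Mahalanobis distance rather than the exact value.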
The DUPLEX algorithm is similar to the Kennard-Stone algorithm (see kenStone) but selects both a calibration set and a validation set, which are independent of each other. Like the Kennard-Stone algorithm, it starts by selecting the pair of points that are the farthest apart. They are assigned to the calibration set and removed from the list of points. Then, the pair of remaining points that are farthest apart is assigned to the validation set and removed from the list. In a third step, the procedure assigns the remaining points alternately to the calibration and validation sets, based on their distance to the points already selected. As in the Kennard-Stone algorithm, distances can be Euclidean or Mahalanobis (see the metric argument); when the pc argument is given, they are computed in the principal component score space (see kenStone).
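The three steps above can be sketched in base R as follows. This is a minimal illustration, not the prospectr implementation: duplex_sketch is a hypothetical helper, it uses plain Euclidean distances, it ignores the group argument, and it assumes the third step picks the remaining point whose nearest already-selected neighbour is farthest away (the maximin rule used by Kennard-Stone).

```r
# Hypothetical helper, not part of prospectr: a plain base-R DUPLEX sketch.
duplex_sketch <- function(X, k) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(k >= 2, 2 * k <= n)
  D <- as.matrix(dist(X))            # pairwise Euclidean distances
  remaining <- seq_len(n)            # row indices not yet assigned

  # Remove and return the two remaining points that are farthest apart.
  take_pair <- function() {
    sub <- D[remaining, remaining, drop = FALSE]
    idx <- arrayInd(which.max(sub), dim(sub))
    pair <- remaining[as.vector(idx)]
    remaining <<- setdiff(remaining, pair)
    pair
  }

  # Remove and return the remaining point whose closest point in `chosen`
  # is farthest away (assumed maximin rule).
  take_next <- function(chosen) {
    d_min <- apply(D[remaining, chosen, drop = FALSE], 1, min)
    pick <- remaining[which.max(d_min)]
    remaining <<- setdiff(remaining, pick)
    pick
  }

  model <- take_pair()               # step 1: farthest pair -> calibration
  test <- take_pair()                # step 2: next farthest pair -> validation
  while (length(model) < k) {        # step 3: alternate between the two sets
    model <- c(model, take_next(model))
    test <- c(test, take_next(test))
  }
  list(model = model, test = test)
}

# Usage on six points along a line
X <- matrix(c(0, 1, 2, 3, 10, 11), ncol = 1)
sel <- duplex_sketch(X, k = 3)
```

The returned list mirrors the 'model' and 'test' components described for duplex, so sel$model and sel$test can be used directly as row indices into X.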
a list with components:
'model': numeric vector giving the row indices of the input data selected for calibration
'test': numeric vector giving the row indices of the input data selected for validation
'pc': if the pc argument is specified, a numeric matrix of the scaled pc scores
Antoine Stevens & Leonardo Ramirez-Lopez
Kennard, R.W., and Stone, L.A., 1969. Computer aided design of experiments. Technometrics 11, 137-148.
Snee, R.D., 1977. Validation of regression models: methods and examples. Technometrics 19, 415-428.
kenStone, honigs, shenkWest, naes
data(NIRsoil)
sel <- duplex(NIRsoil$spc, k = 30, metric = 'mahal', pc = .99)
plot(sel$pc[, 1:2], xlab = 'PC1', ylab = 'PC2')
points(sel$pc[sel$model, 1:2], pch = 19, col = 2)  # points selected for calibration
points(sel$pc[sel$test, 1:2], pch = 18, col = 3)  # points selected for validation
# Test on artificial data
X <- expand.grid(1:20, 1:20) + rnorm(1e5, 0, .1)
plot(X[, 1], X[, 2], xlab = 'VAR1', ylab = 'VAR2')
sel <- duplex(X, k = 25, metric = 'mahal')
points(X[sel$model, ], pch = 19, col = 2)  # points selected for calibration
points(X[sel$test, ], pch = 15, col = 3)  # points selected for validation