plotMS {pedometrics}    R Documentation
This function produces a graphical output that allows the examination of the
effect of using different model specifications (design) on the predictive
performance of these models (a model series). It is generally used to assess
the results of functions buildMS and statsMS, but can easily be adapted to
work with any model structure and performance measure.
plotMS(obj, grid, line, ind, type = c("b", "g"), pch = c(20, 2), size = 0.5, arrange = "desc", color = NULL, xlim = NULL, ylab = NULL, xlab = NULL, at = NULL, ...)
obj: Object of class
grid: Vector of integer values or character strings indicating the columns of the
line: Character string or integer value indicating which of the performance statistics (usually calculated by
ind: Integer value indicating for which group of models the mean rank is to be calculated. See ‘Details’ for more information.
type: Vector of character strings indicating some of the effects to be used when plotting the performance statistics using
pch: Vector with two integer values specifying the symbols to be used to plot points. The first sets the symbol used to plot the performance statistic, while the second sets the symbol used to plot the mean rank of the indicator set using argument
size: Numeric value specifying the size of the symbols used for plotting the mean rank of the indicator set using argument
arrange: Character string indicating how the model series should be arranged, which can be in ascending (
color: Vector defining the colors to be used in the grid produced by function
xlim: Numeric vector of length 2, giving the x coordinates range. If
ylab: Character vector of length 2, giving the y-axis labels. When
xlab: Character vector of length 1, giving the x-axis labels. Defaults to
at: Numeric vector indicating the location of tick marks along the x axis (in native coordinates).
...: Other arguments for plotting, although most of these have not been tested. Argument
This section gives more details about arguments obj, grid, line, arrange, and ind.
The argument obj is usually a data.frame returned by statsMS. However, any
data.frame object can be used as long as it contains the two basic units of
information needed:
1. the design data, passed with argument grid;
2. the performance statistic, passed with argument line.
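Since obj does not have to come from statsMS, it can help to check a hand-made data.frame before plotting. The sketch below uses a hypothetical helper, check_ms_obj(), which is not part of pedometrics; it only confirms that the grid columns exist and that the line column is numeric and complete. The rmse values are made-up toy numbers.

```r
# Hypothetical helper (not part of pedometrics): checks that a data.frame
# carries the two units of information described above.
check_ms_obj <- function(obj, grid, line) {
  stopifnot(is.data.frame(obj))
  # 'grid' and 'line' may be column positions or column names;
  # a non-existent column name makes the subsetting below fail
  design <- obj[, grid, drop = FALSE]
  perf <- obj[[line]]
  if (!is.numeric(perf)) stop("the 'line' column must be numeric")
  if (anyNA(perf)) stop("the 'line' column contains missing values")
  invisible(list(design = design, line = perf))
}

# Toy input: two binary design variables and an error statistic
obj <- data.frame(
  id = 1:4,
  a = c(0, 1, 0, 1),
  b = c(0, 0, 1, 1),
  rmse = c(2.1, 1.8, 1.9, 1.5)
)
parts <- check_ms_obj(obj, grid = c("a", "b"), line = "rmse")
str(parts)
```

Columns can equally be given by position, e.g. check_ms_obj(obj, grid = 2:3, line = 4), mirroring how grid = c(2:4) is used in the Examples.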
The argument grid indicates the design data used to produce the grid output
at the top of the model series plot. By design we mean the data that specify
the structure of each model and how the models differ from each other.
Suppose that eight linear models were fit using three types of predictor
variables (a, b, and c). Each of these predictor variables is available in
two versions that differ in accuracy, where 0 means the less accurate version
of the predictor variable, while 1 means the more accurate version. This
yields 2^3 = 8 possible combinations. The design data would be of the
following form:
> design
  a b c
1 0 0 0
2 0 0 1
3 1 0 0
4 0 1 0
5 1 0 1
6 0 1 1
7 1 1 0
8 1 1 1
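A full factorial design like this one can be generated with base R's expand.grid() rather than typed by hand. This is a sketch only; expand.grid varies the first variable fastest, so the row order (and thus which id each combination receives) differs from the table above. For the model series plot, only the set of combinations matters.

```r
# Three predictors, each in a less accurate (0) and more accurate (1)
# version: every combination gives one model, 2^3 = 8 in total.
design <- expand.grid(a = 0:1, b = 0:1, c = 0:1)
print(design)

# How many models use 0, 1, 2 or 3 of the more accurate predictors
table(rowSums(design))  # counts 1, 3, 3, 1
```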
The argument line corresponds to the performance statistic that is used to
arrange the models in ascending or descending order, and to produce the line
output at the bottom of the model series plot. For example, it can be a
series of values of the adjusted coefficient of determination, one for each
model:
adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
The argument arrange automatically arranges the model series according to
the performance statistic selected with argument line.
If obj is a data.frame returned by statsMS(), then the function uses standard
arranging approaches. For most performance statistics, the models are
arranged in descending order. The exception is when "r2", "adj_r2" or
"ADJ_r2" is used, in which case the models are arranged in ascending order.
This means that the model with the lowest value appears on the leftmost side
of the model series plot, while the model with the highest value appears on
the rightmost side of the plot.
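The arranging rule can be sketched in a few lines of base R. The helper ms_order() is hypothetical and is not the package's internal code; it assumes the rule is exactly as stated: ascending for "r2", "adj_r2" and "ADJ_r2", descending for every other statistic (for error measures such as RMSE, where lower is better, this also places the best model on the right).

```r
# Hypothetical helper: returns the row order used to lay out the models
# from left to right, following the rule described in the text.
ms_order <- function(values, stat) {
  ascending <- stat %in% c("r2", "adj_r2", "ADJ_r2")
  order(values, decreasing = !ascending)
}

adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
ms_order(adj_r2, "adj_r2")         # 5 2 3 4 6 1 8 7

# An error statistic is arranged the other way round
ms_order(c(2.0, 1.5, 3.0), "rmse") # 3 1 2: largest error first
```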
> arrange(obj, adj_r2)
id a b c adj_r2
1 5 1 0 1 0.54
2 2 0 0 1 0.74
3 3 1 0 0 0.81
4 4 0 1 0 0.85
5 6 0 1 1 0.86
6 1 0 0 0 0.87
7 8 1 1 1 0.89
8 7 1 1 0 0.90
These results suggest that the best performing model is the one with id = 7,
while the model with id = 5 is the poorest.
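The arrange() call shown above is not a base R function; it presumably comes from an add-on package such as plyr or dplyr, which the page does not name. The same table can be obtained in base R with order(), using the data from the Examples section, and the poorest and best models read off its first and last rows.

```r
# Same data as in the Examples section
obj <- data.frame(
  id = 1:8,
  a = c(0, 0, 1, 0, 1, 0, 1, 1),
  b = c(0, 0, 0, 1, 0, 1, 1, 1),
  c = c(0, 1, 0, 0, 1, 1, 0, 1),
  adj_r2 = c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
)

# Ascending order: poorest model first (leftmost), best model last
sorted <- obj[order(obj$adj_r2), ]
rownames(sorted) <- NULL
print(sorted)

sorted$id[1]             # 5, the poorest model
sorted$id[nrow(sorted)]  # 7, the best model
```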
The model series plot shows how the design influences model performance. This is achieved mainly through the use of different colours in the grid output, where each unique value in the design data is represented by a different colour. For the example given above, one can check whether the models built with the more accurate versions of the predictor variables perform better by looking at where they fall in the model series plot: the models placed on the rightmost side of the plot are those with the best performance.
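The colour coding of the grid can be mimicked with a plain character matrix: put the models in plot order (poorest to best), then map each unique design value to one colour. This is an illustrative sketch, not how plotMS draws the grid; it assumes the data and the two colours used in the Examples section, and that there is exactly one colour per unique design value.

```r
design <- data.frame(
  a = c(0, 0, 1, 0, 1, 0, 1, 1),
  b = c(0, 0, 0, 1, 0, 1, 1, 1),
  c = c(0, 1, 0, 0, 1, 1, 0, 1)
)
adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
color <- c("lightyellow", "palegreen")  # one colour per unique value

# Rows in plot order: poorest model (left) to best model (right)
m <- as.matrix(design)[order(adj_r2), ]

# Map each unique design value to a colour: 0 -> first, 1 -> second
values <- sort(unique(as.vector(m)))
col_grid <- matrix(color[match(m, values)], nrow = nrow(m),
                   dimnames = dimnames(m))

# Best model (id 7): accurate versions of a and b, less accurate c
col_grid[nrow(col_grid), ]
```

Reading down the columns of col_grid (left to right in the plot) shows where the "palegreen" entries, i.e. the more accurate predictors, tend to cluster.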
The argument ind provides another tool to help identify how the design, and
more specifically each variable in the design data, influences model
performance. This is done by calculating the mean rank of the models that
were built using the updated (more accurate) version of each predictor
variable. The same mean rank is also used to rank the predictor variables and
thus identify which of them is the most important.
After arranging the design
data described above using the adjusted
coefficient of determination, the following mean rank is obtained for each
predictor variable:
> rank_center
a b c
1 5.75 6.25 5.25
This result suggests that the best model performance is obtained when using
the updated version of the predictor variable b. In the model series plot,
the predictor variable b appears in the top row, while the predictor variable
c appears in the bottom row.
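The mean-rank idea can be sketched in base R: give each model its position in the ascending arrangement, then average those positions over the models whose design value equals ind. This is a literal reading of the description above, not pedometrics' own code. With the data from the Examples section it gives a = 4.75, b = 6.00 and c = 3.75, which does not reproduce the rank_center values shown above, so the package's internal computation may differ in some detail; the resulting ordering of the variables (b, then a, then c) does agree with the text.

```r
design <- data.frame(
  a = c(0, 0, 1, 0, 1, 0, 1, 1),
  b = c(0, 0, 0, 1, 0, 1, 1, 1),
  c = c(0, 1, 0, 0, 1, 1, 0, 1)
)
adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)

# Position of each model in the ascending arrangement (1 = leftmost)
r <- rank(adj_r2)

# Group of interest: models using the more accurate version (value 1)
ind <- 1
mean_rank <- sapply(design, function(v) mean(r[v == ind]))
mean_rank  # a = 4.75, b = 6.00, c = 3.75

# Variables from highest to lowest mean rank (top to bottom row)
names(sort(mean_rank, decreasing = TRUE))  # "b" "a" "c"
```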
An object of class "trellis" consisting of a model series plot.
Use the original functions xyplot and levelplot for further customization.
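As a starting point for such customization, the sketch below draws only the line panel directly with lattice's xyplot(), reusing the type = c("b", "g") and pch = 20 defaults of plotMS ("b" joins points with lines, "g" adds a background grid). It assumes the lattice package, which ships with standard R installations, and it is an illustration rather than the way plotMS assembles its plot. Since update() works on trellis objects in general, a call such as update(p, main = ...) should in principle also apply to the object returned by plotMS.

```r
library(lattice)

adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
perf <- sort(adj_r2)          # ascending: poorest model on the left
position <- seq_along(perf)

# Points joined by lines ("b") on top of a background grid ("g")
p <- xyplot(perf ~ position, type = c("b", "g"), pch = 20,
            xlab = "Model position", ylab = "adj_r2")

# Trellis objects can be modified after creation
p <- update(p, main = "Line panel drawn directly with xyplot")
print(p)
```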
Some of the solutions used to build this function were found in the source code of the R package mvtsplot. As such, the author of that package, Roger D. Peng <rpeng@jhsph.edu>, is credited as a contributor to the R package pedometrics.
Alessandro Samuel-Rosa <alessandrosamuelrosa@gmail.com>
Deepayan Sarkar (2008). Lattice: Multivariate Data Visualization with R. Springer, New York. ISBN 978-0-387-75968-5.
Roger D. Peng (2008). A method for visualizing multivariate time series data. Journal of Statistical Software, 25 (Code Snippet), 1-17.
Roger D. Peng (2012). mvtsplot: Multivariate Time Series Plot. R package version 1.0-1. http://CRAN.R-project.org/package=mvtsplot.
# This example follows the discussion in section "Details"
# Note that the data.frame is created manually
library(pedometrics)
id <- 1:8
design <- data.frame(a = c(0, 0, 1, 0, 1, 0, 1, 1),
                     b = c(0, 0, 0, 1, 0, 1, 1, 1),
                     c = c(0, 1, 0, 0, 1, 1, 0, 1))
adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
obj <- cbind(id, design, adj_r2)
p <- plotMS(obj, grid = 2:4, line = "adj_r2", ind = 1,
            color = c("lightyellow", "palegreen"),
            main = "Model Series Plot")
print(p)