brnn_extended {brnn}
Description:

The brnn_extended function fits a two-layer neural network as described in MacKay (1992) and Foresee and Hagan (1997). It uses the Nguyen and Widrow (1990) algorithm to assign initial weights and the Gauss-Newton algorithm to perform the optimization. The hidden layer contains two groups of neurons, which makes it possible to assign different prior distributions to two groups of input variables.
Usage:

brnn_extended(x, ...)

## S3 method for class 'formula'
brnn_extended(formula, data, contrastsx=NULL, contrastsz=NULL, ...)

## Default S3 method:
brnn_extended(x, y, z, neurons1, neurons2, normalize=TRUE, epochs=1000,
              mu=0.005, mu_dec=0.1, mu_inc=10, mu_max=1e10, min_grad=1e-10,
              change=0.001, cores=1, verbose=FALSE, ...)
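For orientation, here is a minimal sketch of a call to the default method on simulated data; the simulated inputs and the object names (x1, x2, y_sim, fit) are illustrative assumptions, not part of the package documentation.

    library(brnn)

    # Simulated data: two groups of predictors that can receive
    # different amounts of regularization (illustrative values only).
    set.seed(123)
    n <- 100
    x1 <- matrix(rnorm(n * 3), ncol = 3)    # group 1, p = 3
    x2 <- matrix(rnorm(n * 2), ncol = 2)    # group 2, q = 2
    y_sim <- rowSums(x1) + 0.5 * rowSums(x2) + rnorm(n, sd = 0.1)

    # Default method: x and z are the incidence matrices of the two groups.
    fit <- brnn_extended(x = x1, y = y_sim, z = x2,
                         neurons1 = 2, neurons2 = 2,
                         epochs = 100, verbose = FALSE)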
Arguments:

formula: A formula of the form y ~ x | z; the '|' separates the two
    groups of input variables (e.g. yield_devMilk ~ G | D in the
    example below).

data: Data frame from which the variables specified in formula are
    taken.

y: (numeric, n) the response data-vector (NAs not allowed).

x: (numeric, n x p) incidence matrix for variables in group 1.

z: (numeric, n x q) incidence matrix for variables in group 2.

neurons1: positive integer that indicates the number of neurons for
    variables in group 1.

neurons2: positive integer that indicates the number of neurons for
    variables in group 2.

normalize: logical; if TRUE, inputs and output are normalized. Default
    TRUE.

epochs: positive integer, maximum number of epochs to train. Default
    1000.

mu: positive number that controls the behaviour of the Gauss-Newton
    optimization algorithm. Default 0.005.

mu_dec: positive number, the mu decrease ratio. Default 0.1.

mu_inc: positive number, the mu increase ratio. Default 10.

mu_max: strictly positive number, the maximum mu before training is
    stopped. Default 1e10.

min_grad: minimum gradient; training stops once the gradient falls
    below this value. Default 1e-10.

change: the program stops if the maximum absolute difference of the
    objective function F over 3 consecutive iterations is less than
    this quantity. Default 0.001.

cores: number of CPU cores to use for calculations (only available on
    UNIX-like operating systems). The function detectCores in the R
    package parallel can be used to attempt to detect the number of
    CPUs on the machine running R, but not all of them are necessarily
    available to the current user (on multi-user systems, for example,
    this depends on system policies). See the documentation of the
    parallel package for further details, and the short sketch after
    this list.

verbose: logical; if TRUE, the iteration history is printed.

contrastsx: an optional list of contrasts to be used for some or all
    of the factors appearing as variables in the first group of input
    variables in the model formula.

contrastsz: an optional list of contrasts to be used for some or all
    of the factors appearing as variables in the second group of input
    variables in the model formula.

...: arguments passed to or from other methods.
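As referenced in the description of cores, here is a minimal sketch of choosing a value with the parallel package; the fallback logic (reserving one core) is an illustrative convention, not something brnn requires.

    library(parallel)
    n_cores <- detectCores()   # may return NA on some platforms
    cores_to_use <- if (is.na(n_cores)) 1 else max(1, n_cores - 1)
    # cores_to_use can then be passed as the 'cores' argument
    # on UNIX-like systems.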
Details:

The software fits a two-layer network as described in MacKay (1992) and Foresee and Hagan (1997). The model is given by:
y_i = ∑_{k=1}^{s_1} w_k^{1} g_k(b_k^{1} + ∑_{j=1}^p x_{ij} β_j^{1[k]}) + ∑_{k=1}^{s_2} w_k^{2} g_k(b_k^{2} + ∑_{j=1}^q z_{ij} β_j^{2[k]}) + e_i,  i=1,...,n,

e_i ~ N(0, σ_e^2).
g_k(.) is the activation function; in this implementation g_k(x) = (exp(2x)-1)/(exp(2x)+1), i.e. the hyperbolic tangent.
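The following sketch evaluates the model equation directly in R for a single observation; the network sizes and all weight values are made up for illustration.

    # The activation used by the package; identical to tanh(x).
    g <- function(x) (exp(2 * x) - 1) / (exp(2 * x) + 1)
    stopifnot(isTRUE(all.equal(g(1.3), tanh(1.3))))

    # One observation, s_1 = s_2 = 1 neuron per group (hypothetical values).
    x_i <- c(0.5, -1.2)                      # group 1 inputs, p = 2
    z_i <- c(0.3)                            # group 2 inputs, q = 1
    w1 <- 0.8;  b1 <- 0.1; beta1 <- c(0.4, -0.2)
    w2 <- -0.5; b2 <- 0.0; beta2 <- c(0.7)

    yhat_i <- w1 * g(b1 + sum(x_i * beta1)) + w2 * g(b2 + sum(z_i * beta2))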
The software minimizes

F = β E_D + α θ_1' θ_1 + δ θ_2' θ_2,

where:

- E_D = ∑_{i=1}^n (y_i - ŷ_i)^2, i.e. the sum of squared errors.
- β = 1/(2σ_e^2).
- α = 1/(2σ_{θ_1}^2), where σ_{θ_1}^2 is a dispersion parameter for the weights and biases associated with the first group of neurons.
- δ = 1/(2σ_{θ_2}^2), where σ_{θ_2}^2 is a dispersion parameter for the weights and biases associated with the second group of neurons.
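To make the objective concrete, this sketch evaluates F for given parameter vectors; all numeric values, including theta1_vec and theta2_vec, are hypothetical stand-ins rather than quantities taken from a fitted model.

    # Hypothetical residuals and parameter vectors (illustration only).
    y    <- c(1.0, 2.0, 1.5)
    yhat <- c(0.9, 2.1, 1.4)
    theta1_vec <- c(0.8, 0.1, 0.4, -0.2)   # weights and biases, group 1
    theta2_vec <- c(-0.5, 0.0, 0.7)        # weights and biases, group 2
    beta <- 1; alpha <- 0.01; delta <- 0.02

    E_D   <- sum((y - yhat)^2)             # sum of squared errors
    F_obj <- beta * E_D + alpha * sum(theta1_vec^2) + delta * sum(theta2_vec^2)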
Value:

An object of class "brnn_extended" or "brnn_extended.formula". Mostly internal structure, but it is a list containing:
$theta1: A list containing weights and biases. The first s_1 components
    of the list contain vectors with the estimated parameters for the
    k-th neuron, i.e. (w_k^1, b_k^1, β_1^{1[k]},...,β_p^{1[k]})'. s_1
    corresponds to neurons1 in the argument list.

$theta2: A list containing weights and biases. The first s_2 components
    of the list contain vectors with the estimated parameters for the
    k-th neuron, i.e. (w_k^2, b_k^2, β_1^{2[k]},...,β_q^{2[k]})'. s_2
    corresponds to neurons2 in the argument list.

$message: String that indicates the stopping criterion for the training
    process.
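A short sketch of inspecting these components; it assumes a fitted object out such as the one produced in the Examples section.

    # Assuming 'out' was returned by brnn_extended() (see Examples):
    length(out$theta1)   # number of per-neuron parameter vectors, group 1
    out$theta1[[1]]      # (w_1^1, b_1^1, beta_1^{1[1]}, ..., beta_p^{1[1]})'
    out$message          # which stopping criterion ended training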
References:

Foresee, F. D., and M. T. Hagan. 1997. "Gauss-Newton approximation to Bayesian regularization", Proceedings of the 1997 International Joint Conference on Neural Networks.
MacKay, D. J. C. 1992. "Bayesian interpolation", Neural Computation, vol. 4, no. 3, pp. 415-447.
Nguyen, D. and Widrow, B. 1990. "Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights", Proceedings of the IJCNN, vol. 3, pp. 21-26.
Examples:

## Not run:

# Example 5
# Warning, it will take a while.

# Load the Jersey dataset.
data(Jersey)

# Predictive power of the model using the SECOND set for 10-fold
# CROSS-VALIDATION.
data <- pheno
data$G <- G
data$D <- D
data$partitions <- partitions

# Fit the model to the TRAINING DATA (partitions != 2) for
# Additive + Dominant effects.
out <- brnn_extended(yield_devMilk ~ G | D,
                     data = subset(data, partitions != 2),
                     neurons1 = 2, neurons2 = 2, epochs = 100, verbose = TRUE)

# Plot the results.
# Predicted vs observed values for the training set.
par(mfrow = c(2, 1))
yhat_R_training <- predict(out)
plot(out$y, yhat_R_training, xlab = "y", ylab = expression(hat(y)))
cor(out$y, yhat_R_training)

# Predicted vs observed values for the testing set.
newdata <- subset(data, partitions == 2, select = c(D, G))
ytesting <- pheno$yield_devMilk[partitions == 2]
yhat_R_testing <- predict(out, newdata = newdata)
plot(ytesting, yhat_R_testing, xlab = "y", ylab = expression(hat(y)))
cor(ytesting, yhat_R_testing)

## End(Not run)