A new learning rate based on Andrei's method for training feed-forward artificial neural networks

In this paper we develop a new method for computing the learning rate of the back-propagation algorithm used to train feed-forward neural networks. Our idea is based on approximating the inverse Hessian matrix of the error function, as originally suggested by Andrei. Experimental results show that the proposed method considerably improves the convergence rate of the back-propagation algorithm on the chosen test problems.


Introduction
Neural networks are composed of simple elements operating in parallel. These elements are inspired by biological nervous systems. As in nature, the network function is determined largely by the connections between the elements. We can train a neural network to perform a particular function by adjusting the values of these connections (weights) between elements; commonly, neural networks are adjusted, or trained, so that a particular input leads to a specific target output. The network is adjusted, based on a comparison of the output and the target, until the network output matches the target. Typically, many such input/target pairs are used in this supervised learning to train a network. Batch training of the network proceeds by making weight and bias changes based on an entire set (batch) of input vectors [6]. The batch training of the Multi-layer Feed-forward Neural network (MFFN) can be formulated as a non-linear unconstrained minimization problem [8,9], namely

$$\min_{w} E(w), \qquad (1)$$

where $E$ is the batch error measure defined as the sum of squared differences over the entire training set,

$$E(w) = \sum_{P} \sum_{j} \left( o_{j,P} - t_{j,P} \right)^{2}, \qquad (2)$$

where $o_{j,P} - t_{j,P}$ is the difference between the actual output of the $j$-th output-layer neuron for pattern $P$ and the corresponding target output value, and $P$ is an index over the input/output pairs. The general purpose of the training is to search for an optimal set of connection weights such that the error of the network output is minimized. The most popular training algorithm is the Classical Batch Back-Propagation (CBP) algorithm introduced by Rumelhart, Hinton and Williams [12]. Although the CBP algorithm is a simple learning algorithm for training Multi-layer Feed-Forward (MFF) networks, it is unfortunately not based on a sound theoretical basis and is very inefficient and unreliable. One iteration of the CBP algorithm can be written as

$$w_{k+1} = w_k - \eta \, \nabla E(w_k), \qquad (3)$$

where $w_k$ is the vector of current weights and biases, $\nabla E(w_k)$ is the gradient of the error function and $\eta$ is the learning rate. With CBP the learning rate is held constant throughout training, and the performance of the algorithm is very sensitive to its proper setting [5]. In order to overcome the drawbacks of the CBP algorithm, many gradient-based training algorithms have been proposed in the literature [1,2,5,7,13].
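To make the CBP update in equation (3) concrete, the following minimal NumPy sketch performs one batch epoch for a single-hidden-layer network with a tanh hidden layer and a linear output layer; the architecture, activation functions and all identifiers are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def batch_bp_epoch(W1, b1, W2, b2, X, T, eta=0.1):
    """One CBP epoch: accumulate the gradient of the sum-of-squares error
    over the whole batch, then take a fixed-learning-rate gradient step."""
    # Forward pass over the entire batch (tanh hidden layer, linear output).
    H = np.tanh(X @ W1 + b1)            # hidden activations, shape (P, n_hidden)
    Y = H @ W2 + b2                     # network outputs,    shape (P, n_out)

    # Batch error E(w) = sum_P sum_j (y_{j,P} - t_{j,P})^2.
    E = np.sum((Y - T) ** 2)

    # Back-propagate to obtain the gradient of E with respect to each weight array.
    dY = 2.0 * (Y - T)
    dW2, db2 = H.T @ dY, dY.sum(axis=0)
    dH = (dY @ W2.T) * (1.0 - H ** 2)   # tanh'(a) = 1 - tanh(a)^2
    dW1, db1 = X.T @ dH, dH.sum(axis=0)

    # CBP update: w_{k+1} = w_k - eta * grad E(w_k), with eta held constant.
    W1 -= eta * dW1; b1 -= eta * db1
    W2 -= eta * dW2; b2 -= eta * db2
    return E
```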

Some Modifications of CBP
A surprising result was given by Barzilai and Borwein [3], which provides a formula for the learning rate $\eta_k$ and leads to superlinear convergence. The main idea of the Barzilai and Borwein (BB) method is to use the information from the previous iteration to decide the step size (learning rate) in the current iteration. The iteration in equation (3) is viewed as

$$w_{k+1} = w_k - \eta_k \, \nabla E(w_k), \qquad (4)$$

where the learning rate is chosen as

$$\eta_k = \frac{s_{k-1}^{T} s_{k-1}}{s_{k-1}^{T} y_{k-1}} \qquad (7)
\qquad\text{or}\qquad
\eta_k = \frac{s_{k-1}^{T} y_{k-1}}{y_{k-1}^{T} y_{k-1}}, \qquad (8)$$

with $s_{k-1} = w_k - w_{k-1}$ and $y_{k-1} = \nabla E(w_k) - \nabla E(w_{k-1})$, respectively. Note that we abbreviate the method defined in equation (3) with the learning rate defined in equations (7) and (8) as the BB1 and BB2 methods, respectively. An alternative approach is based on the work of Plagianakos et al. [11]. Following this approach, equation (3) is reformulated into the following scheme:

$$w_{k+1} = w_k - \frac{1}{\lambda_{\max}} \, \nabla E(w_k), \qquad (9)$$

where $\lambda_{\max}$ is (an estimate of) the largest eigenvalue of the Hessian matrix $\nabla^{2} E(w_k)$. A well-known difficulty with this approach is that computing the eigenvalues, or even estimating them, is not a simple task; hence the scheme defined in equation (9) is not practical.
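The sketch below illustrates the two Barzilai-Borwein step sizes referred to above as BB1 and BB2, using the standard formulas built from the previous iterate and gradient; the flattening of all weights into a single vector and the function name are assumptions made for illustration only.

```python
import numpy as np

def bb_learning_rate(w_k, w_prev, g_k, g_prev, variant=1):
    """Barzilai-Borwein step size from the previous iterate and gradient.

    With s = w_k - w_{k-1} and y = grad E(w_k) - grad E(w_{k-1}):
      BB1: eta_k = (s.s) / (s.y)      BB2: eta_k = (s.y) / (y.y)
    """
    s = np.asarray(w_k) - np.asarray(w_prev)
    y = np.asarray(g_k) - np.asarray(g_prev)
    if variant == 1:
        return float(s @ s) / float(s @ y)
    return float(s @ y) / float(y @ y)

# One iteration of the scheme w_{k+1} = w_k - eta_k * grad E(w_k):
#   eta = bb_learning_rate(w, w_old, grad, grad_old, variant=1)
#   w_new = w - eta * grad
```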

3- The Developed Method
In the following we suggest another procedure for computing a scalar approximation of the Hessian of the error function $E$. The first step length $\eta_0$ is computed using backtracking along the negative gradient direction. Now, at the point

$$w_{k+1} = w_k - \eta_k \, \nabla E(w_k), \qquad k = 0, 1, \ldots,$$

a Taylor expansion of $E$ gives

$$E(w_{k+1}) = E(w_k) - \eta_k \, \nabla E(w_k)^{T} \nabla E(w_k) + \tfrac{1}{2}\, \eta_k^{2}\, \nabla E(w_k)^{T} \nabla^{2} E(z)\, \nabla E(w_k),$$

where $z$ is a point on the line segment connecting $w_k$ and $w_{k+1}$. This is an anticipative point of view, in which a scalar approximation of the Hessian at the point $w_{k+1}$ is computed from quantities already available. Therefore, replacing $\nabla^{2} E(z)$ by $\gamma_{k+1} I$ and solving the relation above for $\gamma_{k+1}$, we can write (see [4]):

$$\gamma_{k+1} = \frac{2\left[ E(w_{k+1}) - E(w_k) + \eta_k \, \|\nabla E(w_k)\|^{2} \right]}{\eta_k^{2}\, \|\nabla E(w_k)\|^{2}}.$$

Observe that $\gamma_{k+1}$ is a scalar approximation of $\nabla^{2} E(w_{k+1})$. Now, in order to compute the next estimate we must consider a procedure for choosing the step length $\eta_{k+1}$. For this, let us consider the function

$$\Phi_{k+1}(\eta) = E(w_{k+1}) - \eta\, \|\nabla E(w_{k+1})\|^{2} + \tfrac{1}{2}\, \eta^{2}\, \gamma_{k+1}\, \|\nabla E(w_{k+1})\|^{2},$$

which approximates $E\!\left(w_{k+1} - \eta\, \nabla E(w_{k+1})\right)$ for all $\eta \ge 0$. If $\gamma_{k+1} > 0$, then $\Phi_{k+1}$ attains its minimum at

$$\eta_{k+1} = \frac{1}{\gamma_{k+1}},$$

showing that, if $\gamma_{k+1} > 0$, then at every iteration the value of the error function $E_{k+1}$ is reduced [4]. On the other hand, if it happens that $\gamma_{k+1} \le 0$, we may define a correction on the value of $\gamma_{k+1}$. To avoid the above problem (the corrections in equations (15) and (16)), we suggest the following formula to compute the learning rate at each epoch.

Step 4. Set $k = k + 1$ and go to Step 2.
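A minimal sketch of the learning-rate computation described above, assuming the weights are flattened into a single vector: it forms the scalar Hessian approximation $\gamma_{k+1}$ from the anticipative relation and returns $\eta_{k+1} = 1/\gamma_{k+1}$ when $\gamma_{k+1} > 0$. The safeguard used when $\gamma_{k+1} \le 0$ is only an illustrative placeholder, not the correction formula proposed in the paper.

```python
def next_learning_rate(E_new, E_old, grad_old_sq, eta_k, eta_fallback=0.01):
    """Anticipative scalar Hessian approximation and the resulting step length.

    gamma_{k+1} = 2 * (E(w_{k+1}) - E(w_k) + eta_k * ||grad E(w_k)||^2)
                      / (eta_k^2 * ||grad E(w_k)||^2)
    eta_{k+1}   = 1 / gamma_{k+1}   when gamma_{k+1} > 0.
    """
    gamma = 2.0 * (E_new - E_old + eta_k * grad_old_sq) / (eta_k ** 2 * grad_old_sq)
    if gamma > 0.0:
        return 1.0 / gamma
    # gamma <= 0: the paper applies its own correction here; this constant
    # fallback is only an illustrative safeguard.
    return eta_fallback
```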

Experiments and Results:
A computer simulation has been developed to study the performance of the following algorithms:

1- GD: the classical back-propagation algorithm.
2- GDA: the adaptive back-propagation algorithm taken from the MATLAB toolbox.
3- F1SBP: the newly suggested training algorithm.

The simulations have been carried out using MATLAB (7.6); the performance of the proposed algorithm has been evaluated and compared with the batch versions of the above algorithms. The algorithms were tested using the same initial weights, initialized by the Nguyen-Widrow method [10], and received the same sequence of input patterns. The weights of the network are updated only after the entire set of patterns to be learned has been presented. For each of the test problems, a table summarizing the performance of the algorithms for the simulations that reached a solution is presented. The reported parameters are: Min, the minimum number of epochs over 50 simulations; Mean, the mean number of epochs over 50 simulations; Max, the maximum number of epochs over 50 simulations; Tav, the average total time over 50 simulations; and Succ, the number of succeeded simulations out of 50 trials within the error-function-evaluation limit. If an algorithm fails to converge within the above limit, it is considered to have failed to train the FFNN, and its epochs are not included in the statistical analysis of the algorithm. One gradient and one error-function evaluation are necessary at each epoch.
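As an illustration of this reporting protocol, the sketch below runs a training routine 50 times and collects the Min, Mean, Max, Tav and Succ statistics described above; `train_once` is a hypothetical placeholder that would initialize the network (e.g. with the Nguyen-Widrow method) and train it, and is not part of the paper.

```python
import time
import numpy as np

def run_trials(train_once, n_trials=50, epoch_limit=1000):
    """Run `train_once` n_trials times and collect the table statistics:
    Min/Mean/Max epochs over successful runs, average time (Tav) and the
    number of successful runs (Succ)."""
    epochs, times, succ = [], [], 0
    for seed in range(n_trials):
        t0 = time.time()
        n_epochs, converged = train_once(seed)   # hypothetical training routine
        times.append(time.time() - t0)
        if converged and n_epochs <= epoch_limit:
            succ += 1
            epochs.append(n_epochs)              # failed runs are excluded
    if not epochs:                               # no run converged
        return {"Min": None, "Mean": None, "Max": None,
                "Tav": float(np.mean(times)), "Succ": 0}
    return {"Min": min(epochs), "Mean": float(np.mean(epochs)),
            "Max": max(epochs), "Tav": float(np.mean(times)), "Succ": succ}
```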

Problem 1 (XOR Problem)
The first problem we consider is the XOR Boolean function problem, which is regarded as a classical benchmark for FFNN training. The XOR function maps two binary inputs to a single binary output; as is well known, this function is not linearly separable. The network architecture for this binary classification problem consists of one hidden layer with 3 neurons and an output layer of one neuron. The termination criterion is set to an error goal of 0.002 within the limit of 1000 epochs. Table (1) summarizes the results of all algorithms over 50 simulations: the minimum number of epochs for each algorithm is listed in the first column (Min), the maximum number of epochs in the second column (Max), the third column contains the mean number of epochs (Mean), Tav is the average time over the 50 simulations, and the last column contains the percentage of successes of the algorithms in the 50 simulations.
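For concreteness, the following sketch sets up the XOR training data and the 2-3-1 architecture described above; the uniform weight initialization shown is only an illustrative assumption (the experiments in the paper use the Nguyen-Widrow method).

```python
import numpy as np

# XOR training set: two binary inputs mapped to a single binary target.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# 2-3-1 architecture: one hidden layer with 3 neurons, one output neuron.
rng = np.random.default_rng(0)
W1, b1 = rng.uniform(-0.5, 0.5, (2, 3)), np.zeros(3)   # illustrative uniform init,
W2, b2 = rng.uniform(-0.5, 0.5, (3, 1)), np.zeros(1)   # not the Nguyen-Widrow method

# A training loop would repeat the batch update (e.g. batch_bp_epoch above, or the
# proposed learning-rate rule) until E <= 0.002 or 1000 epochs are reached.
```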

Conclusions
In this paper we proposed a new formula for computing the learning rate in the back-propagation algorithm for training feed-forward multi-layer neural networks. Based on our numerical experiments, we conclude that the proposed method outperforms the classical back-propagation and adaptive back-propagation training algorithms and has the potential to significantly enhance the computational efficiency and robustness of the training process.

Table (2): Results of simulations for the heart-disease classification problem
This is also a binary classification task, where patients' heart images are classified as normal or abnormal. The class distribution has 55 instances of the abnormal class (20.6%) and 212 instances of the normal class (79.4%). From these, 80 instances have been selected for the training process and the remaining 187 for testing the neural network's generalization capability. The network architecture for this medical classification problem consists of one hidden layer with 6 neurons and an output layer of 2 neurons. The termination criterion is set to