M.S. Thesis Presentation by Stephen Rose
Wednesday, April 10, 2002

(Dr. Nader Sadegh, advisor)

"Adaptive Sequential Optimization of Unknown Functions through Reinforcement Back-Propagation"


A method is presented that combines reinforcement learning with target-based training of neural networks to control an unknown plant. The objective is to combine the generality of reinforcement learning with the speed and flexibility of target-trainable controllers, specifically neural networks. Reinforcement learning methods adapt a controller based solely on a scalar performance measure, so they require knowledge of the performance objective only, not of the workings of the plant. However, the scalar performance evaluation in reinforcement learning methods cannot be used as a target for supervised training of a neural network. If a training target is known, a controller can be trained efficiently by algorithms that use the known gradient of the network output with respect to the network parameters, such as steepest-descent back-propagation or conjugate gradient methods. The scheme presented here uses an adaptive search element (ASE) to explore the space of control actions; by perturbing the controller output before it is sent to the plant, the ASE drives the adaptation of a neural network controller.
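
The general idea of perturbing a controller's output and turning a scalar performance measure into a training target can be illustrated with a minimal sketch. This is not the algorithm from the thesis: the plant cost `plant_cost`, the single-weight linear controller, the Gaussian perturbation standing in for the ASE, and the rule "adopt the perturbed action as the target when it scores better" are all assumptions chosen for brevity. The sketch also queries the plant twice per step (nominal and perturbed action), which a real sequential scheme may not be able to do.

```python
import numpy as np


def plant_cost(x, u):
    """Hypothetical 'unknown' plant: the learner only observes this scalar cost.

    The best action is u = 3 * x, but the controller is never told this.
    """
    return (u - 3.0 * x) ** 2


def train(steps=5000, lr=0.1, sigma=0.5, seed=0):
    """Adapt a one-weight controller u = w * x from scalar costs only."""
    rng = np.random.default_rng(seed)
    w = 0.0  # controller parameter (stand-in for neural network weights)
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)       # current plant input / state
        u = w * x                        # nominal controller output
        # Exploration: Gaussian noise plays the role of the search element
        # (an assumption; the thesis's ASE may behave differently).
        u_try = u + sigma * rng.standard_normal()
        if plant_cost(x, u_try) < plant_cost(x, u):
            # The perturbed action did better, so treat it as a supervised
            # target and take one gradient step on 0.5 * (u - u_try)**2.
            # For u = w * x the gradient with respect to w is (u - u_try) * x.
            w -= lr * (u - u_try) * x
    return w


if __name__ == "__main__":
    print("learned weight:", train())
```

In this toy setting the learned weight moves toward 3, the value that minimizes the hypothetical cost, even though the update rule never sees the cost function's form. For a multi-layer network, the same target would be passed to back-propagation in place of the hand-written gradient.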