
[Figure: Duration of fixations (average over all subjects) during 0-20% of the trials and during 60-80% of the trials (fixations after learning).]

The research group recently proposed a computational model of perception in the brain as active pattern generation. The model combines attention and object/category recognition in a single interconnected network. Perception is formalized as an active, top-down directed inference process in which a target template is learned and maintained. Little is known about the nature of these templates. The proposal is that the brain can create them by learning in a reward-based scenario. The brain structures responsible for reward processing and reinforcement learning are the basal ganglia. For category discrimination, the brain might learn more abstract or generalized features of single objects. The hypothesis is that such high-order visual templates also guide visual perception, i.e. gaze, during learning, which can be measured with an eye-tracking system.

To test this hypothesis, an experimental study was run in which 12 human subjects were trained on a subordinate category recognition task (fish), with category stimuli similar to those in earlier studies (Sigala & Logothetis, Nature, 2002; Sigala, Gabbiani & Logothetis, J. Cogn. Neurosci., 2002; Peters, Gabbiani & Koch, Vision Res., 2003). The decision space was designed so that the two categories could be fully separated by only two of the four features. This design tested whether the subjects are able to detect, and focus on, the features that carry the relevant information. On each trial, a single stimulus was presented, and the subjects pressed one of two buttons to indicate their category decision. The stimulus disappeared after the button press. The subjects received feedback only after wrong answers. During the presentation, the subjects' eye movements were recorded with an eye tracker (Eyelink from SR Research). On average, the subjects learned the task (85% correct) after 100 trials.
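To make the logic of this design concrete, here is a minimal sketch of an error-driven learner on a hypothetical stimulus space of four binary features (coded -1/+1), only two of which determine the category. It uses a perceptron, a classic learning rule chosen here only because, like the feedback in the task, it changes its weights solely after wrong answers; it is not the group's model. The feature coding, the category rule (category A if feature 0 or feature 1 is +1), the presentation order, and all names are assumptions for illustration.

```python
from itertools import product

# Hypothetical stimulus space (assumption): four binary features coded -1/+1.
# Features 0 and 1 carry the category information; features 2 and 3 do not.
STIMULI = list(product([-1, 1], repeat=4))


def category(x):
    """Illustrative category rule (assumption, not the study's rule)."""
    return 1 if x[0] > 0 or x[1] > 0 else -1


def train_error_driven(stimuli, max_epochs=100):
    """Perceptron learning: weights change only after a wrong answer,
    mirroring the error-only feedback used in the task."""
    w = [0, 0, 0, 0]
    b = 0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x in stimuli:
            y = category(x)
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            guess = 1 if activation > 0 else -1
            if guess != y:
                # Error feedback: shift the weights toward the correct category.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        if errors == 0:
            # A full pass without mistakes: the rule has been learned.
            return w, b, epoch
    return w, b, max_epochs


w, b, epochs = train_error_driven(STIMULI)
print(w, b, epochs)  # → [2, 2, 0, 0] 2 4
```

With this fixed presentation order, the learner ends with zero weight on the two irrelevant features, i.e. error feedback alone is enough for it to single out the informative features, loosely analogous to the shift of fixations toward relevant features.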

The data confirm our hypothesis. On average, there is a general shift of fixations toward locations with relevant features. Thus, subjects are able to learn which features are informative and tend to fixate on them to reach their final category decision. The next step of the project is to build a model that takes the structure of the basal ganglia into account.

Vitay, J., Fix, J., Beuth, F., Schroll, H. and Hamker, F.H. (2009)
“Biological models of reinforcement learning” (submitted)

Abstract: This review focuses on biological issues of reinforcement learning. Since W. Schultz's influential discovery of an analogy between the reward prediction error signal of the temporal difference algorithm and the firing pattern of some dopaminergic neurons in the midbrain during classical conditioning, biological models have emerged that use computational reinforcement learning concepts to explain adaptive behavior. In particular, the basal ganglia have been proposed to implement, among other things, reinforcement learning for action selection, motor control, or working memory. We discuss to what extent the analogy between the temporal difference algorithm and the firing of dopamine cells can be considered valid. Our review then focuses on the basal ganglia, their anatomy, and their key computational properties as demonstrated by three recent, influential models.
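The reward prediction error named in the abstract can be illustrated with a small tabular TD(0) simulation of a classical-conditioning trial. The prediction error at time step t is δ(t) = r(t) + γV(t) − V(t−1), where V is the learned reward prediction and γ a discount factor. The trial layout (10 steps, conditioned stimulus at step 2, reward at step 7), the parameter values, and the choice to hold the value of pre-stimulus steps at zero are assumptions made for this sketch; it is not one of the models discussed in the review.

```python
import numpy as np

# Hypothetical trial layout (assumption): time steps 0..T-1,
# conditioned stimulus (CS) at step CS, reward of 1 at step RW.
T, CS, RW = 10, 2, 7
GAMMA, ALPHA = 0.9, 0.1  # discount factor and learning rate (assumed values)


def run_trial(V, learn):
    """Run one trial of tabular TD(0) and return the prediction error per step.

    V[t] is the predicted discounted future reward at step t. Steps before the
    CS are treated as an unpredictable background whose value stays at 0
    (assumption: CS onset is random, so it cannot itself be predicted).
    """
    reward = np.zeros(T)
    reward[RW] = 1.0
    delta = np.zeros(T)
    for t in range(1, T):
        # Prediction error for the transition from step t-1 to step t.
        delta[t] = reward[t] + GAMMA * V[t] - V[t - 1]
        if learn and t - 1 >= CS:
            V[t - 1] += ALPHA * delta[t]  # update the prediction in place
    return delta


V = np.zeros(T)
before = run_trial(V.copy(), learn=False)  # naive animal: V is all zeros
for _ in range(500):
    run_trial(V, learn=True)
after = run_trial(V, learn=False)

print("error peak before learning at step", int(np.argmax(before)))  # → 7
print("error peak after learning at step", int(np.argmax(after)))    # → 2
```

Before learning, the error peaks when the reward arrives; after training, it appears at stimulus onset and is close to zero at the now-predicted reward. This is the pattern behind the analogy with dopamine cell firing that the review examines.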