From Satisficing to Optimization in Reinforcement Learning
The research area of reinforcement learning develops algorithms that can learn complex behavior, such as driving a car or playing a computer or board game. Some of the learning problems considered aim at optimal behavior, where the goal is to perform a task as well as possible. For example, when learning to play a computer game, the goal might be to score the maximum number of points. Indeed, most reinforcement learning algorithms are based on optimization, that is, they aim to maximize rewards (such as the points scored in a computer game).

However, many learning problems do not actually contain an optimization component. An autonomous car that is supposed to get us to work, for instance, need neither be as fast as possible nor take the shortest route; it is usually sufficient if it gets us there on time. Most currently available learning algorithms would nevertheless require the problem to be formulated as an optimization problem before they can be applied. This not only means additional work: the resulting optimization problems are usually also hard to solve. For example, computing the shortest or fastest route to work (down to the inch or the second) is practically infeasible. Consequently, most learning algorithms are hardly applicable to typical real-world problems.

A precursor project investigated whether there is an advantage in solving problems not optimally but only sufficiently. While it was known that an optimal strategy can in general only be learned approximately, it could be shown that a strategy that is sufficient with respect to a given satisficing level can be learned exactly. Remarkably, this also means that an optimal strategy can be learned exactly if the learner knows a satisficing level that only the optimal strategy satisfies. Accordingly, the current project investigates reinforcement learning algorithms that try to adaptively determine such an appropriate satisficing level. These algorithms may be able to learn more efficiently, and hence much faster, in real-world problems.
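The difference between satisficing and optimizing can be illustrated in the simplest reinforcement learning setting, a multi-armed bandit. The following is a minimal sketch under stated assumptions, not the precursor project's algorithm: the function names `satisficing_bandit` and `ucb1_index`, the rule "exploit the best arm whose empirical mean already reaches the satisficing level, otherwise explore with UCB1", and the Bernoulli arm means are all hypothetical choices for this example. Performance is measured as satisficing regret, i.e. the accumulated shortfall of the played arm's true mean below the satisficing level, rather than the shortfall relative to the best arm.

```python
import math
import random


def ucb1_index(mean, count, t):
    """UCB1 index; arms that were never played get an infinite index."""
    if count == 0:
        return math.inf
    return mean + math.sqrt(2.0 * math.log(t) / count)


def satisficing_bandit(means, level, horizon, seed=0):
    """Hypothetical satisficing strategy on Bernoulli arms (illustration only).

    Rule assumed for this sketch:
      * if some played arm has empirical mean >= level, play the best such arm
        and do not explore further;
      * otherwise explore with UCB1.
    Returns (pull counts per arm, satisficing regret), where the regret sums
    max(0, level - true mean of the played arm) over all steps.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0

    for t in range(1, horizon + 1):
        emp = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(k)]
        satisfying = [i for i in range(k) if counts[i] and emp[i] >= level]

        if satisfying:
            # Exploit: this arm already looks good enough, so stop exploring.
            arm = max(satisfying, key=lambda i: emp[i])
        else:
            # No arm looks good enough yet: explore with UCB1.
            arm = max(range(k), key=lambda i: ucb1_index(emp[i], counts[i], t))

        # Draw a Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += max(0.0, level - means[arm])

    return counts, regret


counts, regret = satisficing_bandit([0.2, 0.5, 0.9], level=0.6, horizon=5000)
print(counts, regret)
```

In this example the level 0.6 lies between the second-best mean (0.5) and the best mean (0.9), so only the optimal arm is satisfying, and learning a satisficing arm coincides with learning the optimal one. Choosing such a level when the arm means are unknown is the kind of problem the current project addresses.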
This project has no linked research outputs in the database.
No additional funding sources recorded.