1000 Ideas (ended)

Reinforcement Learning: Beyond Optimality

German title: Verstärkungslernen ohne Optimalität ("Reinforcement Learning without Optimality")

Principal Investigator

Name: Ronald Ortner
Role: Project leader
ORCID: 0000-0001-6033-2208
Institution: Montanuniversität Leoben

Grant Details

Approval Date: 21 Jun 2021
Start Date: 10 Jan 2022
End Date: 9 Aug 2024
Approved Amount: €150,761

Keywords & Classification

Keywords

Reinforcement Learning (Theory)

Research Disciplines

Machine learning, Computational intelligence

Research Fields

Computer Sciences

Project Summary

The research area of reinforcement learning develops algorithms that can learn complex behavior, such as driving a car or playing a computer or board game. Some of the learning problems considered aim at optimal behavior, where the goal is to do something as well as possible. For example, when learning to play a computer game, the goal might be to score the maximum number of points. Indeed, most reinforcement learning algorithms are based on optimization, that is, they aim to maximize rewards (such as the points scored in a computer game).

However, many learning problems do not actually contain an optimization component. For instance, an autonomous car that is to take us to work need neither be as fast as possible nor take the shortest route; it is usually sufficient for it to get us there on time. Yet most currently available learning algorithms can only be applied once the problem has been formulated as an optimization problem. This not only means additional work; the resulting optimization problems are usually also hard to solve. For example, computing the shortest or fastest route to work down to the last inch or second is practically infeasible. Accordingly, most learning algorithms are hardly applicable to typical real-world problems.

The project aims to find algorithms that do not solve problems optimally but just well enough, and that do so much faster. In a first step, suitable mathematical models have to be developed; in a second step, we shall develop learning algorithms for these models that are more widely applicable to real-world problems.
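To make the difference between "optimal" and "good enough" more concrete, here is a minimal Python sketch of a satisficing agent in a multi-armed bandit with Bernoulli rewards. It is an illustrative assumption, not the algorithm developed in this project: the function names `satisficing_bandit` and `satisficing_regret`, the aspiration level `threshold`, and the decision rule itself are hypothetical. The rule keeps playing the current arm as long as its empirical mean reward is at least `threshold`, and otherwise falls back to the standard UCB1 index to pick an arm. Performance is measured by the accumulated shortfall below the threshold rather than the gap to the best arm.

```python
import math
import random


def satisficing_bandit(means, threshold, horizon, seed=0):
    """Play a Bernoulli bandit, aiming only for mean reward >= threshold.

    Hypothetical illustrative rule (not the project's algorithm):
    stay with the current arm while its empirical mean is at least
    `threshold`; otherwise try each arm once, then use the UCB1 index.
    Returns the number of pulls of each arm.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    current = None
    for t in range(1, horizon + 1):
        if current is not None and totals[current] / pulls[current] >= threshold:
            arm = current  # good enough: no need to search for the best arm
        elif 0 in pulls:
            arm = pulls.index(0)  # initialisation: try every arm once
        else:
            # UCB1 index: empirical mean plus an exploration bonus
            arm = max(
                range(k),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2 * math.log(t) / pulls[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        current = arm
    return pulls


def satisficing_regret(means, threshold, pulls):
    """Total shortfall: sum over pulls of max(0, threshold - mean of pulled arm)."""
    return sum(n * max(0.0, threshold - mu) for mu, n in zip(means, pulls))


if __name__ == "__main__":
    means = [0.2, 0.7, 0.9]  # hypothetical arm means
    pulls = satisficing_bandit(means, threshold=0.6, horizon=5000, seed=1)
    print(pulls, satisficing_regret(means, 0.6, pulls))
```

With these hypothetical arm means, both the 0.7 and the 0.9 arm count as good enough, so the agent is not penalised for settling on the 0.7 arm; a purely optimizing learner would instead keep exploring to tell the two apart.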

Research Outputs (3)

Publications (3)

Title	Year(s)	DOI / Link
Online Regret Bounds for Satisficing in Markov Decision Processes (Mathematics of Operations Research)	2025	10.1287/moor.2023.0275
Understanding the Gaps in Satisficing Bandits	2026	

Further Funding (0)

No additional funding sources recorded.