A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis.

We present a Reinforcement Learning (RL) algorithm based on policy iteration for solving average reward Markov and semi-Markov decision problems. In the literature on discounted reward RL, algorithms based on policy iteration and actor-critic algorithms have appeared. Our algorithm is an asynchronou...
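For readers unfamiliar with the dynamic-programming method this algorithm builds on, the following is a minimal sketch of classical, model-based policy iteration for a unichain average-reward Markov decision problem. It is not the model-free, asynchronous RL algorithm described in the abstract. It assumes the transition probabilities `P[s][a][j]` and expected rewards `R[s][a]` are known, and it does not cover the semi-Markov case. The function names (`solve_linear`, `evaluate`, `policy_iteration`) and the two-state example MDP are hypothetical illustrations. Policy evaluation solves `rho + h(s) = r(s, mu(s)) + sum_j P(s, mu(s), j) h(j)` with `h` fixed to 0 at a reference state. Policy improvement then picks greedy actions with respect to `h`.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_linear(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(P, R, policy: List[int], ref: int = 0) -> Tuple[float, List[float]]:
    """Return (rho, h) for a unichain policy, with h[ref] fixed at 0.

    Solves rho + h(s) - sum_j P(s, a, j) h(j) = r(s, a) for every state s,
    where a = policy[s]. Unknowns: rho and h(s) for s != ref.
    """
    n = len(policy)
    others = [s for s in range(n) if s != ref]
    idx = {s: k + 1 for k, s in enumerate(others)}  # column 0 holds rho
    A, b = [], []
    for s in range(n):
        a = policy[s]
        row = [0.0] * n
        row[0] = 1.0  # coefficient of rho
        if s != ref:
            row[idx[s]] += 1.0  # + h(s)
        for j in range(n):
            if j != ref:
                row[idx[j]] -= P[s][a][j]  # - P(s, a, j) h(j)
        A.append(row)
        b.append(R[s][a])
    x = solve_linear(A, b)
    h = [0.0] * n
    for s in others:
        h[s] = x[idx[s]]
    return x[0], h


def policy_iteration(P, R, policy: List[int]) -> Tuple[List[int], float]:
    """Alternate evaluation and greedy improvement until the policy is stable."""
    while True:
        rho, h = evaluate(P, R, policy)
        new_policy = []
        for s in range(len(policy)):
            def q(a: int) -> float:
                return R[s][a] + sum(P[s][a][j] * h[j] for j in range(len(h)))

            best = max(range(len(R[s])), key=q)
            # Keep the current action on ties so the loop is guaranteed to stop.
            if q(best) <= q(policy[s]) + 1e-12:
                best = policy[s]
            new_policy.append(best)
        if new_policy == policy:
            return policy, rho
        policy = new_policy


# Hypothetical two-state, two-action MDP: P[s][a][j] and R[s][a].
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # state 0: action 0 mostly stays, action 1 moves to state 1
    [[0.1, 0.9], [0.8, 0.2]],  # state 1: action 0 mostly stays, action 1 moves to state 0
]
R = [
    [1.0, 0.0],
    [5.0, 2.0],
]

if __name__ == "__main__":
    best_policy, gain = policy_iteration(P, R, [0, 0])
    print(best_policy, round(gain, 4))  # prints "[1, 0] 4.4444"
```

Starting from the policy `[0, 0]` (average reward 3), one improvement step moves to `[1, 0]`, whose average reward is 40/9. The next improvement step leaves it unchanged, so the method stops. The model-free algorithm in the paper replaces the exact linear solve and the known `P` with simulation-based estimates.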


Bibliographic details
Published in:	Machine Learning 55, 1 (2004).
Main author:	Gosavi, Abhijit
Format:	Article
Language:	English
Subjects:	Reinforcement learning. Average reward. Policy iteration.