A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis.
We present a Reinforcement Learning (RL) algorithm based on policy iteration for solving average reward Markov and semi-Markov decision problems. In the literature on discounted reward RL, algorithms based on policy iteration and actor-critic algorithms have appeared. Our algorithm is an asynchronou...
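The abstract is cut off before it describes the algorithm itself. For orientation only, here is a sketch of classical model-based policy iteration for average-reward MDPs. This is the dynamic-programming scheme that policy-iteration-based RL approximates from samples. It is **not** the paper's asynchronous RL algorithm, and it does not cover the semi-Markov case. The sketch assumes a unichain MDP with known transition probabilities `P[a, i, j]` and rewards `R[i, a]`. The function names and the two-state example are hypothetical.

```python
import numpy as np

def evaluate(P, R, policy):
    """Policy evaluation: solve g + h(i) = r(i, mu(i)) + sum_j p(i, mu(i), j) h(j),
    normalizing h(0) = 0 so the linear system has a unique solution (unichain assumption)."""
    n = P.shape[1]
    P_mu = P[policy, np.arange(n), :]   # row i uses the action the policy picks in state i
    r_mu = R[np.arange(n), policy]
    # Unknowns x = [g, h(1), ..., h(n-1)]; column 0 holds the coefficient of g.
    A = np.eye(n) - P_mu
    A[:, 0] = 1.0                       # h(0) = 0, so its column is reused for g
    x = np.linalg.solve(A, r_mu)
    g = x[0]
    h = np.concatenate(([0.0], x[1:]))
    return g, h

def improve(P, R, h, policy, tol=1e-12):
    """Policy improvement: greedy w.r.t. r(i, a) + sum_j p(i, a, j) h(j)."""
    Q = R + np.einsum('aij,j->ia', P, h)
    best = Q.argmax(axis=1)
    # Keep the current action when it is already optimal, to avoid cycling among ties.
    current = Q[np.arange(len(h)), policy]
    keep = current >= Q.max(axis=1) - tol
    return np.where(keep, policy, best)

def policy_iteration(P, R, max_iter=100):
    n = P.shape[1]
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        g, h = evaluate(P, R, policy)
        new_policy = improve(P, R, h, policy)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, g, h

# Hypothetical 2-state, 2-action example.
P = np.array([[[0.5, 0.5], [0.5, 0.5]],    # action 0
              [[0.9, 0.1], [0.9, 0.1]]])   # action 1
R = np.array([[1.0, 2.0],                  # R[state, action]
              [3.0, 0.0]])
policy, g, h = policy_iteration(P, R)
print(policy, round(float(g), 4))  # → [1 0] 2.1667
```

Here the first iteration evaluates the all-zeros policy (gain 2). It then switches state 0 to action 1 and stops once the greedy step no longer changes the policy, with gain 13/6.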
| Published: | Machine Learning, 55(1), 2004. |
|---|---|
| First author: | |
| Format: | Article |
| Language: | English |
| Subject: | |