Off-Policy Temporal Difference Learning for Perturbed Markov Decision Processes / Forootani, Ali; Iervolino, Raffaele; Tipaldi, Massimo; Khosravi, Mohammad. - In: IEEE CONTROL SYSTEMS LETTERS. - ISSN 2475-1456. - 8:(2025), pp. 3488-3493. [10.1109/lcsys.2025.3547629]
Off-Policy Temporal Difference Learning for Perturbed Markov Decision Processes
Iervolino, Raffaele (Member of the Collaboration Group)
2025
Abstract
Dynamic Programming suffers from the curse of dimensionality due to large state and action spaces, a challenge further compounded by uncertainties in the environment. To mitigate these issues, we explore an off-policy Temporal Difference Approximate Dynamic Programming approach that preserves the contraction mapping property when projecting the problem onto a subspace of selected features, while accounting for the probability distribution of the perturbed transition probability matrix. We further demonstrate how this Approximate Dynamic Programming approach can be implemented as a particular variant of the Temporal Difference learning algorithm, adapted to handle perturbations. To validate our theoretical findings, we provide a numerical example using a Markov Decision Process corresponding to a resource allocation problem.
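As a rough illustration of the kind of method the abstract describes, the sketch below runs off-policy TD(0) with linear function approximation on a small synthetic MDP whose transition rows are mixed with Dirichlet noise at each step to mimic perturbation. It is not the authors' algorithm: the feature matrix `Phi`, the target/behavior policies `pi` and `b`, the `perturb` mixing rule, and all dimensions are hypothetical choices made for this example.

```python
# Minimal sketch (assumptions noted above, not the paper's implementation):
# off-policy TD(0) with linear function approximation on a perturbed MDP.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, n_features = 6, 2, 3
gamma, alpha = 0.9, 0.05

# Nominal transition model P[a, s, :] and reward table R[s, a] (synthetic).
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Each state is mapped to a low-dimensional feature vector (random here).
Phi = rng.standard_normal((n_states, n_features))

# Target policy pi (to be evaluated) and uniform behavior policy b.
pi = rng.dirichlet(np.ones(n_actions), size=n_states)
b = np.full((n_states, n_actions), 1.0 / n_actions)

def perturb(row, eps=0.05):
    """Mix a transition row with Dirichlet noise to model uncertainty."""
    noise = rng.dirichlet(np.ones(len(row)))
    return (1.0 - eps) * row + eps * noise

w = np.zeros(n_features)  # weights of the linear value approximation
s = 0
for t in range(20000):
    a = rng.choice(n_actions, p=b[s])      # act with the behavior policy
    rho = pi[s, a] / b[s, a]               # importance-sampling ratio
    s_next = rng.choice(n_states, p=perturb(P[a, s]))
    td_error = R[s, a] + gamma * Phi[s_next] @ w - Phi[s] @ w
    w += alpha * rho * td_error * Phi[s]   # off-policy TD(0) update
    s = s_next

print("learned value estimates:", Phi @ w)
```

The importance-sampling ratio `rho` corrects for the mismatch between the behavior and target policies, which is the defining feature of the off-policy setting; the per-step `perturb` call stands in for the distribution over perturbed transition matrices that the paper analyzes.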
| File | Type | License | Size | Format | Access |
|---|---|---|---|---|---|
| Off-Policy_Temporal_Difference_Learning_for_Perturbed_Markov_Decision_Processes.pdf | Published version (PDF) | Publisher copyright | 803.93 kB | Adobe PDF | Authorized users only (request a copy) |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


