Off-Policy Temporal Difference Learning for Perturbed Markov Decision Processes

Iervolino, Raffaele (Member of the Collaboration Group)
2025

Abstract

Dynamic Programming suffers from the curse of dimensionality due to large state and action spaces, a challenge further compounded by uncertainties in the environment. To mitigate these issues, we explore an off-policy Temporal Difference Approximate Dynamic Programming approach that preserves the contraction mapping property when projecting the problem onto a subspace of selected features, while accounting for the probability distribution of the perturbed transition probability matrix. We further demonstrate how this Approximate Dynamic Programming approach can be implemented as a particular variant of the Temporal Difference learning algorithm, adapted to handle perturbations. To validate our theoretical findings, we provide a numerical example using a Markov Decision Process corresponding to a resource allocation problem.
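
As a rough illustration of the kind of method the abstract describes, the sketch below runs off-policy TD(0) policy evaluation with linear features on a perturbed transition matrix. It is a minimal sketch, not the algorithm from the paper: the feature matrix, the mixture-based perturbation model, the uniform behavior policy, and the step sizes are all hypothetical choices introduced purely for demonstration.

# Illustrative sketch only: off-policy TD(0) with linear function
# approximation on a perturbed MDP. All quantities below (features,
# perturbation model, behavior policy, step size) are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_features = 10, 3
gamma, alpha = 0.95, 0.05

# Nominal transition matrix under the target policy (rows sum to 1).
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)

# Perturbed transition matrix: nominal dynamics mixed with random noise,
# standing in for the environment uncertainty the paper considers.
noise = rng.random((n_states, n_states))
noise /= noise.sum(axis=1, keepdims=True)
P_perturbed = 0.9 * P + 0.1 * noise

# Behavior policy: samples transitions from a different (uniform) law.
P_behavior = np.full((n_states, n_states), 1.0 / n_states)

Phi = rng.random((n_states, n_features))   # feature matrix (rows = states)
r = rng.random(n_states)                   # per-state reward
theta = np.zeros(n_features)               # weights of V(s) ~ Phi[s] @ theta

s = 0
for _ in range(50_000):
    s_next = rng.choice(n_states, p=P_behavior[s])
    # Importance-sampling ratio: corrects for sampling from the behavior
    # law while evaluating the perturbed target dynamics.
    rho = P_perturbed[s, s_next] / P_behavior[s, s_next]
    td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta += alpha * rho * td_error * Phi[s]
    s = s_next

print("approximate state values:", Phi @ theta)

The importance-sampling ratio is the standard mechanism for evaluating one transition law while sampling from another; note that off-policy TD with function approximation converges only under suitable conditions, and the paper's contribution concerns precisely when the projected operator remains a contraction under perturbation.
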
Off-Policy Temporal Difference Learning for Perturbed Markov Decision Processes / Forootani, Ali; Iervolino, Raffaele; Tipaldi, Massimo; Khosravi, Mohammad. - In: IEEE CONTROL SYSTEMS LETTERS. - ISSN 2475-1456. - 8:(2025), pp. 3488-3493. [10.1109/lcsys.2025.3547629]
Files in this item:
File: Off-Policy_Temporal_Difference_Learning_for_Perturbed_Markov_Decision_Processes.pdf (authorized users only)
Type: Publisher's version (PDF)
License: Publisher's copyright
Size: 803.93 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11588/999528
Citations
  • PMC: not available
  • Scopus: 1
  • Web of Science: not available