Off-Policy Temporal Difference Learning for Perturbed Markov Decision Processes / Forootani, Ali; Iervolino, Raffaele; Tipaldi, Massimo; Khosravi, Mohammad. - In: IEEE CONTROL SYSTEMS LETTERS. - ISSN 2475-1456. - 8:(2025), pp. 3488-3493. [10.1109/lcsys.2025.3547629]
Off-Policy Temporal Difference Learning for Perturbed Markov Decision Processes
Iervolino, Raffaele (Member of the Collaboration Group)
2025
Abstract
Dynamic Programming suffers from the curse of dimensionality due to large state and action spaces, a challenge further compounded by uncertainties in the environment. To mitigate these issues, we explore an off-policy Temporal Difference Approximate Dynamic Programming approach that preserves the contraction mapping property when projecting the problem onto a subspace of selected features, while accounting for the probability distribution of the perturbed transition probability matrix. We further demonstrate how this Approximate Dynamic Programming approach can be implemented as a particular variant of the Temporal Difference learning algorithm, adapted to handle perturbations. To validate our theoretical findings, we provide a numerical example using a Markov Decision Process corresponding to a resource allocation problem.
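As a rough illustration of the kind of method the abstract describes, the sketch below runs off-policy TD(0) with linear function approximation on a small synthetic MDP whose transition rows are mixed with Dirichlet noise at each step to mimic perturbation. It is not the authors' algorithm: the feature matrix `Phi`, the target/behavior policies `pi` and `b`, the `perturb` mixing rule, and all dimensions are hypothetical choices made for this example.

```python
# Minimal sketch (assumptions noted above, not the paper's implementation):
# off-policy TD(0) with linear function approximation on a perturbed MDP.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, n_features = 6, 2, 3
gamma, alpha = 0.9, 0.05

# Nominal transition model P[a, s, :] and reward table R[s, a] (synthetic).
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Each state is mapped to a low-dimensional feature vector (random here).
Phi = rng.standard_normal((n_states, n_features))

# Target policy pi (to be evaluated) and uniform behavior policy b.
pi = rng.dirichlet(np.ones(n_actions), size=n_states)
b = np.full((n_states, n_actions), 1.0 / n_actions)

def perturb(row, eps=0.05):
    """Mix a transition row with Dirichlet noise to model uncertainty."""
    noise = rng.dirichlet(np.ones(len(row)))
    return (1.0 - eps) * row + eps * noise

w = np.zeros(n_features)  # weights of the linear value approximation
s = 0
for t in range(20000):
    a = rng.choice(n_actions, p=b[s])      # act with the behavior policy
    rho = pi[s, a] / b[s, a]               # importance-sampling ratio
    s_next = rng.choice(n_states, p=perturb(P[a, s]))
    td_error = R[s, a] + gamma * Phi[s_next] @ w - Phi[s] @ w
    w += alpha * rho * td_error * Phi[s]   # off-policy TD(0) update
    s = s_next

print("learned value estimates:", Phi @ w)
```

The importance-sampling ratio `rho` corrects for the mismatch between the behavior and target policies, which is the defining feature of the off-policy setting; the per-step `perturb` call stands in for the distribution over perturbed transition matrices that the paper analyzes.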
| File | Type | License | Size | Format | Access |
|---|---|---|---|---|---|
| Off-Policy_Temporal_Difference_Learning_for_Perturbed_Markov_Decision_Processes.pdf | Published version (PDF) | Publisher copyright | 803.93 kB | Adobe PDF | Authorized users only (request a copy) |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


