An analysis of transient Markov decision processes.

H. W. James and E. J. Collins
A class of Markov decision process is considered in which the boundedness of expected future costs is ensured by a natural form of termination, at least under some policies. Previous treatments of such problems have generally restricted attention to the case where the set of states is finite. In this paper, it is shown that all the results of the finite-state case hold when the set of states is a general Borel space, provided one makes the additional assumption that the optimal value function is bounded below. A sufficient condition is also given for the optimal value function to be bounded below which holds in particular if the set of states is countable.

Some key words:
pursuit problem; first passage problem; stochastic shortest path problem; value iteration; policy iteration