We recast a class of denumerable-state, infinite-action Markov renewal programs with unknown parameters as one-state programs whose actions correspond to stationary policies in the original program. Under suitable conditions, we find an adaptive (nonstationary) policy that is optimal in the sense of maximizing the long-run expected reward per unit time.
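The one-state view can be illustrated with a small simulation. The sketch below is not the paper's construction: it assumes a finite set of candidate stationary policies (the paper allows infinitely many actions), a hypothetical `simulate_cycle` that returns the reward and duration of one renewal cycle, and a simple rule that plays the policy with the best empirical reward rate except at sparse forced-exploration times. All names and numeric parameters are illustrative assumptions.

```python
import math
import random


def simulate_cycle(policy, rng):
    """Hypothetical model: one renewal cycle under a stationary policy.

    `policy` is an assumed pair (mean_reward, mean_time); these parameters
    are unknown to the decision maker and are only used to draw samples.
    """
    mean_reward, mean_time = policy
    reward = rng.gauss(mean_reward, 0.1)
    duration = rng.uniform(0.5 * mean_time, 1.5 * mean_time)  # always > 0
    return reward, duration


def run_adaptive(policies, n_cycles, seed=0):
    """Treat each stationary policy as an action of a one-state program.

    Returns (plays, rates): how often each policy was used and its
    empirical reward per unit time (total reward / total time).
    """
    rng = random.Random(seed)
    k = len(policies)
    reward_sum = [0.0] * k
    time_sum = [0.0] * k
    plays = [0] * k

    def rate(i):
        # Untried policies get +inf so that each is tried at least once.
        return reward_sum[i] / time_sum[i] if plays[i] else float("inf")

    for n in range(n_cycles):
        root = math.isqrt(n)
        if root * root == n:
            # Sparse forced exploration: at perfect squares, cycle through policies.
            arm = root % k
        else:
            # Otherwise use the policy with the best empirical reward rate.
            arm = max(range(k), key=rate)
        r, t = simulate_cycle(policies[arm], rng)
        reward_sum[arm] += r
        time_sum[arm] += t
        plays[arm] += 1

    return plays, [rate(i) for i in range(k)]


if __name__ == "__main__":
    # Assumed true reward rates: 0.5, 2.0 and 1.0 per unit time.
    candidates = [(1.0, 2.0), (4.0, 2.0), (1.0, 1.0)]
    plays, rates = run_adaptive(candidates, 2000)
    print("plays:", plays)
    print("empirical rates:", [round(x, 3) for x in rates])
```

The resulting policy is nonstationary because the choice at each cycle depends on the accumulated history, while the forced-exploration times become rare enough that they should not affect the long-run average in this toy setting.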
"Adaptive Policies for Markov Renewal Programs." Ann. Statist. 1 (2) 334-341, March 1973. https://doi.org/10.1214/aos/1176342370