Options for deriving probability distributions of time-sequenced transition matrices when the data lacks the "memoryless" feature of a Markov chain?

I have a dataset which I initially believed was suitable for running Markov Chain Monte Carlo simulations, in that there is a finite number of readily identifiable states that all population elements fall into and traverse over time, elements transition from one state to another (or remain in the same state) from one period to the next, all periods are of the same length, the behavior of the individual population elements are mostly (but not 100%) independent of one another, there is an "absorptive" state which some but not all elements eventually reach, etc.

However upon further analysis the data does not appear to have the "memoryless" property of a Markov chain. In my data, whose transition matrices are shown in the image at the bottom, an element ever touching state 04 or 05 has a significant impact on the state-behavior of that element in ALL future periods and not only the next period. The "memory" from one period to the next is quite sticky. Generally elements can dance around states NULL, 01, 02, fairly randomly from one period to the next but once an element touches any of states 03-05 then that element is forever "tainted" in its behavior and rarely falls out of those 03-05 states (mostly ending up in state 05).

I also easily calculate the SDEV of the transition probabilities. Note that in the below image I oriented the from/to transitions vertically along columns; the R code I used easily transposes the matrices.

My end goal is to run simulations to derive probability distributions for elements ending in the states of NULL, 01, 02, 03, 04, 05, in period x where x is a user input (where x > 6 in this dataset example).

So my question is, what is a good approach for running transition matrix simulations when your transitions do not meet the memoryless property of a Markov chain? In the context of an appropriate R package would be most helpful, for running simulations. I do everything in R now. Or in practice can the memoryless property be safely ignored in running MCMC? Is there an R package for testing the "Markovness" of a data set?