Maximum likelihood estimation of transition probabilities

Hi, I'm new to Markov chains, but a colleague has advised me to use this approach for my latest study. I have a basic understanding of the theory, but I am confused by the use of maximum likelihood for estimating the transition probabilities. Why is this done, and how is it done?

In my first pass at this analysis I simply created the transition matrices directly from the observed data. Is this incorrect? I have only come across maximum likelihood in mixed effects modelling, and although I have found a number of papers on the topic, they don't explain why maximum likelihood estimation is used; they go straight to presenting complex equations that I am unable to decipher. I was hoping someone might be able to clarify this for me!
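
To make the first-pass approach concrete, here is a minimal sketch of one common way to create a transition matrix from observed data. It assumes that this means counting how often each state is followed by each other state in a single observed sequence and then dividing each row by its total. The function name, the integer state coding, and the example sequence are all hypothetical and are not taken from the actual study.

```python
def empirical_transition_matrix(sequence, n_states):
    """Row-normalised transition counts from one observed state sequence.

    Assumes states are coded as integers 0 .. n_states - 1 (an illustrative
    convention, not something taken from the study data).
    """
    # counts[i][j] = number of times state i was immediately followed by state j
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left has no observed transitions;
        # its row is kept as all zeros rather than dividing by zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical observed sequence over three states
observed = [0, 1, 1, 2, 0, 1, 2, 2, 0, 0]
P = empirical_transition_matrix(observed, n_states=3)

for row in P:
    print([round(p, 3) for p in row])
```

Each row of `P` holds the observed proportions of moves out of one state, so every row with at least one observed transition sums to 1.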