Thanks. That exercise was very helpful. I went in thinking that the probability of a string occurring twice in a series was a function of the length of the string and the length of the series, but I learned that it is also a function of the number of symbols that can occur at each point (in...
That simulation was very helpful, thank you. I'm not interested in testing for randomness per se, although the question of randomness is implied in the problem. What I am investigating is at what length two identical substrings in a series become so large that the probability they occurred...
So if one picks a 4-symbol substring from the randomly generated 1k-symbol series, the probability that it will occur a second time in the series is roughly 10%?
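A minimal sketch for checking that ~10% figure, assuming a 10-symbol alphabet (a to j, as in the other posts), a uniformly random series, and that overlapping repeats count. The function names and the independence approximation are my own assumptions for illustration, not something taken from the thread.

```python
import random

def approx_repeat_probability(series_len, alphabet_size, pattern_len):
    """Approximate chance that a chosen substring shows up again.

    Assumption: each other start position matches independently with
    probability alphabet_size ** -pattern_len (ignores overlap effects).
    """
    other_positions = series_len - pattern_len  # all start positions except the chosen one
    p_match = alphabet_size ** -pattern_len
    return 1 - (1 - p_match) ** other_positions

def simulate_repeat_probability(series_len=1000, alphabet="abcdefghij",
                                pattern_len=4, trials=2000, seed=0):
    """Estimate the same probability by brute-force simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        series = "".join(rng.choice(alphabet) for _ in range(series_len))
        start = rng.randrange(series_len - pattern_len + 1)
        pattern = series[start:start + pattern_len]
        first = series.find(pattern)  # earliest occurrence, at or before `start`
        if first != start or series.find(pattern, start + 1) != -1:
            hits += 1  # the pattern occurs at some other start position
    return hits / trials

print(approx_repeat_probability(1000, 10, 4))
print(simulate_repeat_probability())
```

Under these assumptions both numbers should land a little under 10%. Requiring the second occurrence not to overlap the first would lower the result slightly.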
Yeah, I don't want to blow up my computer. I am not really interested in calculating the probability of a fifty-letter sequence occurring twice in a 1K series (not overlapping, etc.). I am interested in understanding how it is done. And in this case why simulation might be a better way...
Thank you for the help, Katxt. The possibility of overlapping sequences seems to me to be an excellent point that I hadn't thought of. Also, I guess there isn't any pressing need to have a closed-form solution since one can calculate the probability for an individual instance. I feel like I...
Thanks for the reply. OK, now let's assume there is a 1000-element series, as above, composed of the first ten letters of the alphabet: (a, b, c, d, e, f, g, h, i, j). We don't know whether it is randomly generated or not, but we work from the assumption that it is. We note that a three-letter...
So in a 1000 letter series generated by the above random letter generator, one should expect to find approximately 100 "d"s, and approximately 10 "d, g" pairs, and approximately 1 "d, g, c" sequence? Am I on the right track here?
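The arithmetic behind those estimates can be sketched as follows, assuming each letter is drawn independently and uniformly from the 10 letters. The helper name `expected_count` is hypothetical, chosen just for this illustration.

```python
def expected_count(series_len, alphabet_size, pattern_len):
    """Expected number of occurrences of one specific pattern.

    Assumes independent, uniformly distributed symbols. There are
    series_len - pattern_len + 1 possible start positions, and each one
    matches with probability (1 / alphabet_size) ** pattern_len.
    """
    start_positions = series_len - pattern_len + 1
    return start_positions / alphabet_size ** pattern_len

# "d", then "d, g", then "d, g, c" in a 1000-letter series over 10 letters
for length in (1, 2, 3):
    print(length, expected_count(1000, 10, length))
```

Each extra letter in the pattern divides the expected count by about 10, which is where the 100, 10, 1 progression comes from.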
That's a good question, but I'm not sure it is what I was trying to get at. Perhaps the problem could be simplified and restated thus: suppose we have a random letter generator that outputs the first 10 letters of the alphabet in lower case (a, b, c, ..., h, i, j), and we use this random...
Thanks for the reply. It may help me to better understand my question. Assume there is a time series: {1,5,3,5,1,5,3,5,}. Can one determine the probability that the sequence 1,5,3,9,5 occurred two times in this series by chance? I would expect that it would be unlikely. Now assume that the...
Hi everybody. I'm trying to understand the methodology for determining how likely it is that a sequence occurring twice in a time series did not repeat by chance. Assuming the two sequences are identical, I'm guessing the likelihood is some function of the length of the sequence and the size of the series.