Odds of a certain 5 character string occurring in a random string of 512.

#1
I work in security and recently had a false positive happen where a certain word that was being tested occurred in the session id. Matching is not case sensitive.

The character ranges that can be used in the session id:
0-9, A-Z, a-z, +, /. (no comma or period)

The security check was looking for xPATH in the session id.

SESSION=XZAZlF8ob+19uDtNuFkiZEHWbjetuClHZqFkS5Hi4tmTlPVvRcmGqVNlI+rRl6W/kP3qy/kfODnTDhwYqwfBvyuEkl1Q5UU9bz7hHlU8ZEZVKsToB2LKi7K+CnRJnbFwBOQw5n14Frq21K60zM5tZe3jLW4b1MdBlgM0mIEqkXbpR6LTOtQa5+VD7834m0KFi7wrd16lR/Ph3zEfUVac5GJwppAKheREjeehq2q57ab57cUTrCWgV2piMdgJ+zAaQhABnVa5ZLy5snTtoGIK7Mpgez43E0/KfaHQNL/HLz56rkLTYZCt0WdJNeU+rzcDC1zPETSiIPc3lgv9NTPa3uw3vw+Y5UHqScm2Mfan+chOE5sYwHiZL7gxgCmjiSbD8+xDAtpEeQxPAtHYzyle7lSK0jsjsJlk1yK6+NGF6+k9U0XrlYQt9X0DikRSn4yHuvvIC5iRmLkukdQh4eTbjHSB8ydVECftsOKft14Cvx39zP226MrD5bGvKzICGEn;

Could someone help me work out the odds of this happening? (Not counting that it just happened to occur while this particular audit was done out of the many thousands of audits)
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
Clarification, you state matching is not case sensitive - so you don't care if "xPAt" is captilized in a different way as long as the order of these characters is the same??? This will effect the number of possibilities per the 512 selections.

Does each character have the same chance of being selected each time. So is there replacement back into the pool of possibilities?

And can you provide clarification on exactly how many options there are to select from for each character (64??)?

And the xPAt seems to occur before 512 characters, so there is the probability within 512 and probability of it occurring within 512 - X.
 
#3
Second time my session timed out after crafting a message! :/ Yes, 64 characters possible. For this exercise just assume that each character is random and can occur in any position in the 512 bytes. There are 25 combinations of xPATH that would match: xpAth, XPatH etc.

Thanks,
John
 

Dason

Ambassador to the humans
#4
Why only 25? If any of the letters in xpath can be capitalized or not then shouldn't it be 32 combinations?
 

Dason

Ambassador to the humans
#7
How exact do you want the probability to be? Because we could just simulate a lot of data and check how often it occurs...
 
#8
Sure, we could brute force it but I was hoping to learn how to calculate it. I took one stats class in college but I overwrote that data long ago. Could someone walk me through how to calculate this? It doesn't have to be exact; it is just a curiosity.
 

Dason

Ambassador to the humans
#9
This problem is actually quite a bit tougher than you might imagine. And I wasn't suggesting brute forcing the solution since that would be entirely unreasonable - it would take far too long to generate every possible sequence and then check how many contain an acceptable variation of xpath.
 
#12
That's what I came up with also, running this program I whipped up to test 1 billion random sessions. Still interested in learning about how to calculate it. :)

Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>

char *rsession(char *dst, int size)
{
   static const char text[] = "abcdefghijklmnopqrstuvwxyz0123456789+/"
                              "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
   int i;
   for ( i = 0; i < size; ++i )
   {
      dst[i] = tolower(text[rand() % (sizeof text - 1)]);
   }
   dst[i] = '\0';
   return dst;
}

int main (void)
{
   char session[513];
   char xpath[] = "xpath";
   srand(time(0));
   int found_count = 0;
   int i = 0;
   for(i = 0; i < 1000000000; i++) {
      rsession(session, sizeof session);
      if(strstr(session,xpath) != NULL)
         found_count++;
      if(i % 1000000 == 0)
         printf("%d/%d\n",found_count,i);
   }

   printf("\nHappened %d/1,000,000\n",found_count);

   return 0;
}
Currently at 475/30,000,000

~ > ./xpath
0/0
11/1000000
27/2000000
48/3000000
67/4000000
80/5000000
93/6000000
109/7000000
130/8000000
149/9000000
166/10000000
174/11000000
190/12000000
206/13000000
221/14000000
237/15000000
251/16000000
265/17000000
277/18000000
287/19000000
309/20000000
318/21000000
335/22000000
351/23000000
367/24000000
389/25000000
400/26000000
424/27000000
443/28000000
464/29000000
475/30000000
 
#13
So I have some observations here. Let's say we calculate how often a case insensitive xpath will occur in a 5 character string. If we double the key length we are effectively making two separate events into one. Every time we would double the length of the string it would increase the odds of it happening. The odds I'm seeing after running about a billion tests is about 1/63,000 for my original question. Unsure of the degree of error.

So lets say we take this as a ballpark range:
length odds
8 1/4,032,000
16 1/2,016,000
32 1/1,008,000
64 1/504,000
128 1/252,000
256 1/126,000
512 1/63000

If we calculate the odds for xpath occurring in a 8 character session we can then calculate odds in greater strings...

Then I ran 4 1,000,000,000 runs for a 5 character "session". and the odds came out to about 1/17,000,000. This table comes out a little high for the estimates...

session length odds
10 1/10,000,000
20 1/5,000,000
40 1/2,500,000
80 1/1,125,000
160 1/562,500
320 1/281,250
640 1/140,625

Can someone help me calculate the same problem with an 8 character session?

Thanks!
John