# Hello all - need some help!

#### Moose

##### New Member
Ideally, the random input generated will match reality. If the average number of accidents per year is very unlikely to be less than 15 in real life, then the number generated in the simulation should be very unlikely to be less than 15. Sometimes you know that the distribution is roughly normal with a known mean and SD. If you don't know the exact distribution, then a common compromise is the triangular. You set the minimum, most likely and maximum values. Then the simulation picks random inputs from that range, most often near the most likely value. So you might, after due consideration and consultation with experts, decide that the average MVA/year is probably about 25, but because of our lack of hard data could possibly be as low as 20, say, or as high as 30. We could then use a triangular 20, 25, 30, and plausible scenarios would have MVA/y mainly around 25 but going as low as 20 and as high as 30. You are in effect modelling your ignorance of the real value.
Too smart!

You are great, thanks for all your help!

I will go away and play with that tool with some phony data to see if I can get the hang of things!

M
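For anyone following along outside a dedicated tool, here is a minimal Python sketch of katxt's triangular 20, 25, 30 example quoted above. It assumes only the standard library; note that `random.triangular` takes its arguments in the order (low, high, mode), not the min / most likely / max order used in the thread.

```python
import random
import statistics

# Expert judgement from the example above: min 20, most likely 25, max 30 per year.
LOW, MODE, HIGH = 20, 25, 30

rng = random.Random(42)  # fixed seed so the run is reproducible

# random.triangular's argument order is (low, high, mode).
scenarios = [rng.triangular(LOW, HIGH, MODE) for _ in range(10_000)]

print(f"min  {min(scenarios):.1f}")
print(f"mean {statistics.mean(scenarios):.1f}")
print(f"max  {max(scenarios):.1f}")
```

The simulated values stay inside 20 to 30 and bunch up around 25, which is the "modelling your ignorance" idea in code form.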

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Next you layer on the other knowledge. For example, say you know 60% of the accidents occur on highways, and safety precautions have been implemented that could potentially reduce these accidents by 10%, while 40% of accidents occur in residential areas. You can model these scenarios and merge the final numbers. Then keep adding more details.
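A minimal Python sketch of that layering, assuming a hypothetical baseline of triangular 20, 25, 30 accidents per year, the 60/40 highway/residential split from the post, and a fixed 10% highway reduction (treating "potentially 10%" as exactly 10% is an assumption made for simplicity).

```python
import random
import statistics

HIGHWAY_SHARE = 0.60      # 60% of accidents on highways (from the post)
RESIDENTIAL_SHARE = 0.40  # 40% in residential areas (from the post)
HIGHWAY_REDUCTION = 0.10  # assumption: a fixed 10% cut from the safety measures

def one_scenario(rng: random.Random) -> float:
    """Return one plausible accidents-per-year figure after the highway measures."""
    total = rng.triangular(20, 30, 25)  # hypothetical baseline: low 20, high 30, mode 25
    highway = total * HIGHWAY_SHARE * (1 - HIGHWAY_REDUCTION)
    residential = total * RESIDENTIAL_SHARE
    return highway + residential  # merge the two sub-models

rng = random.Random(1)
results = [one_scenario(rng) for _ in range(10_000)]
print(f"mean accidents/year after measures: {statistics.mean(results):.2f}")
```

If the 10% is itself uncertain, one option is to replace the constant with its own draw inside `one_scenario`, so that uncertainty also carries through to the merged result.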

#### Moose

##### New Member
Next you layer on the other knowledge. Such as you know 60% of the accidents occur on highways, and safety precautions have been implement to potentially reduce these accidents by 10%, while 40% of accident occur in residential areas. You can model these scenarios and merge the final numbers. Then keep adding more details.
Awesome,

So is there a best-practice way of doing that? Like, do you keep all "known" variables out of the simulation process and apply them afterwards, or do you run the simulation with all of the variables?

#### katxt

##### Well-Known Member
Generally, you pretend that you know all the inputs exactly, and set up your possibly quite complex calculation. I'll assume it's in Excel, say. This gives you a single number as a possible answer to your question (a point estimate in stats speak). However, you aren't sure of the true value of many of the inputs, so this uncertainty leads to an uncertainty in the answer. We want to extend our point estimate into a confidence interval (a range inside which the true answer very likely lies). This is what Monte Carlo risk analysis does for you.
So, set up your calculation with assumed values (or true values if you know them). Then replace each uncertain input with a formula which gives a plausible value for that input. (I find it helpful to work on a copy of the calculations.) When all the uncertain inputs have been replaced with suitable distributions, use whatever software you have to generate a few thousand plausible scenarios.
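A minimal Python sketch of that workflow. The calculation (annual cost = accidents per year × cost per accident) and both triangular ranges are hypothetical, chosen only to show the three steps: point estimate, swap in distributions, summarise a few thousand scenarios as a range.

```python
import random
import statistics

def annual_cost(accidents: float, cost_per_accident: float) -> float:
    """The deterministic calculation (hypothetical model: count x unit cost)."""
    return accidents * cost_per_accident

# Step 1: point estimate using the assumed (most likely) values.
point = annual_cost(25, 10_000)

# Step 2: replace each uncertain input with a plausible distribution
# (hypothetical triangular ranges; argument order is low, high, mode).
rng = random.Random(7)
scenarios = [
    annual_cost(rng.triangular(20, 30, 25), rng.triangular(8_000, 15_000, 10_000))
    for _ in range(5_000)
]

# Step 3: summarise the scenarios as a range (2.5th to 97.5th percentile).
cuts = statistics.quantiles(scenarios, n=40)  # 39 cut points in steps of 2.5%
low, high = cuts[0], cuts[-1]
print(f"point estimate {point:,.0f}; 95% of scenarios between {low:,.0f} and {high:,.0f}")
```

Because the cost input here is skewed upward, the scenarios will typically sit somewhat above the 250,000 point estimate, which is the kind of thing the simulation is there to reveal.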

#### hlsmith

##### Less is more. Stay pure. Stay poor.
The only things you would know for certain are deterministic relationships or formulas. Everything else has uncertainty. Say five people died last year; I don't know for certain that five will die in a forecast year. I do know physical laws and constants for certain, but input values may carry epistemic uncertainty.

#### Moose

##### New Member
Is there anything I would need to consider when I am using the "triangle" to model my ignorance if the deviations are not equal? e.g. 10 / 20 / 60

#### katxt

##### Well-Known Member
Is there anything I would need to consider when I am using the "triangle" to model my ignorance if the deviations are not equal? e.g. 10 / 20 / 60
You need reasons for the 10, 20 and 60.
There are many distributions. Almost all of them have a most likely value somewhere and taper off in both directions. The normal distribution is a common example, and you can use it if you have an idea of the centre and the variability around it and you think it is symmetric. It is unsuitable if you know that the uncertainty is not symmetric. If you are well up in maths and stats, there are other distributions which are not symmetric, but often the exact distribution you use is not critical, so long as it is reasonable. The triangular lets you fix the lowest, highest and most likely values as a good approximation. (Imagine a normal curve squashed and reshaped into a triangle.)

For example, the costs for a building project may be estimated at 50k. This is the most likely value. If you are lucky, you might get a special deal somewhere, but you judge that it certainly won't cost less than 45k. On the other hand, cost overruns are more likely, but specialists assure you that it is most unlikely to be more than 60k. So to get a plausible range of costs ("plausible" is the key word in all this) you could model it as triangular 45k, 50k, 60k. This will generate plausible costs around the 50k mark, occasionally as low as 45k or as high as 60k. The real distribution won't be exactly triangular, of course, but it will be a reasonable approximation.
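One thing worth knowing for unequal deviations like 45k / 50k / 60k: the mean of a triangular distribution is (low + mode + high) / 3, so a longer upper tail pulls the average above the most likely value. A minimal Python sketch, using the building figures above and the standard library's `random.triangular` (argument order low, high, mode):

```python
import random
import statistics

LOW, MODE, HIGH = 45_000, 50_000, 60_000  # figures from the building example

rng = random.Random(3)
costs = [rng.triangular(LOW, HIGH, MODE) for _ in range(20_000)]

# Triangular mean = (low + mode + high) / 3, above the mode when the upper tail is longer.
theoretical_mean = (LOW + MODE + HIGH) / 3
print(f"theoretical mean {theoretical_mean:,.0f}")
print(f"simulated mean   {statistics.mean(costs):,.0f}")
print(f"P(cost > 55k)    {sum(c > 55_000 for c in costs) / len(costs):.3f}")
```

So the "expected" cost is about 51.7k rather than 50k, even though 50k remains the single most likely figure.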

#### Moose

##### New Member
Ok great!

So, standard deviation. Can you explain it like you are talking to a 6-year-old? Because yeah... how does that work? Asking for a friend.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Do you mean the basics of SD, or SD in MC? SD is used to understand the dispersion (spread) of data. If you had a nice normally distributed variable, you would expect about 68% of the data to land within ±1 SD of the mean, about 95% within ±2 SD, and about 99.7% within ±3 SD.

Knowing this can help influence your input values for a MC simulation.
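Those percentages depend only on how far you are from the mean in SD units, so they hold for any normal distribution, whatever its mean and SD. A small Python check using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

coverage = {}
for k in (1, 2, 3):
    coverage[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within +/-{k} SD: {coverage[k]:.1%}")
# within +/-1 SD: 68.3%
# within +/-2 SD: 95.4%
# within +/-3 SD: 99.7%

# Same share for a mean of 100 and SD of 50 (the range 50 to 150):
other = NormalDist(mu=100, sigma=50)
print(f"{other.cdf(150) - other.cdf(50):.1%}")  # 68.3%
```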

#### Moose

##### New Member
So basics about STD or STD in MC? STD is used to understand the dispersion of data. So if you had a nice normally distributed variable, one would expect ~68% of data to land within + or - 1 STD from the mean, 2 STD have ~ 95 coverage, 3 STDS ~ 99 coverage, etc.

Knowing this can help influence your input values for a MC simulation.
OK, so if my mean is 100, my SD would be 50 to 150? And is the ~68% a fixed thing?

Yeah, does it make a difference in MC?

#### Moose

##### New Member
I made it up, but now that I see that image, I assume it's only correct in a specific instance...

What I should have said was:
If my mean is 100 and my SD is 50, then ±1 SD would be 50 to 150?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
You would expect about 68% to land between 50 and 150, and about 95% between 0 and 200. But is it realistic for this variable to go negative? Beyond 2 SD below the mean, values would. So you have to keep this stuff in mind as well.
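To put a number on that: with mean 100 and SD 50, zero is exactly 2 SD below the mean, so roughly 2.3% of normal draws come out negative. A minimal Python sketch comparing the exact figure (`statistics.NormalDist`) with what a simulation would actually produce (`random.gauss`):

```python
import random
from statistics import NormalDist

MEAN, SD = 100, 50  # figures from the question above

# Exact share of a normal(100, 50) that falls below zero (2 SD under the mean).
p_negative = NormalDist(MEAN, SD).cdf(0)

# The same thing by simulation, as a Monte Carlo tool would see it.
rng = random.Random(11)
draws = [rng.gauss(MEAN, SD) for _ in range(100_000)]
sim_negative = sum(d < 0 for d in draws) / len(draws)

print(f"exact P(value < 0):     {p_negative:.4f}")
print(f"simulated P(value < 0): {sim_negative:.4f}")
```

If negative values are impossible for the quantity in question, that share of nonsense scenarios is a sign the normal may be the wrong shape; a triangular with a minimum of 0, as discussed earlier in the thread, is one alternative.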

#### Moose

##### New Member
You would expect 68% to land between 50-150 and 99% 0-200, and is it realistic for a value to go negative for this variable, because beyond 2SD they would. So you have to keep this stuff in mind as well.
So can you have a negative SD? Or is it just that the SD will push some values into the negative?

#### katxt

##### Well-Known Member
As hlsmith explained, SD is a measure of how spread out your input could be. An important factor in building a house is how many hours are likely to be lost due to rain over the period of the proposed build. You go back over the last 5 years and look at how much time would have been lost each year. You put them through your calculator and get a mean of 20 hours with an SD of 5. This means that there is about a 68% chance that the number of hours lost in any given year will be within 20 ± 5, i.e. between 15 and 25. There is about a 95% chance that the downtime will be within 20 ± 2×5, i.e. between 10 and 30. With this data you can model the input as normal with mean 20 and SD 5. In Excel you put =NORMINV(RAND(),20,5), and plausible inputs around 20 will be given, occasionally down to about 10 or up to 30.
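For anyone not working in Excel, here is a rough Python analogue of =NORMINV(RAND(),20,5): draw a uniform number and push it through the inverse normal CDF (inverse-transform sampling). `norminv_rand` is a hypothetical helper name, not a library function; it only uses the standard `random` and `statistics` modules.

```python
import random
from statistics import NormalDist

def norminv_rand(mean: float, sd: float, rng: random.Random) -> float:
    """Hypothetical analogue of Excel's =NORMINV(RAND(), mean, sd)."""
    p = rng.random()      # uniform on [0, 1), like RAND()
    while p == 0.0:       # inv_cdf needs 0 < p < 1, so redraw the (rare) exact zero
        p = rng.random()
    return NormalDist(mean, sd).inv_cdf(p)

rng = random.Random(5)
hours_lost = [norminv_rand(20, 5, rng) for _ in range(20_000)]

share_15_25 = sum(15 <= h <= 25 for h in hours_lost) / len(hours_lost)
share_10_30 = sum(10 <= h <= 30 for h in hours_lost) / len(hours_lost)
print(f"between 15 and 25: {share_15_25:.1%}")  # close to 68%
print(f"between 10 and 30: {share_10_30:.1%}")  # close to 95%
```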

#### Moose

##### New Member
Hey guys, I am back!
I have been having a play and I have hit a hurdle that I don't know how to address.
That tool you provided, @katxt, is great, and I have started to build a formula, but I'm not sure if I'm overthinking things or what...

For simplicity's sake, this is what I am trying to do: run a triangular simulation based on 2 variables.
So for example:
Likelihood (range is 1-100): Low = 10, Likely = 30, High = 50
Consequence (range is 1-100): Low = 10, Likely = 30, High = 50

Now if I run the program, it will give me a mean of ~30, obviously, and the formula is (L + C) / 2 = 30 (on average).

But for my purposes, (10 + 50) / 2 = 30 is not the same as (50 + 10) / 2 = 30...

Is there a way I can figure out the numbers used for L and C in each run? I would like to make a bar chart with Likelihood and Consequence as the axes.

Am I making sense or am I twisting myself up in knots?

P.S. The context for this is: a risk can be managed very differently depending on those 2 "variables". So if I were to put this in financial terms...
A Medium Risk (e.g. a value of 50 out of 100) where a business loses $365 per year is managed differently from a Medium Risk where the business loses $1 per day. The outcome is the same, but I would like to show that nuance.
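One way to get at the L and C behind each result is simply to keep them: store every run's inputs alongside the combined score instead of only the score. A minimal Python sketch, assuming the triangular 10/30/50 inputs and the (L + C) / 2 formula from the post; the 10-point cells are an arbitrary choice for charting.

```python
import random
import statistics
from collections import Counter

rng = random.Random(2024)

runs = []
for _ in range(10_000):
    likelihood = rng.triangular(10, 50, 30)   # Low 10, High 50, Likely 30
    consequence = rng.triangular(10, 50, 30)  # same ranges as in the post
    score = (likelihood + consequence) / 2    # the combined rating from the post
    runs.append((likelihood, consequence, score))  # keep the inputs, not just the score

# The combined score averages about 30 ...
print(f"mean score: {statistics.mean(r[2] for r in runs):.1f}")

# ... but keeping each (L, C) pair lets you count runs per 10-point cell,
# which is what a chart with Likelihood and Consequence axes needs.
cells = Counter((int(lik // 10) * 10, int(con // 10) * 10) for lik, con, _ in runs)
for (l_bin, c_bin), n in sorted(cells.items())[:5]:
    print(f"L {l_bin}-{l_bin + 10}, C {c_bin}-{c_bin + 10}: {n} runs")
```

The cell counts (or the raw pairs) could then go into a scatter plot or heat map with Likelihood on one axis and Consequence on the other, which shows the nuance a single averaged score hides.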

Last edited:

#### katxt

##### Well-Known Member
I'm not too sure what you are trying to do.
It helps to do a fixed point estimate first. For example, if Likelihood is 40 and Consequence is 45, what calculation would you do with these numbers? What would the answer mean for 40 and 45, as opposed to, say, 20 and 45, or 40 and 25?

#### Moose

##### New Member
I'm not too sure what you are trying to do.
It helps to do a fixed point estimate first. For example, if Likelihood is 40 and Consequence is 45, what calculation would you do with these numbers? What would the answer mean for 40 and 45? as opposed to, say, 20 and 45, or 40 and 25?
So what I "think" I would like to do is represent on a graph...

Y axis - Likelihood (Time)
X axis - Consequences (Outcome)

So the risk rating data points on the X axis would be assigned something like this:
First Aid - 0-10
FA - 11-20
FA - 21-30
Medical Treatment - 31-40
MT - 41-50
MT - 51-60
Disability - 61 - 70
Dis - 71 - 80
Fatality - 81 - 90
Fat - 91 - 100

So once the parameters are entered into the formula, the graph would display its normal bell curve, but the reader would be able to see that "Medical Treatment is the most likely outcome at 62%, and it is most likely to occur within 2 years", and then read the rest of the values going left and right from the apex of the bell curve. Is that making more sense?
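A sketch of how banded percentages like "Medical Treatment at X%" could come out of a simulation. It assumes the ten bands above are grouped into First Aid (up to 30), Medical Treatment (31 to 60), Disability (61 to 80) and Fatality (81 to 100), with a fractional score counted in the band whose upper bound it does not exceed. The triangular consequence input (low 20, mode 45, high 85) is purely hypothetical.

```python
import random
from collections import Counter

def outcome(score: float) -> str:
    """Map a 0-100 consequence score to a band (assumed upper bounds 30/60/80/100)."""
    if score <= 30:
        return "First Aid"
    if score <= 60:
        return "Medical Treatment"
    if score <= 80:
        return "Disability"
    return "Fatality"

rng = random.Random(99)
# Hypothetical consequence input: triangular with low 20, high 85, mode 45.
draws = [rng.triangular(20, 85, 45) for _ in range(10_000)]
tally = Counter(outcome(d) for d in draws)

for name, n in tally.most_common():
    print(f"{name:<18} {n / len(draws):.1%}")
```

The same tally can be done for the Likelihood (time) bands, giving the "most likely outcome" and "most likely timeframe" figures side by side.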