Hear me out.

The title is obvious clickbait, but there is a point that I want to make. When I am hanging from a device, I am not as concerned about its breaking strength as I am about its breaking probability. I want to know how likely it is that it will support my weight. I want that probability to be 100%, but there is no chance whatsoever that it is. Unfortunately, I can never know what the breaking probability is until I get off rope, when I know the answer was either 100% or 0% - but that is the a posteriori probability, not the a priori probability that I wanted to know before I got on rope. Nobody has figured out how to measure the latter, so we resort to strength testing.

I see a lot of single breaking strength tests. What does that tell me? Only that something broke at a particular load. That load is (usually) far more than the load I am going to apply. The number is completely useless. It tells me nothing about the breaking probability of a different sample when supporting my weight. We need more tests to get a handle on that.

How many breaking tests do we need to get an idea of the variation? I heard one very prominent Ph.D. caver engineer instantly answer that question with "21." That is not a bad answer, but there are a lot of assumptions behind it that may or may not apply. Despite this, it is a good start. Without question, "2" or "3" are not enough to tell me much. There are sophisticated methods that allow us to estimate how many samples we need to break to get meaningful results. Anyone who is serious about obtaining useful results knows them and uses them. If we do not, our results may be meaningless and/or misleading.
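One way to see why "2" or "3" are not enough is a quick Monte Carlo sketch. The code below (standard library only) draws simulated "breaking loads" from a Normal distribution - itself an idealization, as discussed later - and looks at how wildly the sample standard deviation swings when we break only 3 samples versus 21. The mean of 25 kN and spread of 1.5 kN are made-up numbers purely for illustration.

```python
import random
import statistics

random.seed(42)  # reproducible runs

TRUE_MEAN, TRUE_SD = 25.0, 1.5  # hypothetical breaking-load parameters, kN

def spread_of_sample_sd(n, trials=2000):
    """Simulate many test campaigns of n breaks each and return how much
    the sample standard deviation varies from campaign to campaign."""
    estimates = [
        statistics.stdev([random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)

for n in (3, 21):
    print(f"n = {n:2d}: spread of the sample-SD estimate "
          f"= {spread_of_sample_sd(n):.2f} kN")
```

With 3 breaks the estimate of the spread is itself all over the map; with 21 it settles down considerably, which is roughly why that instant answer is a reasonable starting point.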

Suppose we break enough random samples and look at the results. We can average them to get an estimate of the mean breaking strength (the sample mean), but that is not what we want. We can also calculate the "sample standard deviation," which tells us how much spread there is in our measurements. Both numbers are themselves random. More sophisticated analysis lets us estimate how confident we are that the actual mean is greater than a particular number and that the actual standard deviation is less than a particular number. There are nice formulas based on the Gaussian (a.k.a. Normal) probability distribution – the nice "bell-shaped curve." Everyone assumes that this is a valid distribution. For strength data, it is absolutely, positively NOT correct. It might be useful, but only if we are careful.
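Here is a minimal sketch of that analysis, using only the Python standard library. The ten "breaking loads" are invented numbers purely for illustration, and the calculation assumes - questionably, as just noted - that the loads are Normal. The t critical value comes from a standard table (one-sided 95%, 9 degrees of freedom).

```python
import math
import statistics

# Hypothetical breaking loads from ten destructive tests, in kN
loads_kN = [22.1, 24.5, 23.3, 21.8, 25.0, 23.9, 22.7, 24.2, 23.5, 22.9]

n = len(loads_kN)
sample_mean = statistics.mean(loads_kN)   # estimate of the true mean
sample_sd = statistics.stdev(loads_kN)    # estimate of the spread

# One-sided 95% lower confidence bound on the true mean,
# assuming Normality: mean - t * s / sqrt(n)
t_crit = 1.833                            # t(0.95, df = 9), from tables
lower_bound = sample_mean - t_crit * sample_sd / math.sqrt(n)

print(f"sample mean = {sample_mean:.2f} kN")
print(f"sample SD   = {sample_sd:.2f} kN")
print(f"95% lower confidence bound on the mean = {lower_bound:.2f} kN")
```

Note what the bound does and does not say: even under the (wrong) Normal assumption, it only brackets the mean strength, not the probability that a particular device fails under a particular load.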

Have you heard statements like "so many percent of the data falls within so many standard deviations of the mean"? Most of the time, such statements assume that the data came from a Normal distribution. They are incorrect if the underlying distribution is not Normal, and it never is. People calculate with the Normal distribution because the mathematical formulae are simple. Real-world distributions frequently have heavier tails than the Normal distribution. If we use the Normal distribution, we will probably under-estimate the probability of failure. This is especially true if we extrapolate and estimate the probability of failure at a load that is less than the lowest breaking load that we measured.
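The tail warning is easy to illustrate numerically. The sketch below (standard library only) compares the probability of a load falling 4 standard deviations below the mean under a Normal distribution and under a logistic distribution with the same mean and standard deviation. The logistic is chosen only because its CDF has a simple closed form; its tails are only modestly heavier than the Normal's, yet the Normal model already understates the tail probability many times over.

```python
import math
from statistics import NormalDist

k = 4.0  # how many standard deviations below the mean

# Normal: P(Z < -k) for a standard Normal variable
p_normal = NormalDist().cdf(-k)

# Logistic with mean 0 and SD 1 has scale s = sqrt(3)/pi;
# its CDF is 1 / (1 + exp(-x / s)), so P(X < -k) is:
s = math.sqrt(3) / math.pi
p_logistic = 1.0 / (1.0 + math.exp(k / s))

print(f"Normal:   P(X < mean - 4*SD) = {p_normal:.2e}")
print(f"Logistic: P(X < mean - 4*SD) = {p_logistic:.2e}")
print(f"the Normal model understates this tail by {p_logistic / p_normal:.0f}x")
```

If a distribution this close to Normal is already off by an order of magnitude in the tail, extrapolating a Normal fit below the lowest measured breaking load is asking for trouble.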

Any mathematical model of a physical situation is at best an approximation. Statisticians credit George Box with the famous statement "All models are wrong, but some are useful." I particularly like his warning[1]: "All models are approximations. Assumptions, whether implied or clearly stated, are never exactly true. All models are wrong, but some models are useful. So the question you need to ask is not 'Is the model true?' (it never is) but 'Is the model good enough for this particular application?'" Whether you do the math or not, your life may depend on the answer.

[1] Box, G. E. P.; Luceño, A.; del Carmen Paniagua-Quiñones, M. (2009). Statistical Control By Monitoring and Adjustment. John Wiley & Sons. p. 61.