statistical tolerance analysis basics: Root Sum Square (RSS)

In my last post on worst-case tolerance analysis I concluded with the fact that the worst-case method, although extremely safe, is also extremely expensive.

Allow me to elaborate, then offer a resolution in the form of statistical tolerance analysis.

cost

A worst-case tolerance analysis is great to make sure that your parts will always fit, but if you're producing millions of parts, ensuring each and every one works is expensive and, under most circumstances, impractical.

Consider these two scenarios.

1. You make a million parts, and it costs you $1.00 per part to make sure that every single one works.
2. You make a million parts, but decide to go with cheaper, less accurate parts. Now your cost is $0.99 per part, but 1,000 parts won't fit.

In the first scenario, your cost is:

$1.00/part * 1,000,000 parts = $1,000,000

In the second scenario, your cost is:

$0.99/part * 1,000,000 parts = $990,000,

but you have to throw away the 1,000 rejects and make replacements, which cost $0.99/part. So your total cost is:

$990,000 + 1,000 * $0.99 = $990,990

Which means you save $9,010. Those actual numbers are make-believe, but the lesson holds true: by producing less precise (read: crappier) parts and throwing some of them away, you save money.

Sold yet? Good. Now let's take a look at the theory.

statistical tolerance analysis: theory

The first thing you'll want to think of is the bell curve. You may recall the bell curve being used to explain that some of your classmates were smart, some were dumb, but most were about average.

The same principle holds true in tolerance analysis. The bell curve (only now it's called the "normal distribution") states that when you take a lot of measurements, be it of test scores or block thicknesses, some measurements will be low, some high, and most in the middle.

Of course, "just about" and "most" don't help you get things done. Math does, and that's where the normal distribution (and Excel... attachment below) come in.

sidebar: Initially I planned on diving deep into the math of RSS, but Hileman does such a good job on the details, I'll stick with the broad strokes here. I highly suggest printing out his post and sitting down in a quiet room; it's the only way to digest the heavy stuff.

the normal distribution and "defects per million"

Using the normal distribution, you can determine how many defects (defined as parts that come in outside of allowable tolerances) will occur. The standard unit of measure is "defects per million", so we'll stick with that.

There are two numbers you need to create a normal distribution, and they are represented by μ (pronounced "mew") and σ (pronounced "sigma").

• μ is the mean, a measure of the "center" of a distribution.
• σ is the standard deviation, a measure of how spread out a distribution is.
For example, the number sets {0,10} and {5,5} both have an average of 5, but the {0,10} set is spread out and thus has a higher standard deviation.

Using one of our blocks (remember those?) as an example...

Let's say you measure five blocks like the one above (in practice it's best to measure 30 at the very least, but we'll keep it at 5 for the example) and get the following results:

• x1 = 1.001"
• x2 = 0.995"
• x3 = 1.000"
• x4 = 1.001"
• x5 = 1.003"

The average (μ) is 1.000 and the standard deviation (σ) is .003. Plug those into a normal distribution, and your tolerances break down like this (see the 'after production' tab in the attached Excel file for formulas):

If you require the blocks to be 1.000±.003 (±1σ), the blocks will pass inspection 68.27% of the time... 317,311 defects per million.

If you require the blocks to be 1.000±.006 (±2σ), the blocks will pass inspection 95.45% of the time... 45,500 defects per million.

If you require the blocks to be 1.000±.009 (±3σ), the blocks will pass inspection 99.73% of the time... 2,700 defects per million.

And so on.

Using the data above you can say with confidence (assuming you measured enough blocks!) that if you were to use a million blocks, all but 2,700 of them would come in between 0.991 and 1.009.

root sum square and the standard deviation

If you've followed the logic closely you may notice a catch-22. Ideally, you want to do a tolerance analysis before you go to production, but how can you determine μ or σ without having samples to test... which you will only get after production?

You make (and state... repeatedly) assumptions.

The μ part is easy. You just assume that the mean will be equal to the nominal (in our case, 1.000). This is usually a solid assumption and only begins to get dicey when you talk about the nominal shifting (some like to plan for up to 1.5σ!) over the course of millions of cycles (perhaps due to tool wear), but that is another topic.
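Before moving on to σ, here is a minimal Python sketch of the measured-block arithmetic above. It is not the attached spreadsheet, just an illustration: it assumes the .003 figure is the sample standard deviation (n - 1 in the denominator), and `defects_per_million` is a name made up for this example.

```python
from statistics import NormalDist, mean, stdev

# The five block measurements from the example (inches).
measurements = [1.001, 0.995, 1.000, 1.001, 1.003]

mu = mean(measurements)       # sample mean
sigma = stdev(measurements)   # sample standard deviation (n - 1 denominator)


def defects_per_million(k: float) -> float:
    """Expected defects per million for a symmetric tolerance of +/- k sigma,
    assuming the parts follow a normal distribution centered on the nominal."""
    standard = NormalDist()                      # mean 0, sigma 1
    outside = 2 * standard.cdf(-k)               # both tails, by symmetry
    return outside * 1_000_000


print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
for k in (1, 2, 3):
    print(f"+/-{k * sigma:.3f} ({k} sigma): "
          f"{defects_per_million(k):,.0f} defects per million")
```

With only five samples the estimate of σ is rough, which is why measuring 30 or more blocks is the better practice.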
For σ, a conservative estimate is that your tolerance can be held to a quality of ±3σ, meaning that a tolerance of ±.005 will yield you a σ of 0.005/3 = 0.00167.

Let's play this out...

If you are stacking five blocks @ 1.000±.005, you need to add up the five nominals to get μ (5 × 1.000 = 5.000), and take the square root of the sum of the squares of the standard deviations of the tolerances (wordy, I know) to get σ, which looks like this...

SQRT([.005/3]^2+[.005/3]^2+[.005/3]^2+[.005/3]^2+[.005/3]^2)

...which comes to roughly 0.0037. (You divide by 3 because you are assuming that your tolerances represent 3 standard deviations.)

That's as wordy as I'm going to get on the math (the post is already longer than I'd like); you can see it working for yourself in the 'before production' tab of the attached Excel file.

Just remember to treat those numbers with the respect that they deserve, and that industry-accepted assumptions are no replacement for a heart-to-heart (and email trail) with your manufacturer. Trying to push a manufacturer to hold tolerances they aren't comfortable with is a draining and often futile exercise. The tolerances dictate the design, not the other way around.

update: My series of posts on worst-case, root sum square, and monte carlo tolerance analysis started off as just a brief introduction to the basics. Since then I have heard from a number of you asking for a clear, concise (everything else out there is so heavy), usable guide to both the math behind tolerance analysis and real-world examples of when to use it. I'm currently working on it, but would love to hear what YOU would like out of it. Let me know in the comments or contact me through the site.

• http://www.hilemansblog.com Vince Hileman

Hello Chris,
You and I think alike. I would like to add your link to my blog. I also like the layout of your website. It seems I have a ways to go on mine.
Vince Hileman
http://www.hilemansblog.com

• http://www.pdnotebook.com/ loughnane

Vince, I'm flattered to hear you say we think alike... you've got some good, heavy stuff on your site.

The website was actually surprisingly easy to get going. I used Thesis (http://www.shareasale.com/r.cfm?b=202503&u=436004&m=24570&urllink=&afftrack... affiliate link), which is pretty amazing. I wouldn't sweat the look too much though.. your content rocks.

• Pingback: hard anodizing | Product Design Notebook

• Bobmalcomb

Great job Chris ! I used to know how to do this but I forgot and needed it again !!!!

• http://www.pdnotebook.com/ loughnane

Glad you enjoyed it (maybe 'enjoyed' is too strong). Sure beats designing everything for a worst-case scenario.

• Hardison Gary1

Thanks Chris! I was searching for this information and ran across your post; it was exactly what I was trying to find!

• http://www.pdnotebook.com/ loughnane

It's always refreshing to be reminded I'm not the only nerd who cares about this stuff. If you've got any questions feel free to toss them up here.

• Sadaslm

Thanks a lot Chris..!! I was googling about RSS and luckily ended up here..!! The content is lucid, precise and so clear for a beginner like me on this topic. Thanks again.. Keep going!!

• http://www.pdnotebook.com/ loughnane

If I could strive for only three adjectives, I could do much worse than lucid, precise and clear. Thanks!

• Anup

I don't quite understand the formula, (NORMDIST(x,mu,sig,true)-0.5)*2. Why the 0.5 and 2?
• http://www.pdnotebook.com/ loughnane

The 'true' parameter of NORMDIST results in the function returning the probability that a value will be less than or equal to 'x' (if you were to use 'false', it would instead return the height of the probability density curve at 'x').

So once you have evaluated NORMDIST(x,mu,sig,true), you have the area under the curve of the normal distribution from negative infinity to x. Since the total area under the curve is 1 (corresponding to a total probability of 100%), subtracting 0.5 effectively removes the area on the left side of the distribution. Then multiplying by 2 doubles that probability, which graphically would look like mirroring the area you had after subtracting 0.5 across mu (the centerline of the distribution).

• Anup Pendeyal

Thank you Chris, much appreciated! I was thinking along the same lines but had never come across such an idea, so I had to be sure!

• http://www.pdnotebook.com/ loughnane

Glad to help. Excel is an imperfect medium, but can usually be hacked for good results.

• Porgynbess

A general comment about costing . . . you stated, "Those actual numbers are make-believe, but the lesson holds true: by producing less precise (read: crappier) parts and throwing some of them away, you save money."

Here are some indirect costs that need to be accounted for when producing a high rate of rejected parts:
1) resource to identify & dispose of rejected parts. Of course, this could be an automated process and/or use of electronic devices, but nevertheless it's another step in the process.
2) the cost of down equipment if the rejected part is installed in an assembly.
3) lost time b/c of the above.
4) dissatisfied customer.

-milesursogood

• http://www.pdnotebook.com/ loughnane

Spoken like someone who lives it. Reject rate is definitely just a small part of the financial picture.
• http://www.kevindesmet.com Kevin De Smet

I don't understand the section where you discuss the blocks, the tolerance is relaxed but the pass rate goes up, or am I reading this wrong?

• http://www.pdnotebook.com/ loughnane

The idea is that if you have a dimension on a drawing, the manufacturer will try to hit it. The variation in the parts that the manufacturer produces is defined by the quality of their process. So if you have a larger tolerance range on the drawing, it means that more of the parts that get made will fall within your design tolerance.

• Aaron Chen

Thanks a lot, Chris. It is very helpful for me to understand the statistic tolerance methods...And it is pretty clear

• Aaron Chen

Only question is: what you mean by "$" in "NORMINV(RAND(),$B3,$C3/3)"

• Aaron Chen

I know it now. Thank you, anyway.

• Aggiemba

What software is available for tolerance analysis?  We use Wildfire 5.

• Divergence

Great text, thanks a lot!
Just one extra hint:
In "root sum square and the standard deviation" -part, you can skip dividing by 3 inside the square root and use the actual tolerances. Then you can divide the result by three and you get the standard deviation.
If you write the formula on paper and try to remember high school algebra you will see it's true.

• http://www.pdnotebook.com/ loughnane

You're absolutely right. I copied that equation from some code I wrote. In the code I have the option to vary the quality of each dimension (so instead of 3 sigma, it might be 4 sigma, 5 sigma, or 6 sigma). This helps debugging in production, once you have measured quality levels for critical-to-function dimensions.

The method you suggest would be more concise, and for early stage tolerance analysis is probably preferred.
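For what it's worth, here is a minimal Python sketch of both forms. It is not the code mentioned above, just an illustration: `rss_sigma` is a made-up name, and the mixed sigma levels in the last call are hypothetical values chosen only to show the per-dimension option.

```python
import math


def rss_sigma(tolerances, sigma_levels):
    """Standard deviation of a stack by root sum square.

    Each tolerance is assumed to represent `sigma_level` standard
    deviations of its own dimension (3 for a +/-3 sigma process).
    """
    return math.sqrt(sum((t / k) ** 2 for t, k in zip(tolerances, sigma_levels)))


# Five blocks at 1.000 +/- 0.005, as in the post.
tolerances = [0.005] * 5

# Long form: divide each tolerance by 3 inside the square root.
sigma_per_dimension = rss_sigma(tolerances, [3] * 5)

# Shortcut: RSS of the raw tolerances, then divide the result by 3.
sigma_shortcut = math.sqrt(sum(t ** 2 for t in tolerances)) / 3

# Hypothetical mixed quality levels; here the shortcut no longer applies.
sigma_mixed = rss_sigma(tolerances, [3, 3, 4, 6, 6])

print(f"per-dimension: {sigma_per_dimension:.6f}")
print(f"shortcut:      {sigma_shortcut:.6f}")
print(f"mixed levels:  {sigma_mixed:.6f}")
```

The first two numbers agree because a common divisor can be factored out of the square root; once the sigma levels differ per dimension, the division has to stay inside.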

• XYZ

Hi,
It is pretty good article and easy way to understand it.  I would appreciate you to explain me if it is possible to extend the bell curve to 6 sigma? Why do you show this bell curve up to 3 sigma only?
The tolerances at my work are even higher than 3 sigma but nobody knew how to explain me why.

Excellent explanation mate.

Cheers

• http://www.pdnotebook.com/ loughnane

So it wouldn't technically be "extending" the bell curve to six sigma, but I think I get what you're saying.

So the bell curve is created by two parameters: the mean and the
standard deviation (a.k.a. sigma). The mean is what it is, so the only
thing that we can change is sigma.

In the RSS method, once you've calculated the square Root of the Sum of
the Squares you divide it by n to get your sigma. If n=3, sigma will be
larger (have a larger standard deviation... a wider bell curve) than if
n=6.

So let's say our mean is zero and our design indicates that the maximum
acceptable upper limit is 0.008 (and let's assume our minimum acceptable
lower limit is -0.008, just to make things symmetrical). If our RSS
value is 0.012, a level of quality equating to 3sigma would result in a
sigma of 0.004, whereas a level of quality equating to 6sigma would
result in a sigma of 0.002.

Back to our limit of 0.008: under 3sigma circumstances, that upper limit
is 2 standard deviations from the mean. Under 6sigma circumstances,
that upper limit is 4 standard deviations from the mean.

If you look at the image at the top of this post you'll see that +/- 2
standard deviations from the mean encapsulates 95.4% of all cases (i.e. a
4.6% failure rate) whereas  +/- 4 standard deviations from the mean
(the image above doesn't go far enough... but trust me) encapsulates
99.9936% of all cases (i.e. a 0.0064% failure rate) .

I typically assume 3sigma when I'm designing something, even though 4sigma is more realistic, just to be conservative.
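If you want to check those numbers yourself, here is a minimal Python sketch of the example above. It assumes symmetric limits of ±0.008 about a mean of zero and a purely normal distribution; the variable names are just for illustration.

```python
from statistics import NormalDist

rss = 0.012      # root sum square of the tolerances, from the example
limit = 0.008    # symmetric design limit about a mean of zero

results = {}
for n in (3, 6):
    sigma = rss / n                          # tolerances assumed to represent n sigma
    z = limit / sigma                        # design limit expressed in standard deviations
    inside = 2 * NormalDist().cdf(z) - 1     # fraction of cases within +/- z sigma
    results[n] = (sigma, z, inside)
    print(f"{n}sigma quality: sigma = {sigma:.3f}, limit = {z:.0f} sigma, "
          f"{inside:.4%} of cases within limits")
```

The same RSS value gives a very different failure rate depending on which quality level you assume, which is why the assumption needs to be stated (repeatedly).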

• XYZ

You couldn't explain it better mate.  My respects.

Cheers

• Chris R

Chris,
I love your post.  As you know I work in dimensional engineering.  Tolerance analysis including simple stacks is something I have employed for years.  What I like is your straightforward write up.  You have made the RSS tolerance stack topic clearer than most dimensional engineers I know.  Kudos to you.

• Amit Pandita

Chris,

Thanks for the easy explanation!!!
I have a  question. If we have GD&T (like position tolerance) in the stack, how can we use their tolerances in RSS stacks?

• http://www.pdnotebook.com/ loughnane

Many thanks Chris.

Also, you have not been forgotten; if you would like to come in to Farm sometime and discuss your capabilities that is certainly something we're open to.

• Sam

Do you have a page on Cp, Cpk & Cpm?
I have a specific question regarding them. Should Cpm always be smaller than Cpk unless the target is equal to the mean of a dimension?

• Esteban

• Steve Pero

Chris, nice one! I just googled this for a co-worker and your post was the best, so I shared it with him, then saw it was yours!
Hope all is well, we hope to be moving back to NH in the next year or so...

• http://www.pdnotebook.com/ loughnane

Things are going well... actually just finished an all-day TA review, so your comment is timely.

How come you're heading back to NH so soon? Shoot me an email... loughnane.c@gmail.com

• Grey Starr

Most excellent! Thank you. Of interest, I had to change the math in your excel sample for RSS in order to use it in my own work... =SQRT(SUMSQ(C2:C6)) ... The SUMSQ function looks to be new in excel 2010.

• http://www.pdnotebook.com/ loughnane

Ha. I remember when I discovered SUMSQ I thought "and here I've been doing it manually all these years!". Good to know it wasn't me.

• Sharath Raju

I guess I see the trees but not the forest!

Meaning: I see that you take the root sum square of the (estimated) standard deviation and the numerical value is equal to 0.003734. But how does this help to resolve the catch 22 situation described earlier?

Thanks!
Sharath Raju

• Sai

Hi, while I was going through the attached excel sheet, i have noticed you have used following formula in "before worksheet"

=(NORMDIST(E9+B7,B7,B8,TRUE)-0.5)*2

I'm clear on first part, (i.e (NORMDIST(E9+B7,B7,B8,TRUE) ) where you r finding Z score , ...I'm not clear on other part (i.e -0.5)*2)..could u please elaborate it more....

Also, how did we arrive @ value 3.4 defects/ million when we have 6 std deviations in between Mean & USL in Normal distribution table...

• http://www.pdnotebook.com/ loughnane

Regarding the "-0.5)*2)", it's a function of being limited by Excel's functionality. You can read the docs for the NORMDIST function for a better explanation, but in short the TRUE flag being set tells the function to calculate the cumulative probability from negative infinity to "X" (which, the way you have written it, would be E9+B7).

In order to get a symmetrical distribution (i.e. +/- 1sigma, 2sigma, etc.) you cut off everything from negative infinity to the mean (that's the minus 0.5) and then take what's left (i.e. from the mean to X) and mirror it about the mean (the *2) to get your distribution.

Regarding the 3.4, I don't follow you. It's been a while since I wrote this so maybe I'm missing something, I would expect a +/- 6sigma normal distribution to yield 0.001973 Defects Per Million (note this is just a pure distribution and does not account for a 1.5sigma shift... common practice in many domains).
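To make both points concrete, here is a minimal Python sketch (an illustration, not the spreadsheet itself; the function names are made up). `two_sided_fraction` mirrors the Excel formula, and `dpm` computes defects per million for limits at +/- k sigma with an optional mean shift. As far as I know, the commonly quoted 3.4 figure is what you get when the 1.5sigma shift is included, which the second call reproduces.

```python
from statistics import NormalDist


def two_sided_fraction(x: float, mu: float, sigma: float) -> float:
    """Python counterpart of (NORMDIST(x, mu, sigma, TRUE) - 0.5) * 2:
    the fraction of the distribution within +/- (x - mu) of the mean."""
    return (NormalDist(mu, sigma).cdf(x) - 0.5) * 2


def dpm(k: float, shift: float = 0.0) -> float:
    """Defects per million for limits at +/- k sigma about the nominal,
    with the process mean drifted by `shift` sigma toward the upper limit."""
    standard = NormalDist()
    upper_tail = standard.cdf(shift - k)     # area beyond the upper limit
    lower_tail = standard.cdf(-k - shift)    # area beyond the lower limit
    return (upper_tail + lower_tail) * 1_000_000


print(f"within +/-3 sigma:            {two_sided_fraction(1.009, 1.000, 0.003):.4%}")
print(f"+/-6 sigma, no shift:         {dpm(6):.6f} defects per million")
print(f"+/-6 sigma, 1.5 sigma shift:  {dpm(6, 1.5):.1f} defects per million")
```

With the shift, nearly all of the defects come from the tail on the side the mean drifted toward; the far tail is negligible.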

• Kristy

Chris- excellent post, and I have a quick question for you. Why did you choose 3 sigma in the example? Is that just essentially an industry standard? And what does sigma mean in a statistical sense exactly? If I interpreted your post correctly, 3 sigma basically means that 99.7% of parts will come within the specified tolerance for that part. So if you're not as confident that your parts won't come in within the specified tolerance, then you would decrease the sigma? Just trying to figure this stuff out! Thanks!

• Andy Southern

After you remake the replacement 1,000 parts, you'll have an extra one to throw away, so you need to remake 1,001.......

• http://www.pdnotebook.com/ loughnane

Sorry for the delay. There was a hiccup in my alerts and I'm just seeing some of these now.

3sigma is an industry standard. It correlates to a process capability (Cp) of 1.0. Anything worse than that is an incapable process.

What it means statistically is that the stated tolerance range (e.g. +/- 0.005) represents +/-3 standard deviations (i.e. sigma = 0.00167). This is the number that you are telling your supplier to hit.

If you are not confident that the supplier can hit a standard deviation of 0.00167 then you would want to increase your tolerance range. For example, if you know, from previous experience, that your supplier can only provide a standard deviation as low as 0.003, you would want to change your drawing tolerance to +/-0.009.

Increasing the tolerance will make more of the parts "accepted", but it means that you now have a larger tolerance for your design to absorb.

What it all boils down to is that your design should be tolerant enough to accommodate the variation from your suppliers, and your drawing should reflect that.