Statistical analysis forms the backbone of the Analyze phase in Lean Six Sigma methodology. Whether you are embarking on your first process improvement project or looking to enhance your understanding of data-driven decision-making, mastering the terminology used during statistical analysis is essential. This comprehensive guide will walk you through the most important terms, complete with practical examples and sample datasets that bring these concepts to life.
Understanding the Importance of Statistical Terminology in the Analyze Phase
The Analyze phase represents a critical juncture in any Lean Six Sigma project. This is where teams transition from collecting data to extracting meaningful insights that drive process improvements. Without a solid understanding of statistical terminology, practitioners may struggle to communicate findings effectively, misinterpret results, or select inappropriate analytical tools.
Consider a manufacturing scenario where a team is investigating defect rates in a production line. Understanding terms like “variance,” “standard deviation,” and “capability indices” can mean the difference between implementing effective solutions and wasting resources on incorrect assumptions. Let us explore these terms systematically.
Fundamental Statistical Measures
Mean (Average)
The mean represents the arithmetic average of a dataset, calculated by summing all values and dividing by the number of observations. This measure provides a central reference point for understanding data distribution.
Example: A customer service team measures call handling times over ten calls: 5, 7, 6, 8, 5, 9, 7, 6, 8, and 9 minutes. The mean is calculated as (5+7+6+8+5+9+7+6+8+9) / 10 = 7 minutes. This tells the team that, on average, calls take seven minutes to handle.
Median
The median identifies the middle value in a dataset when arranged in ascending or descending order. Unlike the mean, the median remains resistant to extreme values, making it particularly useful when dealing with skewed distributions.
Example: Using the same call handling dataset (5, 5, 6, 6, 7, 7, 8, 8, 9, 9), the median falls between the fifth and sixth values. Since both are 7, the median equals 7 minutes. However, if one outlier call lasted 45 minutes, the dataset becomes (5, 5, 6, 6, 7, 7, 8, 8, 9, 45), where the mean jumps to 10.6 minutes, but the median remains at 7 minutes, providing a more representative measure of typical performance.
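If you want to verify these figures yourself, Python’s built-in statistics module makes the comparison easy. A quick sketch using the call-handling data above:

```python
import statistics

# Call handling times (minutes) from the example above
calls = [5, 5, 6, 6, 7, 7, 8, 8, 9, 9]
print(statistics.mean(calls))    # 7
print(statistics.median(calls))  # 7.0

# Add one 45-minute outlier call: the mean shifts, the median holds
with_outlier = [5, 5, 6, 6, 7, 7, 8, 8, 9, 45]
print(statistics.mean(with_outlier))    # 10.6
print(statistics.median(with_outlier))  # 7.0
```

This is exactly why the median is preferred for skewed data such as call durations or repair times.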
Mode
The mode represents the most frequently occurring value in a dataset. Some datasets may have multiple modes (bimodal or multimodal) or no mode at all.
Example: In a quality inspection recording defect counts per batch (2, 3, 3, 4, 3, 5, 6, 3, 7), the mode is 3, appearing four times. This information might indicate a recurring issue at a specific defect threshold.
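The mode is equally easy to check in code; statistics.multimode is handy when a dataset has more than one most-frequent value (the second dataset below is hypothetical, added just to show a bimodal case):

```python
import statistics

# Defect counts per batch from the inspection example
defects = [2, 3, 3, 4, 3, 5, 6, 3, 7]
print(statistics.mode(defects))  # 3

# multimode returns every most-frequent value, useful for bimodal data
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]
```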
Measures of Variability and Dispersion
Range
The range represents the simplest measure of spread, calculated as the difference between the maximum and minimum values in a dataset. While easy to compute, the range only considers two data points and remains highly sensitive to outliers.
Example: A restaurant tracks daily customer counts over a week: 45, 52, 48, 50, 53, 125, 49. The range equals 125 minus 45, resulting in 80 customers. The large range immediately signals an unusual day (possibly a special event) that warrants investigation.
Variance
Variance measures how far individual data points deviate from the mean, calculated by averaging the squared differences from the mean. This measure provides crucial information about data consistency and process stability.
Example: Two production lines manufacture widgets with target weight of 100 grams. Line A produces items weighing 98, 99, 100, 101, 102 grams (variance = 2.5). Line B produces items weighing 90, 95, 100, 105, 110 grams (variance = 62.5). Despite both having a mean of 100 grams, Line A demonstrates much greater consistency, as reflected in its lower variance.
Standard Deviation
Standard deviation represents the square root of variance, expressing variability in the same units as the original data. This makes it more interpretable than variance and widely used in quality control applications.
Example: Building on the previous example, Line A has a standard deviation of 1.58 grams, while Line B has a standard deviation of 7.91 grams. Quality engineers can immediately recognize that Line B exhibits approximately five times more variability than Line A, signaling potential process control issues requiring investigation.
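Both figures are easy to reproduce with Python’s statistics module, which uses the sample (n minus 1) formulas that match the values quoted above:

```python
import statistics

line_a = [98, 99, 100, 101, 102]
line_b = [90, 95, 100, 105, 110]

# Sample variance (divides by n - 1), matching the figures above
print(statistics.variance(line_a))  # 2.5
print(statistics.variance(line_b))  # 62.5

# Standard deviation: same units as the data (grams)
print(round(statistics.stdev(line_a), 2))  # 1.58
print(round(statistics.stdev(line_b), 2))  # 7.91
```

Note that statistics.pvariance and statistics.pstdev give the population versions, which divide by n instead.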
Probability and Distribution Concepts
Normal Distribution
The normal distribution, also known as the Gaussian distribution or bell curve, describes data that clusters symmetrically around the mean. Many natural phenomena and manufacturing processes follow this pattern, making it fundamental to statistical analysis.
Example: Heights of adult males in a population typically follow a normal distribution. If the mean height is 175 cm with a standard deviation of 7 cm, approximately 68% of males will have heights between 168 and 182 cm (within one standard deviation), 95% between 161 and 189 cm (within two standard deviations), and 99.7% between 154 and 196 cm (within three standard deviations).
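The 68-95-99.7 rule can be verified numerically from the normal cumulative distribution function, which the standard library can evaluate through math.erf. A quick sketch using the height example:

```python
import math

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative probability of a normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = 175, 7  # mean height and standard deviation from the example
for k in (1, 2, 3):
    coverage = normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)
    print(f"within {k} standard deviation(s): {coverage:.1%}")
```

Running this prints coverages of roughly 68.3%, 95.4%, and 99.7%, confirming the rule of thumb.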
Standard Normal Distribution (Z-distribution)
The standard normal distribution represents a special case where the mean equals zero and the standard deviation equals one. Converting data to this standardized form enables comparisons across different measurement scales.
Example: A student scores 85 on a test with a mean of 75 and standard deviation of 10. The Z-score equals (85 minus 75) / 10 = 1.0, indicating performance one standard deviation above average. This standardization allows comparing performance across different subjects with different scoring systems.
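Standardization is a one-line calculation; the hypothetical reading score below illustrates how Z-scores make results from different scales directly comparable:

```python
def z_score(value: float, mean: float, sd: float) -> float:
    """Standardize a raw score: its distance from the mean in standard deviations."""
    return (value - mean) / sd

math_z = z_score(85, 75, 10)        # 1.0, as in the example above
reading_z = z_score(160, 150, 20)   # hypothetical second subject: 0.5
print(math_z, reading_z)
```

Even though 160 is numerically larger than 85, the math result is the relatively stronger performance.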
Confidence Interval
A confidence interval provides a range of values within which the true population parameter likely falls, expressed with a specified confidence level (commonly 95% or 99%).
Example: A manufacturing team samples 50 products and finds a mean defect rate of 3.2% with a standard deviation of 0.8%. Calculating a 95% confidence interval might yield a range of 2.97% to 3.43%. This means the team can be 95% confident that the true defect rate for all products falls within this range, informing decisions about whether the process meets quality standards.
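A sketch of the calculation, assuming the usual t-based interval (the critical value of roughly 2.01 for 49 degrees of freedom is an approximation, normally read from a t-table or statistical software):

```python
import math

n, mean, sd = 50, 3.2, 0.8  # sample figures from the example
t_crit = 2.01               # approximate two-sided 95% t value for 49 degrees of freedom
margin = t_crit * sd / math.sqrt(n)
lower, upper = mean - margin, mean + margin
print(f"95% CI: {lower:.2f}% to {upper:.2f}%")  # roughly 2.97% to 3.43%
```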
Hypothesis Testing Terminology
Null Hypothesis (H0)
The null hypothesis represents the default assumption that no relationship, difference, or effect exists between variables or groups being studied. Statistical tests aim to gather evidence either supporting or rejecting this hypothesis.
Example: A company implements new training for customer service representatives. The null hypothesis states: “The new training has no effect on average customer satisfaction scores.” The team collects data to test whether evidence supports rejecting this assumption.
Alternative Hypothesis (H1 or Ha)
The alternative hypothesis proposes what researchers actually expect or want to prove, representing the opposite of the null hypothesis.
Example: Continuing the training scenario, the alternative hypothesis states: “The new training increases average customer satisfaction scores.” If statistical analysis provides sufficient evidence, the team may reject the null hypothesis in favor of this alternative.
P-value
The p-value quantifies the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. Smaller p-values provide stronger evidence against the null hypothesis.
Example: After implementing the training program, statistical analysis yields a p-value of 0.03. Using a common significance level of 0.05, this result indicates only a 3% probability that the observed improvement occurred by chance alone. Therefore, the team has reasonable grounds to conclude that the training effectively improved customer satisfaction.
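One intuitive way to obtain a p-value without distributional assumptions is a permutation test. The sketch below uses hypothetical before-and-after satisfaction scores, repeatedly shuffling the group labels to see how often chance alone produces a mean difference as large as the observed one:

```python
import random
import statistics

# Hypothetical satisfaction scores before and after training
before = [7.1, 6.8, 7.4, 6.9, 7.2, 7.0, 6.7, 7.3]
after = [7.6, 7.9, 7.4, 8.1, 7.7, 7.8, 7.5, 8.0]

observed = statistics.mean(after) - statistics.mean(before)

# Permutation test: shuffle group labels and count how often a difference
# at least as large as the observed one arises by chance
random.seed(0)
pooled = before + after
count = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[len(before):]) - statistics.mean(pooled[:len(before)])
    if diff >= observed:
        count += 1

p_value = count / trials
print(f"one-sided p-value: {p_value:.4f}")
```

A p-value below the chosen significance level (here, far below 0.05) would lead the team to reject the null hypothesis.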
Type I and Type II Errors
Type I error (false positive) occurs when rejecting a true null hypothesis, while Type II error (false negative) occurs when failing to reject a false null hypothesis. Understanding these errors helps practitioners balance risk and confidence in decision-making.
Example: A pharmaceutical company tests a new drug. A Type I error would conclude the drug is effective when it actually provides no benefit, potentially exposing patients to unnecessary risks. A Type II error would conclude the drug is ineffective when it actually works, depriving patients of beneficial treatment. The company must carefully set significance levels to minimize both error types appropriately.
Correlation and Regression Analysis
Correlation Coefficient
The correlation coefficient measures the strength and direction of the linear relationship between two variables, ranging from negative 1 to positive 1. Values near zero indicate weak relationships, while values approaching the extremes indicate strong relationships.
Example: A retail chain analyzes the relationship between store size (square meters) and monthly sales. Data from 20 stores yields a correlation coefficient of 0.78, indicating a strong positive relationship. Larger stores tend to generate higher sales, though the relationship is not perfect since other factors (location, demographics) also influence performance.
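The coefficient itself is just the covariance scaled by both standard deviations. A minimal sketch using hypothetical store data (so the resulting r will not match the 0.78 quoted above):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical store sizes (square meters) and monthly sales
sizes = [120, 150, 200, 250, 300, 180]
sales = [38000, 41000, 52000, 58000, 70000, 47000]
r = pearson_r(sizes, sales)
print(round(r, 2))
```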
Regression Analysis
Regression analysis develops mathematical equations describing relationships between independent variables (predictors) and dependent variables (outcomes), enabling predictions and deeper insight into how the variables move together. Keep in mind that regression on observational data demonstrates association rather than causation; establishing cause and effect requires designed experiments or additional evidence.
Example: The retail chain performs regression analysis using the same data, producing the equation: Monthly Sales = 15,000 + (180 times Store Size). This equation predicts that a 200-square-meter store should generate approximately 51,000 in monthly sales (15,000 + 180 times 200). Managers can use this model for planning new store locations and size optimization.
R-squared (Coefficient of Determination)
R-squared indicates the proportion of variance in the dependent variable explained by the independent variables in a regression model, ranging from 0 to 1. Higher values suggest better model fit.
Example: The retail regression model yields an R-squared value of 0.61, meaning store size explains 61% of the variation in monthly sales. The remaining 39% results from other factors not included in this simple model, such as local competition, parking availability, or demographic characteristics.
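Fitting the line and computing R-squared takes only a few lines of Python. The sketch below uses hypothetical store data, so the coefficients and R-squared will differ from the figures quoted above:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical store sizes (square meters) and monthly sales
sizes = [120, 150, 200, 250, 300, 180]
sales = [40000, 39000, 55000, 56000, 72000, 45000]
intercept, slope = fit_line(sizes, sales)
r2 = r_squared(sizes, sales, intercept, slope)
print(round(r2, 2))
print(round(intercept + slope * 200))  # predicted sales for a 200 square-meter store
```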
Process Capability Terminology
Control Limits
Control limits define the boundaries of expected process variation based on actual process performance. Data points falling outside these limits suggest special cause variation requiring investigation.
Example: A bottling company monitors fill volumes using control charts with upper and lower control limits of 502 ml and 498 ml respectively (target: 500 ml). When a bottle measures 504 ml, it falls outside the upper control limit, triggering an investigation that reveals a valve calibration issue.
Specification Limits
Specification limits represent customer requirements or engineering tolerances that define acceptable product or service characteristics. Unlike control limits based on process performance, specification limits reflect external requirements.
Example: The bottling company’s customer contract requires fill volumes between 495 ml and 505 ml. These specification limits are wider than the control limits, indicating the process operates well within customer requirements. However, if specification limits were narrower (497 to 503 ml), the process would frequently produce out-of-specification products despite remaining in statistical control.
Process Capability Indices (Cp and Cpk)
Process capability indices quantify how well a process can meet specifications. Cp measures potential capability assuming perfect centering, while Cpk accounts for actual process centering.
Example: The bottling process has a mean of 500 ml, standard deviation of 1.2 ml, and specification limits of 495 to 505 ml. Cp equals (505 minus 495) / (6 times 1.2) = 1.39, indicating good potential capability. Cpk equals the minimum of [(500 minus 495) / (3 times 1.2), (505 minus 500) / (3 times 1.2)] = 1.39. Since Cp equals Cpk, the process is perfectly centered. Values above 1.33 generally indicate capable processes.
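These two indices are simple to compute once the process mean, standard deviation, and specification limits are known. A sketch reproducing the bottling figures:

```python
def capability(mean: float, sd: float, lsl: float, usl: float):
    """Process capability: Cp ignores centering, Cpk penalizes off-center processes."""
    cp = (usl - lsl) / (6 * sd)
    cpk = min((mean - lsl) / (3 * sd), (usl - mean) / (3 * sd))
    return cp, cpk

cp, cpk = capability(mean=500, sd=1.2, lsl=495, usl=505)
print(round(cp, 2), round(cpk, 2))  # 1.39 1.39 -- a centered process, so Cp equals Cpk
```

Shift the mean away from 500 ml and Cpk drops while Cp stays constant, which is exactly the distinction between the two indices.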
Analysis of Variance (ANOVA) Concepts
ANOVA (Analysis of Variance)
ANOVA is a statistical technique comparing means across three or more groups to determine whether significant differences exist. This method extends the capabilities of t-tests to multiple group comparisons.
Example: A hospital compares patient recovery times across four different treatment protocols with sample data: Protocol A (5, 6, 7, 6, 5 days), Protocol B (7, 8, 9, 8, 8 days), Protocol C (6, 7, 6, 7, 6 days), and Protocol D (4, 5, 4, 5, 4 days). ANOVA helps determine whether the observed differences represent true treatment effects or merely random variation.
F-statistic
The F-statistic represents the ratio of variance between groups to variance within groups in ANOVA. Larger F-values suggest greater differences between group means relative to variation within groups.
Example: The hospital’s ANOVA produces an F-statistic of 12.8 with a corresponding p-value of 0.001. This large F-value and small p-value provide strong evidence that treatment protocols produce genuinely different recovery times, justifying further investigation into which specific protocols differ.
Non-parametric Test Terminology
Non-parametric Tests
Non-parametric tests analyze data without assuming specific distribution patterns, making them valuable when working with small samples, ordinal data, or non-normal distributions.
Example: A restaurant collects customer satisfaction ratings on a scale of 1 to 5 stars before and after menu changes. Since this ordinal data does not follow a normal distribution and involves the same customers rating twice, the Wilcoxon signed-rank test (a non-parametric alternative to the paired t-test) provides appropriate analysis.
Chi-square Test
The chi-square test examines relationships between categorical variables by comparing observed frequencies with expected frequencies under the assumption of independence.
Example: A marketing team investigates whether customer purchase behavior (yes/no) relates to advertising channel (email, social media, direct mail). Collecting data from 300 customers across three channels, the chi-square test reveals whether certain channels produce significantly different purchase rates or whether observed differences merely reflect random chance.
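The test statistic compares each observed cell count with the count expected under independence. A minimal sketch with hypothetical counts (100 customers per channel):

```python
# Hypothetical purchase counts by advertising channel:
#              [purchased, did not purchase]
observed = {
    "email":       [45, 55],
    "social":      [60, 40],
    "direct mail": [35, 65],
}

rows = list(observed.values())
total = sum(sum(r) for r in rows)
col_totals = [sum(r[j] for r in rows) for j in range(2)]

chi_square = 0.0
for row in rows:
    row_total = sum(row)
    for j, obs in enumerate(row):
        # Expected count if channel and purchase behavior were independent
        expected = row_total * col_totals[j] / total
        chi_square += (obs - expected) ** 2 / expected

df = (len(rows) - 1) * (2 - 1)  # degrees of freedom: (rows - 1)(columns - 1)
print(round(chi_square, 2), df)
```

The resulting statistic is then compared against the chi-square critical value for the relevant degrees of freedom (here, 2) to decide whether the channels differ.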
Practical Application and Integration
Understanding these statistical terms in isolation provides limited value. The true power emerges when practitioners integrate multiple concepts to solve complex business problems. Consider a comprehensive example involving a call center seeking to improve first-call resolution rates.
The team begins by calculating descriptive statistics (mean, median, standard deviation) for current resolution rates across different times of day and days of the week. They discover that the mean resolution rate is 78% with a standard deviation of 12%, but the median is 82%, suggesting negative skew caused by particularly poor performance during specific periods.
Using correlation analysis, they identify that resolution rates correlate negatively with call volume (r equals negative 0.65) and positively with agent experience level (r equals 0.71). Regression modeling quantifies these relationships, enabling predictions about how staffing changes might impact performance.
ANOVA reveals significant differences in resolution rates across different product categories. Follow-up analysis shows that technical support calls have significantly lower resolution rates than billing inquiries. The team hypothesizes that enhanced technical training will improve overall performance.
After implementing training, hypothesis testing compares before and after performance. The null hypothesis states that training has no effect. Analysis yields a p-value of 0.008, providing strong evidence to reject the null hypothesis and conclude that training effectively improved resolution rates.
Finally, process capability analysis confirms that the improved process now operates within acceptable specification limits (minimum 85% resolution rate), completing the journey from raw data to a verified, sustainable improvement.