Engineering Statistical Glossary
Source: NIST/SEMATECH e-Handbook of Statistical Methods
NOTE: This glossary is being adapted from a glossary kindly donated
by Bill Heavlin of Advanced Micro Devices.
Welcome to the hypertext dictionary. It contains selected terms
from engineering statistics. Current topic areas include
experimental design,
metrology,
survey questionnaires,
statistical process control, and
computer experiments.
[A]
|
accuracy
|
In metrology, the total
measurement variation,
including not only precision
(reproducibility), but also the
systematic offset between the average of measured values and the
true value.
|
|
additive effect
|
A property of a model describing a physical
process whereby the average or expected change
from changing a particular input factor
does not depend upon the values of other input
factors. An additive
effect has no associated
interactions.
|
|
affine calibration
|
Especially in computer experiments,
the practice of improving the agreement between predicted values and
empirical responses
by modifying one or more of the input factors
(usually of the simulator) by linear transformations. To be
distinguished from global calibration.
|
|
analysis of manufacturing variance (AMV)
|
Especially in the analysis of computer
experiments, the decomposition of the projected distribution of a
process in manufacturing into components attributable to each of
several factors, or combination of factor
interactions.
To be distinguished from analysis of variance,
because AMV depends on assuming a (usually
Gaussian) distribution for each
factor. To be distinguished from variance
components, because AMV is usually applied to
computer experiments.
|
|
analysis of variance
|
A way of presenting the calculations for the significance of a
particular factor's
effect, especially for data in which the
influence of several factors is being
considered simultaneously. Analysis of variance decomposes the sum
of squared residuals from the mean into
non-negative components attributable to each factor, or combination
of factor interactions.
Usually it is useful to distinguish between fixed and random
effects. In the case of only random effects,
the term variance components is
often preferred.
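The decomposition described above can be sketched for the simplest case, a single factor (one-way layout). This is only an illustrative sketch: the function name one_way_sums_of_squares and the example data are assumptions, not part of the glossary.

```python
def one_way_sums_of_squares(groups):
    """Split the total sum of squares about the grand mean into a
    between-group (factor) part and a within-group (residual) part."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    return ss_total, ss_between, ss_within


# Hypothetical data: one factor with two levels, three observations each.
total, between, within = one_way_sums_of_squares([[1, 2, 3], [4, 5, 6]])
print(total, between, within)
```

Both components are non-negative and add up to the total, which is the property the definition relies on.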
|
|
assignable cause
|
A synonym for special cause.
|
|
attenuation
|
- As a fuzzy concept, the lessening
of any signal due to the presence of
variation.
- In metrology, the tendency to estimate
the sensitivity to a signal
with a bias toward zero, especially due to
the effect of variability or
measurement uncertainty in the
calibration standard.
|
|
audit
|
The periodic observation of performed activities to verify
compliance with documented requirements.
|
|
average outgoing quality
|
The average level of defective product that is delivered to the
customer, after any benefit from inspection and rectification has
been taken into account. Usually reported in parts per million. See
also
Hahn estimator.
|
|
[B]
|
bar chart
|
A graph that reports several values by drawing a bar from zero to
each value. Each bar is suitably labeled. Critics of good graphic
design distinguish further between vertical bar charts and horizontal
bar charts. Vertical bar charts, sometimes also called "column
charts," plot the values against the y-axis and the labels along the
x-axis; they are recommended for reporting data in time order.
Horizontal bar charts plot the values along the x-axis and the labels
along the y-axis; they are recommended for reporting data that is not
chronological. Note that for horizontal bar charts, the labels are
naturally oriented horizontally, and more readable than for vertical
bar charts. Finally, note that recommended practice is to order the
axes meaningfully, either using a natural sequence in the labels,
or by ranking the values plotted. See also
Pareto charts.
|
|
bias
|
The difference between the average or expected value of a
distribution, and the true value. In
metrology, the difference
between precision and
accuracy is that measures of
precision are not affected by bias, whereas
accuracy measures degrade as bias increases.
|
|
binomial
distribution
|
An important theoretical distribution
used to model discrete events, especially the
count of defectives. The binomial distribution depends
on two parameters, n and p. n is the total
number of trials; for each trial, the chance of observing the event
of interest is p, and of not observing it, 1-p. The
binomial distribution assumes each trial's outcome is independent of
that of any other trial, and models the sum of
events observed. Unlike the Poisson
distribution, the binomial distribution sets a maximum number of
events n, the sample size
that can be observed. Unlike the
hypergeometric distribution,
the binomial distribution assumes the events it counts are
independent.
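As a rough illustration of the two parameters n and p, the sketch below evaluates the binomial probabilities directly from the definition. The function name binomial_pmf and the example numbers are assumptions chosen for illustration.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of observing exactly k events in n independent
    trials, each with chance p of showing the event."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical example: 20 units inspected, each defective with chance 0.05.
n, p = 20, 0.05
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(f"P(no defectives)   = {probs[0]:.4f}")
print(f"P(at most 1)       = {probs[0] + probs[1]:.4f}")
print(f"sum over k = 0..n  = {sum(probs):.4f}")
```

The probabilities are defined only for k from 0 to n, which reflects the maximum number of events mentioned in the definition.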
|
|
blocking
|
The practice of partitioning an experiment
into subgroups, each of which is restricted in size, time, and/or
space. Good experimental design practice has all
factors changing within blocks, unit assignment
within blocks randomized, and block
order and assignment randomized.
|
|
Box-Behnken experiment
|
An experimental design with three
levels on each
factor
that allows the estimation of a full quadratic
model, including
interactions. Box-Behnken designs have
two parts:
- centerpoints, and
- points lying on one sphere, equally distant from the
centerpoint.
The latter points consist of small
two-level full factorials where some
factors are fixed at their center values. The number of
centerpoints is chosen to
establish rotatability.
Compare to central composite designs.
|
|
box plot
|
A univariate graphical display of a distribution designed to
facilitate the comparison of several groups, especially when each
group has a substantial number of observations. Each group is
represented by a box; the ends of the box denote the 25th and 75th
percentiles; a mid-line denotes the median. In addition, from the
ends of the box outward are two lines drawn to either (a) the largest
and smallest values of the distribution, or (b) the largest and
smallest values that are not considered
outliers. By the latter convention,
individual values that are considered outliers
are plotted as particular points. Some software plots the average
value also.
|
|
brushing
|
The technique of highlighting a subgroup of observations,
especially in a scatterplot matrix,
but sometimes also in a histogram or other
graphical display. In a scatterplot
matrix, brushing helps in visualizing multivariate data. Typical
computer implementations allow the user to redefine the subgroup in
real time.
|
|
[C]
|
calibration
|
In metrology, the process or method for comparing actual readings
to their known values, and also of making suitable adjustments so that
the agreement between the two is improved.
|
|
constraint
|
In either an experiment or for a production
process, a limitation in the
range of a factor or combination of factors that is either physically not possible or
greatly undesirable to execute.
|
|
capability
|
The natural variation of a
process due to common
causes.
|
|
capability index, Cpk
|
A measure of the natural variation of a
stable process
compared to the closeness of the specification limit(s). When
the process is both
stable and normally
distributed, it is possible to estimate from Cpk the fraction of
product out of specification.
Let LSL denote the lower specification limit and let USL
denote the upper specification limit. Let AVG denote the mean
or similar typical value of a distribution, and let SIGMA
denote an estimate of the total common cause
variation. Then
Cpk is defined as the smaller of [ AVG - LSL ]/(3*SIGMA)
and [ USL - AVG ]/(3*SIGMA).
Sometimes only a lower or only an upper specification is
appropriate. For a lower limit, the one-sided capability index
called Cpl, defined as [ AVG - LSL ]/(3*SIGMA), can be used instead;
for an upper limit, Cpu, defined as [ USL - AVG ]/(3*SIGMA).
Because of their similarity, Cpk is sometimes used as a general
term to include the cases of both one- and two-sided
specifications.
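The definition above can be sketched in a few lines. This is a minimal sketch that assumes AVG is the sample mean and SIGMA is the sample standard deviation; the glossary only requires "an estimate of the total common cause variation", so other estimators may be more appropriate. The function name cpk and the data are hypothetical.

```python
import statistics


def cpk(values, lsl=None, usl=None):
    """Capability index: the smaller of Cpl and Cpu.
    Pass only lsl or only usl for a one-sided specification."""
    avg = statistics.mean(values)
    sigma = statistics.stdev(values)  # assumption: sample standard deviation
    indices = []
    if lsl is not None:
        indices.append((avg - lsl) / (3 * sigma))  # Cpl
    if usl is not None:
        indices.append((usl - avg) / (3 * sigma))  # Cpu
    if not indices:
        raise ValueError("at least one specification limit is required")
    return min(indices)


# Hypothetical measurements with specification limits 9.1 and 10.6.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
print(f"Cpk = {cpk(data, lsl=9.1, usl=10.6):.2f}")
print(f"Cpl = {cpk(data, lsl=9.1):.2f}")
```

Note that the divisor is the product 3*SIGMA, so the index compares the distance to the nearest limit with three standard deviations.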
|
|
capability index, Cpk with 25 percent precision
and 95 percent confidence
|
If the process is repeated with at least 33
distinct repetitions, Cp, defined as [USL-LSL]/(6*SIGMA), has a 95
percent confidence interval that is about plus or minus 25
percent of the estimated Cp value. This same principle holds
approximately for the confidence interval of Cpk.
Further detail is available in AMD technical report 320.
Alternative methods are available to achieve comparable
precision with smaller sample sizes under some circumstances.
See the AMD technical report 326.
|
|
capability study
|
Any study of the common cause variability of a
process.
|
|
capable process
|
- A process in which there is sufficient tolerance in the
specification range that, in principle, one can detect
out-of-control situations and effect corrective action
without placing production material in jeopardy.
- A process for which the capability
index Cpk exceeds 1.0. (Other criteria
for Cpk are sometimes promoted.
Among these are 1.33, 1.5, and 2.0, but these latter values
are usually reserved for the label "manufacturable.")
|
|
cause-effect diagram
|
Also called a CE diagram, an Ishikawa diagram, and a fish-bone
diagram. First presented by Kaoru Ishikawa, a picture describing
the various causes and sources of variation
on a particular quality
of interest. The quality of interest is usually placed at the right,
at the tip of a horizontal arrow. Major categories of causes branch
off this main arrow in a manner reminiscent of bones of a splayed
fish. Other coding conventions draw boxes around cause labels when
the influence of a cause is quantified, and underline labels when
such causes are believed to be important, but when the
effect is not yet quantified.
|
|
census
|
The method of data collection that involves assessing all the
units in the sample frame, i.e.
population. As opposed to a
sample.
|
|
centerpoint
|
In an experiment with quantitative factors, the experimental
condition corresponding to all factors being
set to the mid-point between their high and low values. Centerpoints
serve to test for the presence of curvature, and give information
about quadratic effects. When repeated,
centerpoints also provide estimates of the magnitude of the
experimental error.
|
|
central composite design
|
Also known as a Box-Wilson or star composite design. An
experimental design of three parts:
- A two-level full or fractional
factorial design;
- "Star" or axial points in which each
factor is varied to high and low
levels with all other factors
held constant;
- centerpoints.
The configuration of star points leads to variations: If one codes
the two-level design part with -1 and
+1, then the original Box-Wilson proposal varied the star points at
particular values larger than 1; the precise value was chosen to
ensure rotatability. Another
alternative is to restrict the star points to +/-1.
Central composite designs have a further appeal in that they are
amenable to iterative experiments and blocking.
Compare to Box-Behnken designs.
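The three parts of the design can be generated in coded units as follows. This is a sketch under stated assumptions: it uses a full two-level factorial, and when no star distance is given it uses the commonly quoted rotatable choice, the fourth root of the number of factorial points. The function name central_composite and its parameters are hypothetical.

```python
from itertools import product


def central_composite(k, alpha=None, n_center=1):
    """Return design points (coded units) for k factors:
    two-level full factorial + 2k star points + centerpoints."""
    if alpha is None:
        # Assumption: rotatable star distance for a full 2**k factorial.
        alpha = (2 ** k) ** 0.25
    factorial = [tuple(float(v) for v in combo)
                 for combo in product((-1, 1), repeat=k)]
    star = []
    for i in range(k):
        for distance in (-alpha, alpha):
            point = [0.0] * k       # all other factors held at center
            point[i] = distance     # one factor varied low / high
            star.append(tuple(point))
    center = [tuple([0.0] * k) for _ in range(n_center)]
    return factorial + star + center


for point in central_composite(2):
    print(point)
```

Passing alpha=1.0 gives the alternative mentioned above in which the star points are restricted to +/-1.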
|
|
characteristic
|
A distinguishing feature of a process or
its output on which variables or attributes data can be collected.
The response of a
process.
|
|
characterization
|
Any description of a process or its
measurable output that aids in the prediction of its performance.
|
|
checklist
|
A method of data recording, or of data analysis, in which the
scale of the measurement is broken into distinct lines. On observing
a value that falls in a particular interval, one records a vertical
stroke. Each fifth stroke is drawn horizontally across the preceding
four.
|
|
Clifford's method
|
The robust calculation of
control limits for the
individuals chart. The
centerline is calculated by the median, and the upper (lower)
control limits are at the centerline
plus (minus) 3.15 times the median moving range. Clifford originally
proposed this method to reduce hand calculation; it has good
properties for automated control limit recalculation also.
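The calculation described above is short enough to sketch directly. The function name clifford_limits and the example data are assumptions; the 3.15 multiplier and the use of medians come from the definition.

```python
import statistics


def clifford_limits(values):
    """Robust individuals-chart limits: centerline at the median,
    limits at +/- 3.15 times the median moving range."""
    center = statistics.median(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    half_width = 3.15 * statistics.median(moving_ranges)
    return center - half_width, center, center + half_width


# Hypothetical readings containing one wild value (50).
readings = [10, 12, 11, 13, 50, 12, 11]
lower, center, upper = clifford_limits(readings)
print(f"LCL = {lower:.2f}, centerline = {center}, UCL = {upper:.2f}")
```

Because both the centerline and the spread are medians, the single wild reading has little influence on the limits, which is what makes the method attractive for automated recalculation.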
|
|
close-ended question
|
In a survey, a question format that poses
a question, and attempts to structure the answer (by yes-no, scale
from 1 to 10, etc.)
|
|
common cause
|
A source of natural variation that affects
all of the individual values of the process
output being studied. Typically, common causes are numerous,
individually contribute little to the total
variation (although the total
variation can still be substantial), and are
difficult to eliminate.
|
|
computer experiment
|
A study of a fundamental physical process by
the use of one or more computer simulators. Like empirical
experiments, input variables
(factors) are systematically changed to assess
their impact upon simulator outputs
(responses). Unlike empirical
experiments, the simulator
responses are
deterministic, and this has
implications: Computer experiments can appropriately have their
factors with intermediate
levels and the scope, especially the
number of runs, can be more ambitious. Further,
modeling methods based on
interpolators (especially
kriging) emerge as a viable approach. Good
practice is to use Latin hypercubes for computer
experiments, and advanced nonparametric
modeling methods such as
kriging, neural
networks, and multivariate adaptive regression splines (MARS) in
the data analysis stage. Important applications of computer
experimental methods are for determining
process optima and for evaluating
process tolerances.
|
|
confidence interval
|
- Any statement that an unknown parameter is between two values
with a certain probability. For example, if one says that the
95 percent confidence interval for theta is 1.1 to 10.3, this
corresponds to the probability statement that
Pr{ 1.1 <= theta <= 10.3 } = 0.95.
- Based on the observation of a certain set of data, the range of
plausible values of an unknown parameter that are consistent
with observing that data. For example, if one says the 95
percent confidence interval for theta is 1.1 to 10.3, then this
is equivalent to saying that, based on the data observed, there
is a 95 percent chance that theta is between 1.1 and 10.3.
|
|
confidentiality
|
In surveys, the degree to which the
respondents' identities are kept unknown to the public, other
respondents, and especially to the survey
planners, interviewers, and administrators.
|
|
control
|
A corrective action based on feedback.
|
|
control chart
|
A graphical representation of a process characteristic. A
time-sequence chart showing plotted values of a
statistic or
individual measurement, including a central line and one or more
statistically derived control limits.
Some typical examples of control charts are X-bar and R charts,
batch averages ("individuals") control charts, within-wafer range
and standard deviation charts, wafer-to-wafer
range and standard
deviation charts, cumsum charts, exponentially weighted moving
average control charts, analysis of means control charts, and
cumulative count control charts.
|
|
control factor
|
Especially in an experiment, a factor or process input that
is easy to control, has a strong effect on the
typical value of a response, and has little
effect on the magnitude of its
variability. Usually distinguished from noise
factors.
|
|
control group
|
The set of observations in an experiment or
prospective study
that do not receive the experimental treatment(s).
These observations serve (a) as a comparison point to evaluate the
magnitude and significance of each experimental
treatment, (b) as a reality check to compare the
current observations with previous observation history, and (c) as a
source of data for establishing the natural experimental error.
|
|
control limits
|
The maximum allowable variation of a
process characteristic due to
common causes alone. Variation beyond a
control limit is evidence that special
causes may be affecting the process.
Control limits are calculated from process
data.
|
|
convenience sample
|
In a survey, the selection of observational
units according to the convenience of the investigator or the
interviewer. To be distinguished from scientific
randomization.
|
|
correlation
|
Correlation is a measure of the strength of the (usually
linear) relationship between two variables. The usual
correlation coefficient, called the Pearson correlation
coefficient, ranges from -1 to 1. A value of +1 corresponds to
the case where the two variables are related perfectly by an
increasing relationship; a value of -1 corresponds to a perfect,
but decreasing relationship. In the case of the Pearson
correlation coefficient, a value of +1 (-1) implies the
relationship is linear and increasing (decreasing).
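The Pearson coefficient can be computed from first principles as sketched below. The function name pearson and the example data are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4]
print(pearson(x, [2, 4, 6, 8]))          # linear and increasing
print(pearson(x, [8, 6, 4, 2]))          # linear and decreasing
print(pearson(x, [v ** 3 for v in x]))   # increasing but not linear
```

The third case is perfectly increasing yet gives a coefficient below 1, which illustrates why a Pearson value of exactly +1 or -1 implies a linear relationship.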
|
|
coverage error
|
In surveys, the error that results when the
pool of potential respondents (the sample
frame) does not match the population
to which one wishes to make generalizations.
|
|
critical parameters
|
A critical parameter is a measurable characteristic of a
material, process, equipment,
measurement instrument, facility,
or product that is directly or indirectly related to the fitness
for use of a product, process, or service.
|
|
critical process module
|
A node in the process flow whose output
has a significant impact on the total process.
Sometimes also called a critical process step.
|
|
cross-validation
|
A family of methods based on the idea that the least
biased test of a model's predictive error comes from
applying the model to data that was not used in building
the initial predictive model. A common
application is to partition a dataset into two parts, to fit the
model on the first part, and to assess the
predictive capability of that model
on the second part.
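The two-part application described above can be sketched as follows. The straight-line model, the 50/50 split, the root-mean-square error measure, and all function names are assumptions made for illustration; any model and error measure could be substituted.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def holdout_rmse(xs, ys, train_fraction=0.5, seed=0):
    """Fit on one random part of the data, report RMS prediction
    error on the remaining part, which the fit never saw."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * train_fraction)
    train, test = indices[:cut], indices[cut:]
    slope, intercept = fit_line([xs[i] for i in train],
                                [ys[i] for i in train])
    squared = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in test]
    return (sum(squared) / len(squared)) ** 0.5


# Hypothetical data: a noisy straight line.
rng = random.Random(1)
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]
print(f"hold-out RMSE = {holdout_rmse(xs, ys):.3f}")
```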
|
|
cusum chart
|
A control chart based on CUmulative SUMs,
sometimes also called a "cumsum" chart. If the value measured at
time t is X(t), a cusum chart plots the value
SUM{ X(u)-target: u=1,2,...,t }. Cusum charts are
sensitive to drift, and to processes running systematically above or
below target. They are most suitable for adjustable processes,
and when the entity has only one recipe running in high volume. These
properties make them similar to EWMA charts.
Cusum control limits take the form of a
backward-facing V-mask. A process is in
control when all plotted points lie within this V-mask.
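The plotted quantity SUM{ X(u)-target: u=1,2,...,t } can be computed as a running total, as in this sketch. The function name cusum and the data are assumptions; the V-mask control limits are not implemented here.

```python
from itertools import accumulate


def cusum(values, target):
    """Cumulative sum of deviations from target, one value per time point."""
    return list(accumulate(x - target for x in values))


# Hypothetical process running slightly above a target of 10.
observations = [11, 12, 9, 10, 11, 11, 12]
print(cusum(observations, target=10))
```

A process that runs systematically above target produces a steadily rising trace, which is why the chart is sensitive to small sustained offsets.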
|
|
customers
|
Organizations that use the products, information, or services of an
operation.
|
|
[D]
|
data-driven
|
The property of requiring data and facts, but not requiring
subjective opinions. As opposed to
opinion-driven.
|
|
data reduction
|
The process of calculating, from several numbers, one or a few summary numbers.
An example is that one might have 9 readings taken across a wafer. A
common data reduction would be to use the average and standard deviation,
which is only two numbers. The benefits of data reduction are usually
simplicity, interpretation ease, greater focus on issues of interest,
and small data files.
|
|
demographics
|
In surveys, information such as age,
gender, place of residence,
and annual income that can be taken to elucidate the responses,
especially by identifying the respondent as a member of a particular
group.
|
|
detection
|
The class of process corrective monitors designed to determine
whether production material is conforming to specifications. See also
disposition. As opposed to prevention.
|
|
deterministic
|
The property of being perfectly repeatable, and without experimental
or observational error. Usually achievable only in computer experiments.
|
|
diagnostic
|
A calculation or graph that serves to test one or more
assumptions about a model.
|
|
digidot chart
|
A hybrid chart that consists of a stem and leaf chart on the
y axis, with the leaves pointing left, and a time trend plot to the
right. In both cases the values plotted are the most significant
few digits of the observed value.
|
|
disposition
|
The class of product decisions that evaluate what is to be done
with production material that has been manufactured outside
specification.
See also detection. As opposed to prevention.
|
|
distribution
|
A representation of the frequency of occurrence of values of
a variable, especially of a response.
|
|
dot plot
|
A form of a histogram for which an
observation with a value within a certain range is plotted as a dot
a fixed interval above the previous dot in that same range. Useful
for small numbers of observations.
|
|
[E]
|
effect
|
The change in the average or expected value of a given
response due to the change of a given
factor. The change of the given
factor is usually from the lowest to the
highest value of those tried experimentally, and
the units of the effect are usually in the same units as the
response.
|
|
efficiency
|
A fuzzy concept for the
precision that can be achieved by a given
estimation method and sample size.
An efficient method estimates a population
parameter with the shortest possible
confidence interval.
|
|
EVOP, Evolutionary Operation
|
An abbreviation for "evolutionary operation". An EVOP is a
special type of on-line experiment with several
distinguishing features:
- The experimental material is production material intended to
be delivered to customers.
- In each experimental cycle, the standard
production recipe is changed.
- The experimental factor
levels are less extreme than in
conventional off-line experiments.
- The experiment is run over a longer term
with more material than in conventional off-line
experiments.
|
|
EWMA chart
|
- Any control chart
based on Exponentially Weighted Moving Averages. An EWMA
chart plots a weighted average of the current observation and
the previously plotted point; the weight of the current
observation is denoted by lambda.
Values of lambda between 0.3 and 0.7 are generally recommended;
the value of 0.7 is better for "noisier" processes. A lambda
of 0.4 behaves approximately like one with all 8
Western Electric rules
active.
EWMA charts are sensitive to drift, and to processes running
systematically off target. They are most useful when the entity
has only one process running in high volume. They share these
features with the cusum chart.
- At AMD, the EWMA chart also allows for the calculation of an
optimal lambda value. The AMD implementation plots the
average of the current monitor point and compares this average
to designated control limits.
In this regard, it resembles an
individuals control chart.
The EWMA part is implemented as a set of trend rules to be
used instead of the
Western Electric rules.
- As originally proposed by J. Stuart Hunter, the EWMA chart
would plot the exponentially weighted average of the current
observation with the previously plotted point. These would be
compared to appropriate control
limits. In addition, the current observation would also
be plotted, and compared to appropriate (e.g.
individuals)
control limits.
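The weighted-average recursion in the first definition can be sketched as below. This is not the AMD implementation; the function name ewma, the parameter name lam, and the choice of starting value are assumptions (a process target or historical mean is a common starting value).

```python
def ewma(values, lam, start=None):
    """Exponentially weighted moving average.
    Each plotted point is lam * current observation
    + (1 - lam) * previously plotted point."""
    previous = values[0] if start is None else start  # assumed starting value
    plotted = []
    for x in values:
        previous = lam * x + (1 - lam) * previous
        plotted.append(previous)
    return plotted


# Hypothetical process that steps from 10 to 12 midway through.
observations = [10, 10, 10, 12, 12, 12, 12]
for lam in (0.3, 0.7):
    print(lam, [round(z, 3) for z in ewma(observations, lam, start=10)])
```

A larger lambda makes the plotted point follow the current observation more closely; a smaller lambda smooths more heavily.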
|
|
[F]
|
face-to-face survey
|
A method of administering surveys whereby the
respondents are interviewed by persons who are physically present.
|
|
factor
|
The input variable of a process, and
especially of an experiment.
Experimental factors are particularly those variables that are
deliberately manipulated during the experiment.
Experimental factors can be divided further into
control factors and
noise factors. Control factors are those
factors that are easy to control, and usually have a strong influence
on the response. (A classic example is the
time involved for a deposition process.) Noise
factors are factors that are either difficult
or inconvenient to control. A difficult-to-control
noise factor might be the ambient air flow
around a furnace tube. An inconvenient-to-control
noise factor might be the recent use
history of a wet clean sink.
|
|
factor level
|
In experimental design, the value that an
input variable or factor takes on.
|
|
factor range
|
In experimental design, and especially for a
quantitative factor, the difference between
the highest value that the factor takes on
and the lowest.
|
|
factorial experiment
|
An experiment in which the values of each factor are used in combination with all the values
of all other factors. A fractional
factorial experiment takes a judicious subset of all combinations,
with the following objectives in mind:
- the total number of experiments is small,
- the experimental space is well covered,
- for subsets of factors (say of size 2,
3, or 4), the total number of experimental combinations is
kept large.
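A full factorial and one small fractional factorial can be enumerated as sketched below. The function names and factor names are hypothetical, and the half fraction shown uses the standard generator C = A*B in coded (-1, +1) units, which is only one of many possible fractions.

```python
from itertools import combinations, product


def full_factorial(levels):
    """All combinations of the given levels.
    levels: dict mapping factor name -> list of values."""
    names = list(levels)
    return [dict(zip(names, combo))
            for combo in product(*(levels[name] for name in names))]


def half_fraction_three_factors():
    """Half fraction of a 2**3 design: 4 runs instead of 8,
    with the third factor set to the product of the first two."""
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]


runs = full_factorial({"temperature": [300, 350], "time": [10, 20, 30]})
print(len(runs), "runs in the full factorial")

fraction = half_fraction_three_factors()
for i, j in combinations(range(3), 2):
    pairs = {(run[i], run[j]) for run in fraction}
    print(f"factors {i} and {j}: {len(pairs)} distinct combinations")
```

In this half fraction every pair of factors still shows all four of its combinations, which illustrates the third objective listed above.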
|
|
focus groups
|
A method of interviewing people not individually, but in
small groups. This method is often favored as a preliminary to
formal questionnaire-based surveys. The groups
are usually composed to be comparable in some way (income, age, etc).
Its disadvantages include small sample sizes
relative to the effort expended, potential biases
from hearing other respondent views, and lack of structure for
synthesizing results.
|
|
fuzzy concepts
|
Concepts that, by their greater abstraction, admit both
generalization and alternative approaches. For example, the average
is usually calculated to estimate the typical value of a set of
numbers. The average is a specific concept, whereas "typical value"
is a fuzzy one.
|
|
[G]
[H]
[I]
[J]
[K]
|
kriging
|
An interpolator easily generalized to
multiple dimensions and arbitrary configurations of observed points.
Nonetheless, kriging is analogous to least squares. A point at which
a kriging prediction is desired is thought to be more "correlated" to
the closer observed points in the observation space. Further, as
this point approaches another that is actually observed, the
correlation approaches 1.0. From these
ideas, one can formalize a prediction method, kriging. For an
experiment of n observations, kriging requires the inversion of an
n x n matrix, making it awkward to use for large n.
|
|
[L]
|
lack of fit
|
A property of a model with respect to a set
of observations. Lack of fit refers to the degree to which the
model does not predict or fit the observations.
Lack of fit can be due to experimental error and
uncertainty in the
process obtaining the observations, or it may
be due to a defect in the model.
|
|
Latin hypercube design
|
An experimental design consisting of n trials, and
for which each factor
has n distinct levels. Usually the factor
levels are equally spaced. The best Latin hypercube designs are based
on orthogonal arrays.
Latin hypercube designs are especially useful for computer experiments.
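A simple way to build such a design is to give each factor an independent random permutation of n equally spaced levels, as in this sketch. It does not use orthogonal arrays, so it is not one of the "best" designs mentioned above. The function name latin_hypercube and the scaling of levels to the unit interval are assumptions.

```python
import random


def latin_hypercube(n, k, seed=None):
    """n trials for k factors; each factor takes each of its n equally
    spaced levels (midpoints of n bins on [0, 1]) exactly once."""
    rng = random.Random(seed)
    columns = []
    for _ in range(k):
        order = list(range(n))
        rng.shuffle(order)          # independent permutation per factor
        columns.append(order)
    return [tuple((column[trial] + 0.5) / n for column in columns)
            for trial in range(n)]


for run in latin_hypercube(5, 2, seed=1):
    print(run)
```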
|
|
Latin hypercube sampling
|
A computer experimental method that
uses Latin hypercube designs in order to estimate
distributions of the simulator outputs.
The use of Latin hypercube designs allows Latin
hypercube sampling to be quite a bit more precise than
Monte Carlo methods. The
distributions of the input
factors are represented in the spacing of the
factor levels.
|
|
LDL, lower detection limit
|
The level at which a measurement system
ceases to discriminate effectively between background
standard deviations.
|
|
Likert scale
|
In questionnaires, the answer format that requires the respondent
to pick one of a few values along a scale. 5-point and 9-point scales are
common Likert scales. The two ends of a Likert scale are opposites,
and the middle values represent degrees in between.
|
|
linearity
|
In metrology, the difference in bias throughout the range of the measured
instrument. This definition is best understood if one views the
relation between measured result on the y-axis and the true value on
the x-axis. Ideal linearity is a line with slope 1.0. (Pure
bias would correspond to the intercept=0.0.)
Linearity is a little bit of a misnomer, for it refers to any
difference from a line with slope of 1.0, and this can happen both by
having a nonlinear relationship, and by having a linear relationship,
but with a slope other than 1.0.
|
|
logistic function
|
The function 1/(1+exp(-x)). The logistic function is skew-symmetric
about the point (0, 0.5), since logistic(x)-0.5=0.5-logistic(-x). Applications include
modeling dose-response curves, heavy-tailed distributions, and as a
"squashing" function in neural network
modeling.
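The function and its symmetry can be checked numerically with a short sketch. The two-branch form below is an implementation choice (an assumption, not from the glossary) that avoids overflow in exp for large negative x.

```python
import math


def logistic(x: float) -> float:
    """The logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)              # algebraically identical form for x < 0
    return z / (1.0 + z)


for x in (0.0, 1.0, 3.0):
    # logistic(x) and logistic(-x) always add up to 1
    print(x, logistic(x), logistic(-x), logistic(x) + logistic(-x))
```

The output stays strictly between 0 and 1 and equals 0.5 at x = 0, which is what makes the function useful for "squashing" an unbounded input.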
|
|
[M]
|
mail survey
|
A method of administering surveys whereby the
respondents are contacted by mail. Salant and Dillman (1994) present
a series of strategies for improving mail survey response rates.
These include interesting cover designs, accelerating reminder
post cards and letters, and final contact by certified letter.
|
|
matching
|
In a retrospective study, a method for
identifying a comparison group. Matching pairs observational units:
each unit that has both trait-of-interest A and nuisance
effects B,C,... with another unit that lacks
trait-of-interest A, yet still shares B,C,... Low yielding lots
(trait-of-interest is yield) are in this way
compared to well yielding lots of the same product started at about
the same time. Matches in this way are more
sensitive to key causal
differences (for example, in the particular equipment set used) than
would occur from taking "matches" from all available lots. Matching
is a way of implementing commonality studies. Matching is a kind of
blocking for retrospective studies.
|
|
MCA
|
Measurement capability assessment, or sometimes a measurement
capability analysis. A metrology
characterization. Sematech
definitions focus on (a) repeatability
and (b) reproducibility.
Broader definitions would assess (c)
sensitivity to changes in the phenomenon
being measured--such sensitivity is
desirable--and (d) sensitivity to features
other than the phenomenon being measured--such
sensitivity is not desirable.
|
|
mean time between failures (MTBF)
|
For one or a class of systems, the average time between one
failure of a system and the next failure of a system. This
average time excludes the time spent waiting for repair, the time
spent being repaired, the time spent in being requalified, and so
on; it is intended to measure only the time a system is available
and operating.
|
|
measurement error
|
- The variability observed that can be attributed to the
metrology or
measurement system. Measurement error can
be decomposed further into
miscalibration,
insensitivity,
repeatability,
and reproducibility.
- In surveys, the error that results when
a respondent's answer is inaccurate, imprecise, or not easily
compared to those of other respondents. Salant and Dillman
(1994) divide such measurement error into errors in method,
questionnaire, interviewer, and respondent.
|
|
meta-analysis
|
A family of statistical methods that quantitatively combine the
results of separate investigations into a single statement of overall
significance.
|
|
metamodel calibration
|
The practice of determining unknown parameters of a model by the
following steps:
- Run a computer experiment by
varying the unknown parameters, and recording the expected
responses.
- Fit a model of general form, especially a
neural network, using the
responses of the
computer experiment as
inputs and the factors as the outputs.
- Extract the unknown parameters as the outputs that result
from this model when the inputs are
taken to be the empirically observed values.
|
|
method error
|
The part of the measurement error
attributable to the details of the measurement
process. In surveys,
for example, one can administer questionnaires by
face-to-face contact, by mail, by
telephone, and so on. These different
methods are recognized to give different results, and to the degree
that they do, this is an example of a method error.
|
|
metrology study
|
Sometimes called a gauge capability study,
or measurement capability assessment. Such a
study quantifies the capabilities and limitations of a
measurement instrument, often estimating its
repeatability,
reproducibility, and sometimes its
sensitivity.
|
|
mixture experiment
|
An experimental design in which, for each experimental
run, the factor
levels are constrained to sum to a constant across the
factors. The typical
applications involve chemical experiments in which the
factors are liquids, or sometimes gases. In
such a case, it is the proportion of each liquid ingredient, not its
weight or volume, that is the essential issue.
|
|
model
|
A mathematical statement of the relation(s) among variables. Models
can be of two basic types, or have two basic parts: statistical
models, which predict a measured quantity, and probability models, which
predict the relative frequency of different random outcomes.
|
|
monitor variable
|
A measurable characteristic of a process that
is particularly relevant and informative for purposes of
process control. To be distinguished from a
critical parameter, which is more
relevant for product acceptance.
|
|
Monte Carlo sampling
|
A computer experimental method that
uses random numbers in order to
estimate distributions of simulator
outputs.
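As a minimal sketch of the idea, the Python below pushes random draws of two inputs through a stand-in simulator and summarizes the resulting output distribution. The `simulator` function, the input distributions, and the seed are hypothetical, chosen only for illustration.

```python
import random
import statistics

def simulator(x, y):
    # Hypothetical stand-in for a real simulator.
    return x * x + 2.0 * y

def monte_carlo(n_runs, seed=0):
    # Draw random inputs and collect the simulator output for each draw.
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_runs):
        x = rng.gauss(1.0, 0.1)   # assumed input distribution for x
        y = rng.gauss(0.5, 0.05)  # assumed input distribution for y
        outputs.append(simulator(x, y))
    return outputs

outputs = monte_carlo(10_000)
mc_mean = statistics.mean(outputs)
mc_sd = statistics.stdev(outputs)
print(round(mc_mean, 2), round(mc_sd, 2))
```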
|
|
[N]
|
neural nets
|
See neural network models.
|
|
neural network models
|
A highly flexible modeling method that
postulates one or more layers of unobserved variables. Each
unobserved variable is a linear function of variables of the previous
layer (the first layer consists of the factors,
or model inputs). As output to the next layer, the output of each
unobserved variable is nearly always
transformed by a nonlinear function,
most commonly the logistic function. Neural
networks are sometimes used for analysis of
computer experiments,
especially when the size of the experiment makes
kriging impractical.
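The layered structure described above can be sketched as a forward pass in plain Python. The weights, biases, and the single hidden layer of two unobserved variables are hypothetical; in practice they would be estimated from data.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases, transform=logistic):
    # Each variable is a linear function of the previous layer,
    # then passed through a (usually nonlinear) transform.
    return [transform(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]

# Hypothetical coefficients for two factors and two unobserved variables.
HIDDEN_W = [[1.0, -1.0], [0.5, 0.5]]
HIDDEN_B = [0.0, 0.0]
OUTPUT_W = [[2.0, -1.0]]
OUTPUT_B = [0.25]

def predict(factors):
    hidden = layer(factors, HIDDEN_W, HIDDEN_B)
    # The final layer is left linear here, an assumption of this sketch.
    output = layer(hidden, OUTPUT_W, OUTPUT_B, transform=lambda z: z)
    return output[0]

print(predict([0.0, 0.0]))
```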
|
|
noise factor
|
Especially in an experiment, a
factor or process input that can
be either difficult or inconvenient to control. Noise factors also
include product use conditions (the temperature, test conditions,
environment). Usually distinguished from control factors.
|
|
noise-to-signal ratio
|
The ratio of the measurement system's
precision to the average measurement value;
the reciprocal of the signal-to-noise ratio. The noise-to-signal ratio
allows one to express the magnitude of measurement precision on a
percentage scale.
|
|
nonresponse
|
The event that occurs during an experiment,
survey, or observational
study in which the responses (results of
interest) cannot be measured or
completely recorded.
|
|
nonresponse error
|
In surveys, the error that results when
sampled respondents decline to answer the questionnaire, especially
when these respondents, viewed as a whole, seem to be different from
those who do answer in a way that is important to the study.
|
|
normal distribution
|
A symmetric distribution with one high
point or mode, sometimes also called the bell curve. The average is
one of many statistical calculations that,
even for only a moderate amount of data, tend to have a
distribution that resembles the
normal curve. In industry, there are four important properties of the
normal distribution:
- it is symmetric,
- within plus and minus one standard
deviation about 68 percent of the distribution is enclosed,
- within plus and minus two standard deviations, 95 percent, and
- within plus and minus three standard deviations, 99.7 percent.
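The three coverage figures can be checked numerically. The sketch below uses the standard identity that the probability of a normal value falling within k standard deviations of the mean is erf(k / sqrt(2)); the function name `coverage` is ours.

```python
import math

def coverage(k):
    # Fraction of a normal distribution within plus and minus k
    # standard deviations of its mean.
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(100.0 * coverage(k), 1))  # about 68.3, 95.4, 99.7
```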
|
|
[O]
|
objective methods
|
Methods of data collection, and especially of data analysis,
characterized by the fact that they do not depend on the opinions or
knowledge particular to an individual. Objective methods are
reproducible, in a scientific sense, and in principle amenable to
reduction to software algorithms.
|
|
off-line SPC
|
Techniques such as histograms, checklists, Pareto
charts, capability indices, and designed experiments that are intended to
characterize selected properties of a process
without necessarily determining when to invoke a control algorithm to
investigate or correct for special causes.
|
|
on-line SPC
|
Techniques such as control charts that
seek to monitor a process relative to its
natural variation and seek to identify
when the invocation of a control algorithm, either to investigate
or correct for special causes, is
warranted. Certain statistical techniques, such as
EVOP, seek the dynamic optimization of a
process; these are also on-line SPC techniques.
|
|
open-ended question
|
In a survey, a question format that poses a
question, but does not attempt to structure the answer (by yes-no,
scale from 1 to 10, etc.). Rather, the respondent is expected to
reply in his or her own words, orally or in writing.
|
|
opinion-driven
|
The property of depending on personal opinion, arbitrary fudge
factors, or other choices not objectively
grounded. As opposed to data-driven.
|
|
optimal design
|
The approach to creating experimental designs
using a computer algorithm that optimizes an objective function. The most
common objective is to minimize the determinant of the coefficients'
variance-covariance matrix; such designs are called D-optimum. The
optimal design approach contrasts with the approach based on
orthogonal arrays.
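For a linear model, the coefficients' variance-covariance matrix is proportional to the inverse of X'X, so a D-optimum design makes det(X'X) as large as possible. The sketch below compares two hypothetical four-run designs for a model with an intercept and two linear terms. It only evaluates the criterion and does not search for a design.

```python
def information_matrix(design):
    # Model matrix rows are (1, x1, x2): intercept plus two linear terms.
    rows = [(1.0, x1, x2) for x1, x2 in design]
    return [[sum(r[i] * r[j] for r in rows) for j in range(3)]
            for i in range(3)]

def det3(m):
    # Determinant of a 3x3 matrix by cofactor expansion.
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def d_criterion(design):
    return det3(information_matrix(design))

factorial = [(-1, -1), (1, -1), (-1, 1), (1, 1)]    # 2x2 full factorial
alternative = [(-1, -1), (1, -1), (-1, 1), (0, 0)]  # one corner moved to center

print(d_criterion(factorial), d_criterion(alternative))
```

The full factorial scores higher, so under this criterion it is the preferred of the two designs.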
|
|
optimum
|
- Especially as determined by an experiment,
the combination of factor setpoints that
achieve the best balance of the
responses of interest.
- The average response values achieved
at such a set of factor setpoints.
|
|
orthogonal array
|
A table consisting of rows and columns with the property that
for any pair of columns (factors) all
combinations of values (levels) occur, and
further, all combinations occur the same number of times.
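The defining property can be tested directly. The following sketch, with a function name of our choosing, checks every pair of columns for completeness and equal frequency of level combinations.

```python
from collections import Counter
from itertools import combinations

def is_orthogonal_array(rows):
    n_cols = len(rows[0])
    for i, j in combinations(range(n_cols), 2):
        counts = Counter((row[i], row[j]) for row in rows)
        levels_i = {row[i] for row in rows}
        levels_j = {row[j] for row in rows}
        # Every combination of levels must occur ...
        if len(counts) != len(levels_i) * len(levels_j):
            return False
        # ... and all combinations must occur equally often.
        if len(set(counts.values())) != 1:
            return False
    return True

# Four runs, three two-level factors.
l4 = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(is_orthogonal_array(l4))
```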
|
|
outliers
|
Observations whose value is so extreme that they appear not
to be consistent with the rest of the dataset. In a process monitor,
outliers indicate that assignable or special causes are present.
The deletion of a particular outlier from a data analysis is easiest
to justify when such an unusual cause has been identified.
|
|
out of control
|
A process is out of control when a
statistic such as an average or a range
exceeds control limits or when,
although within the control limits, a
significant trend or pattern in this
statistic emerges. Being out of control
defines a time-bounded state, not an intrinsic property of a
process. By analogy, at any given time, a
driver may be involved in an accident (out of control) or not. The
intrinsic property of the process is whether
the driver is a safe driver or not (whether the frequency of
out-of-control conditions is excessive or not). Determining the
latter, the intrinsic safety (stability),
typically requires observation over a sustained period of time.
|
|
[P]
|
Pareto analysis
|
A technique for problem solving in which all potential
problem areas or sources of variation are
ranked according to their contribution.
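A minimal sketch of the ranking step, using made-up defect counts: the categories are sorted by contribution and a cumulative percentage is attached, as is common on Pareto charts.

```python
def pareto_table(contributions):
    # Rank categories from largest to smallest contribution and
    # attach the cumulative percentage of the total.
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda item: item[1],
                    reverse=True)
    table, running = [], 0.0
    for name, value in ranked:
        running += value
        table.append((name, value, 100.0 * running / total))
    return table

# Hypothetical defect counts by category.
defect_counts = {"scratches": 12, "particles": 55,
                 "misalignment": 8, "residue": 25}

for name, count, cum_pct in pareto_table(defect_counts):
    print(f"{name:14s}{count:4d}{cum_pct:7.1f}%")
```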
|
|
partially open question
|
In a survey, a question format that poses
a question, structures the answer somewhat, but also allows the
respondent to reply verbally. A hybrid of
close-ended and
open-ended questions.
|
|
PDC
|
Passive data collection, sometimes called a
prospective observational study. An early
phase of engineering characterization
in which a process is repeated and measured,
but in which interventions--adjustments, modifications, recipe
changes--are avoided. Associated with the Sematech qualification plan.
|
|
Poisson distribution
|
An important theoretical distribution
used to model discrete events, especially the
count of defects in an area. The Poisson distribution depends on
one parameter, lambda, which represents the average defect
density per observation area (or volume, time interval, etc.). The
Poisson distribution assumes that the counts of defects in two
non-overlapping observation units are independent. Further, the
Poisson distribution assumes the distribution of defect counts depends
only on the area in which they are to be observed. Unlike the
binomial distribution, the Poisson
distribution in principle sets no limit to the number of defects that
can be observed in any area. Of particular interest to the
semiconductor industry, the Poisson probability of observing zero
defects in a region of area A, exp{-lambda A}, is
useful for yield modeling.
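The zero-defect probability can be computed directly. In the sketch below, the defect density and area are hypothetical numbers, and the function names are ours.

```python
import math

def poisson_pmf(k, mean_count):
    # Probability of observing exactly k defects when the expected
    # count is mean_count (lambda times area).
    return math.exp(-mean_count) * mean_count ** k / math.factorial(k)

def poisson_yield(defect_density, area):
    # Probability of zero defects in a region: exp(-lambda * A).
    return math.exp(-defect_density * area)

density = 0.5  # hypothetical defects per square centimeter
area = 2.0     # hypothetical die area in square centimeters
print(round(poisson_yield(density, area), 3))
```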
|
|
population
|
The entire set of potential observations (wafers, people, etc.)
about whose properties we would like to learn. As opposed to
sample.
|
|
precision
|
- In metrology, the variability of a
measurement process
around its average value. Precision is usually distinguished
from accuracy, the variability of a
measurement process around the true
value. Precision, in turn, can be decomposed further into
short term variation or
repeatability, and long term
variation, or
reproducibility.
- A loosely defined term for the
general notion that one knows more or has shorter confidence
intervals if one has more data; that is, more data gives
greater precision in answers and decisions.
|
|
prevention
|
The class of process monitors and corrective actions taken before
production material is placed in jeopardy.
|
|
probability plot
|
A plot designed to assess whether an observed distribution has a shape
consistent with a theoretical distribution, especially with the
normal distribution.
The values observed are plotted against the expected order statistics
from the theoretical distribution. When a straight line is apparent,
the observed and theoretical distributions are said to have the same
shape. Probability plots are especially good when the observed
distribution consists of many observations, and useful for comparing
at most only a few groups.
|
|
process
|
A combination of people, procedures, machinery, material,
measurement equipment, and environmental
conditions for specific work activities. A repeatable sequence of
activities with measurable inputs and outputs.
|
|
process signature
|
The characterization of a
process, including its
sensitivity to input variables, its
magnitude of natural variation, its
sensitivity to
variation in incoming material, and its
dynamic and output profiles, both when operating naturally and when
behaving aberrantly.
|
|
process capability study
|
A study that quantifies the common cause
variability of a process. See also
capability study.
|
|
proctor survey
|
A method of administering a survey in
which the respondents are placed into a room that is attended by a
person, a proctor. Proctor surveys preserve the
confidentiality of the respondents'
answers, yet provide sufficient administrative structure so that
one can ensure high response rates.
|
|
prospective study
|
A kind of nonexperimental study in which sample
selection and all investigated phenomena occur after the onset of the
study. See also PDC.
|
|
proxy
|
In a characterization, a variable
that is used to replace another either because, in the case of a
response, it is easier to
measure, or because, in the case of a
factor, it is easier to manipulate.
|
|
P/T ratio
|
In metrology when applied to a manufacturing
situation, the "precision-tolerance" ratio. The
precision element is usually the 3
standard deviation magnitude of
measurement error
(precision,
reproducibility), and the tolerance
element is usually the corresponding half-tolerance:
USL-(USL+LSL)/2, where USL (LSL) denote the upper (lower)
specification limits, respectively. A common goal for a new
metrology or process development project is
to achieve a P/T ratio of 0.1.
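A short sketch of the calculation as defined above, with hypothetical specification limits and measurement standard deviation. Note that USL-(USL+LSL)/2 simplifies to (USL-LSL)/2.

```python
def pt_ratio(sigma_measurement, lsl, usl):
    # Precision element: 3 standard deviations of measurement error.
    # Tolerance element: the half-tolerance, USL - (USL + LSL) / 2.
    half_tolerance = usl - (usl + lsl) / 2.0
    return 3.0 * sigma_measurement / half_tolerance

# Hypothetical gauge and specification limits.
ratio = pt_ratio(sigma_measurement=0.05, lsl=9.0, usl=11.0)
print(round(ratio, 3))
```

In this example the ratio is above the common goal of 0.1, so the measurement system would need improvement before it met that goal.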
|
|
[Q]
[R]
|
randomization, scientific
|
- The assignment of experimental material to treatments and
treatment order through the use of random number tables.
- The selection of observational units through the use of random
number tables. Scientific randomization is to be
distinguished from arbitrary assignments and selection, and
from systematic assignments (e.g. wafers 1-12 receive
treatment A, 13-24 treatment B).
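As an illustration of the first sense, the sketch below assigns 24 wafers to two treatments at random. It substitutes a seeded pseudo-random number generator for the random number tables mentioned above. The function name and seed are hypothetical, and the number of wafers is assumed to divide evenly among the treatments.

```python
import random

def randomize_wafers(wafers, treatments, seed=None):
    # Shuffle the wafers, then deal them out in equal-sized groups.
    rng = random.Random(seed)
    shuffled = list(wafers)
    rng.shuffle(shuffled)
    per_group = len(shuffled) // len(treatments)
    return {t: sorted(shuffled[i * per_group:(i + 1) * per_group])
            for i, t in enumerate(treatments)}

assignment = randomize_wafers(range(1, 25), ["A", "B"], seed=2024)
for treatment, wafers in assignment.items():
    print(treatment, wafers)
```

Recording the seed keeps the assignment reproducible, which is useful when documenting the experiment.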
|
|
range
|
For a given set of observations, the difference between the highest
and lowest values.
|
|
rational subgroups
|
Multiple readings taken to monitor a process,
including the magnitude of short term
variation. Rational subgroups of size 2 to
6 are the most common. Well constituted rational subgroups are the
basis of SPC's most sensitive Shewhart
charts, the X-bar-R (X-bar-S) chart.
|
|
R chart
|
A control chart that plots
ranges. Like S charts,
R charts are typically used to monitor
process uniformity, and measurement precision.
Constant sample sizes for the
rational subgroups are strongly
recommended. There is a special set of
Western Electric rules for
R charts when the rational subgroup
size is two. When the rational
subgroup size is greater than 9, S charts
are preferred to R charts for reasons of efficiency.
|
|
reference group
|
A group of observations, or a group that could be observed, that
serves as a point of comparison in a study. A reference group has a
function similar to that of a control
group, but, unlike a control group, does not carry the
connotation that it was constructed deliberately by
randomization.
|
|
repeatability
|
In metrology, the component of
measurement precision that is the
variability in the short term, and that occurs under highly controlled
situations (e.g. same metrology instrument, same operator, same setup,
same ambient environment, etc.).
|
|
reproducibility
|
In metrology, the total measurement precision, especially including the components
of variability that occur in the long term, and occurring from one
measurement instrument to another, one laboratory to another, etc.
|
|
residual
|
The difference between the actual value observed and the prediction
or fitted value derived from a model. Residuals
give information both about the model's
lack of fit, and also about experimental
error of the measurement
process.
|
|
resolution
|
- In experimental design, especially for
two-level designs, the length
of the word of the shortest confounding relationship.
Geometrically, design resolution corresponds to 1 plus
the strength.
- In metrology, the number of
significant digits of a measurement system
that can be meaningfully interpreted.
|
|
respondent error
|
In surveys, a component of
measurement error that results from
the respondent deliberately or inadvertently answering incorrectly.
|
|
response
|
The measured output of a process or
experiment. Responses usually depend on the
choice of metrology tool. In planning experiments,
several responses are usually of interest, and their selection is
tied closely to the overall purpose of the study.
|
|
response surface model (RSM)
|
A polynomial model of several
factors, especially one including terms for
linear, quadratic, and second-order crossproducts.
|
|
retrospective study
|
A kind of nonexperimental study in which all the phenomena
investigated occur prior to the onset of the study. Further, the
samples of retrospective studies are usually
chosen by the value the responses take.
This latter point creates special conceptual issues regarding
causality, and the composition of comparison
samples (see matches)
is especially important. An advantage of retrospective
samples is that they allow one to investigate
phenomena that are either unlikely or undesirable to occur in the
future; further, since all key events occur in the past,
retrospective studies can often be undertaken economically.
|
|
robust methods
|
Methods of data analysis that are robust are not strongly affected by
extreme changes to small portions of the data; their answers do not
change very much from the presence of
outliers. A classic example of a robust
method is the median.
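A small made-up example shows the contrast: replacing one reading with an extreme value moves the average a great deal but leaves the median unchanged.

```python
import statistics

clean = [10, 11, 9, 10, 10]
contaminated = [10, 11, 9, 10, 100]  # last reading replaced by an outlier

mean_shift = statistics.mean(contaminated) - statistics.mean(clean)
median_shift = statistics.median(contaminated) - statistics.median(clean)
print(mean_shift, median_shift)
```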
|
|
rotatable
|
The property of an experimental design
that minimizes the correlation
among the terms of a full quadratic model,
(including interactions), thereby allowing
one to select some terms without regard to
the significance of other terms. A generalization of
orthogonality to
response surface designs.
|
|
R-squared
|
A statistic that summarizes a predictive
model's fit, the complement of its lack of fit,
using the data from which the model was derived.
- R-squared is calculated as 1 minus the following ratio:
SUM[ squared residuals from model ]/ SUM[ squared deviations from mean ]
A perfectly fitting model yields an
R-squared of 1.
- The latter definition is flawed by giving more credit to
complicated models than is appropriate.
To achieve an average value of zero when the
model has no merit, R-squared-adjusted is often proposed.
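The calculation above can be sketched as follows. The adjusted form shown is the commonly used one, 1 - (1 - R-squared)(n - 1)/(n - p - 1), where p counts model terms other than the intercept. The glossary itself does not give a formula, so treat that choice as an assumption. The data are invented.

```python
def r_squared(observed, fitted):
    # 1 minus SUM[squared residuals] / SUM[squared deviations from mean].
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_terms):
    # n_terms excludes the intercept (assumed convention).
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_terms - 1)

observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]  # hypothetical model predictions

r2 = r_squared(observed, fitted)
r2_adj = adjusted_r_squared(r2, n_obs=len(observed), n_terms=1)
print(round(r2, 3), round(r2_adj, 3))
```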
|
|
[S]
|
sample
|
- The set of observational units (wafers, people, etc.) whose
properties our study is to observe. When we select a sample
by scientific randomization, we
are more easily able to generalize our conclusions to the
population of interest. As opposed
to population.
- For a given characteristic, the collection of measurements
that are actually observed.
|
|
sample size
|
The number of observations in, or planned to be in, a study or
other investigation. Key considerations in selecting a particular
sample size are
- value associated with any particular
level of
precision,
- the costs of obtaining observations, and
- available resources.
Some generic advice on sample sizes is
- 16, to estimate the center of a
distribution by its average,
- 20, to estimate the correlation
between two measurements,
- 32 per group, to estimate average difference between two
groups,
- 50, to estimate the standard deviation
of a distribution.
|
|
sample frame
|
In sampling theory, the set of all units from which a
sample is drawn. The sample frame is
synonymous with the statistical population,
but has a more technical and precise connotation relating to a
particular enumeration of the population
elements.
|
|
sampling distribution
|
The distribution of a summary quantity or
statistic.
|
|
sampling error
|
In surveys, the error that results when
the selection of respondents (the sample) is
biased in a way so that the
population about which one wishes to make
conclusions is not accurately represented.
|
|
scatter plot
|
A graph of a pair of variables that plots the first variable
along the x-axis and the second variable along the y-axis. In a
scatterplot, the points of successive pairs are not connected.
|
|
scatterplot matrix
|
A graph of several variables that plots all pairs of variables
in a corresponding scatterplot. In turn,
these scatterplots are
arranged in the form of an upper triangular matrix. In any row of
this matrix, the y axes of all plots are always the same variable;
in any column, the x axes are also the same variable.
|
|
S chart
|
A control chart that plots
standard deviations. Like R
charts, S charts are typically used to
monitor process uniformity, and
measurement precision. Constant
sample sizes for the
rational subgroups are strongly
recommended. There is a special set of
Western Electric rules for
S charts when the rational subgroup
size is two. S charts are preferred to R
charts for reasons of efficiency regardless
of rational subgroup size, but this
becomes especially important for sizes greater than 9.
|
|
sensitive methods
|
Methods of data analysis that are able to detect the presence of
phenomena in the presence of noise, or in spite of small
samples. Sensitive methods make the most
efficient use of available data, and are especially useful when
analyzing small datasets, such as from
experiments. A classic example of a sensitive
method is the average. When the underlying data comes from a single
normal or Gaussian
distribution, the average is the most
sensitive method for estimating the
distribution's center.
|
|
sensitivity
|
In metrology, the rate at which the
average measurement changes in response to changes in the true value. Often
reported in units of percentage change to unit percentage change.
The term is also used in the interpretation of
response surface models.
|
|
sensitivity study
|
An investigation of a process that identifies
how strongly the input parameters affect one or more desired output
characteristics.
|
|
sequential studies
|
A style of investigation, especially in
experiments, whereby a study is broken into a
series of distinct phases, and the results of each phase are allowed
to influence subsequent phases.
|
|
special cause
|
A source of variation that is large,
intermittent or unpredictable, affecting only some of the individual
values of the process output being studied.
Also called an assignable cause.
|
|
specification limits
|
The numerical values defining the interval of acceptability
for a particular characteristic.
|
|
split
|
A group of experimental units that is processed in identical
fashion. For example, a 2x2 factorial
experiment would have four splits. When applied to a lot of 24
wafers, 6 wafers would be assigned to each "split."
|
|
stability
|
The degree to which observations of a process
can be represented by a single random "white noise"
distribution, in which the prediction
of the next value is not improved by knowing the
process history.
|
|
stable process
|
A process that is in a state of
statistical control.
|
|
standard deviation
|
A measure of spread or dispersion of a
distribution. It estimates
the square root of the average squared deviation from the
distribution average, sometimes called
the root-mean-square. Among all measures of dispersion, the standard
deviation is the most efficient for normally
distributed data. Also, unlike the range, it
converges to a single value as more data from the
distribution is gathered.
|
|
standard error
|
The standard deviation for a
statistic's
sampling distribution. Because
many statistics have sampling distributions
that are approximately normal, plus and minus 2
standard errors is usually an approximate 95 percent
confidence interval.
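For the average, the standard error is the sample standard deviation divided by the square root of the sample size. The sketch below applies the plus-and-minus-2 rule to invented measurements; the function name is ours.

```python
import math
import statistics

def mean_with_interval(values):
    # Standard error of the average: s / sqrt(n).
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    # Approximate 95 percent interval: plus and minus 2 standard errors.
    return mean, se, (mean - 2.0 * se, mean + 2.0 * se)

readings = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7]  # hypothetical
mean, se, interval = mean_with_interval(readings)
print(round(mean, 3), round(se, 3), [round(x, 3) for x in interval])
```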
|
|
statistic
|
A value calculated from sample data.
|
|
statistical control
|
The state of a process that is influenced
by common causes alone. See
in control.
|
|
statistical design of experiments (SDE)
|
Also called design of experiments (DoE, DoX).
- The theory of experimental design emphasizing factorial and
fractional factorial designs, response
surface modeling, and analysis of variance methods.
- A particular experiment based on this theory.
- The scientific principles, experimental design strategies,
and model building and evaluation
techniques that lead to the efficient and thorough
characterization and/or
optimization of products and processes.
|
|
statistical process control (SPC)
|
The conversion of data to information using statistical
techniques to document, correct, and improve
process performance.
|
|
SPC tools
|
A didactic list from Kaoru Ishikawa of the following easy-to-use
tools:
- Pareto charts and
bar charts,
- cause-effect and process flow
diagrams,
- stratification,
- checklist,
stem-and-leaf plots, and
digidot charts
- histograms and
dot plot,
- scatterplots, and
scatterplot matrices,
- trend charts and
control charts.
Ishikawa has introduced slight variations to this list over time.
|
|
statistical quality control (SQC)
|
| |