Norms, Correlation, and Inference

Chapter 4


•“In a psychometric context, norms are the test performance data of a particular group of testtakers that are designed for use as a reference for evaluating or interpreting individual test scores” (Cohen & Swerdlik, 2002, p. 100).

–Defined broadly or narrowly.

–The group is typical on the basis of a particular characteristic.

–A distribution of scores is yielded to which other scores can be compared.


•Data can be in the form of raw scores or converted scores.

–A raw score is the unmodified accounting of performance that is usually numerical.

–Scores can be converted in order to facilitate their comparison to specific norms.
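As an illustration of a converted score, the sketch below turns raw scores into z scores (distance from the group mean in standard deviation units). The z score is only one possible conversion, chosen here as an assumption; the function name and the raw scores are hypothetical.

```python
from statistics import mean, pstdev

def to_z_scores(raw_scores):
    """Convert raw scores to z scores using the group's own mean and SD."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # SD of the reference group itself
    return [(x - m) / sd for x in raw_scores]

# Hypothetical raw scores for a small reference group (mean 5, SD 2)
raw = [2, 4, 4, 4, 5, 5, 7, 9]
z = to_z_scores(raw)
print(z)
```

A raw score of 9 becomes a z score of 2, meaning it lies two standard deviations above the group mean.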


•The process of deriving norms is referred to as norming.

–Norms can be derived on the basis of race, which is a controversial process referred to as race norming.

–Cut scores, which are reference points used to divide data into two or more classifications, might also be set.

•The normative sample refers to the group of people whose performance data are used as a reference for evaluating individual test scores.

–The term standardization sample is sometimes used interchangeably with normative sample, but this usage is not always appropriate.

Standardization and Norming

•When tests must be administered and scored in a clearly specified manner, they are considered to be standardized.

•Before norms can be determined, the test developer must specify for whom the test is designed.

–The developer is interested in the population of potential testtakers, that is, the complete set of individuals with at least one common, observable characteristic.

–The developer will typically develop norms based on a sample, a portion of the population that is deemed to be representative of it.

•Sampling procedures may differ.

–Stratified sampling takes into consideration the proportions of the subgroups that make up the population.

–In stratified random sampling, the sampling is also random: every member of the population has the same chance of being selected.

–Purposive sampling involves the arbitrary selection of a sample because it is believed to be representative of the population.

–Incidental sampling or convenience sampling refers to the selection of a sample due to its availability.
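The sketch below shows one way stratified random sampling could be carried out: members are grouped by subgroup, and a random draw is made within each subgroup in proportion to its share of the population. The function name, the subgroup labels, and the population are all hypothetical.

```python
import random

def stratified_random_sample(population, stratum_of, n, seed=0):
    """Draw about n members so each stratum keeps its population proportion."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        # Number to draw from this stratum, proportional to its size
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 urban, 30 suburban, 10 rural testtakers
population = ([("urban", i) for i in range(60)]
              + [("suburban", i) for i in range(30)]
              + [("rural", i) for i in range(10)])

sample = stratified_random_sample(population, lambda p: p[0], n=20)
counts = {s: sum(1 for p in sample if p[0] == s)
          for s in ("urban", "suburban", "rural")}
print(counts)
```

With a sample of 20, the 60/30/10 split in the population is preserved as 12, 6, and 2 testtakers. Which individuals are drawn within each subgroup depends on the random seed.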

Types of Norms


•Percentile Norms

–Based on a distribution divided into 100 equal parts.

–A percentile is the score at or below which a specific percentage of scores falls.

–Real differences between raw scores may be distorted: minimized near the ends of the distribution and exaggerated in the middle.

–These problems are exacerbated in skewed distributions.
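A minimal sketch of the percentile idea, using a hypothetical normative sample of ten scores that cluster in the middle. The function name and the data are assumptions made for illustration.

```python
def percentile_rank(score, norm_scores):
    """Percentage of scores in the normative sample at or below `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)

# Hypothetical normative sample: most scores cluster around 50
norm_scores = [40, 48, 49, 50, 50, 51, 52, 53, 60, 75]

for score in (49, 51, 60, 75):
    print(score, percentile_rank(score, norm_scores))
```

In this sample a 2-point raw difference in the middle (49 versus 51) moves the percentile rank from 30 to 60, while a 15-point raw difference near the top (60 versus 75) moves it only from 90 to 100.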

•Age Norms (age-equivalent scores)

–“indicate the average performance of different samples of testtakers who were at various ages at the time the test was administered” (Cohen & Swerdlik, 2002, p. 105).

–Problems with the comparison of “mental age”.



•Grade Norms

–Used to indicate the average test performance of testtakers in a specific grade.

–Based on a ten-month scale that refers to grade and month (e.g., 7.3 is equivalent to seventh grade, third month).

–Only useful when considering the years and months of school completed.

•Consider students who move frequently or who are home schooled.

•Consider adults and children who are not yet in school.

•National Norms

–Derived from a standardization sample nationally representative of the population of interest.

•National Anchor Norms

–Used to consider two tests that were normed by using the same sample (i.e., each member of the sample took both tests).

•Equipercentile method

•The equivalencies are not precise equalities.
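The equipercentile idea can be sketched as follows: a score on one test is matched to the score on the other test that holds the same percentile rank in the shared sample. This is a simplified illustration, not a full equating procedure; the function names and both sets of scores are hypothetical.

```python
def percentile_rank(score, scores):
    """Percentage of scores at or below `score`."""
    return 100 * sum(1 for s in scores if s <= score) / len(scores)

def equipercentile_equivalent(score_a, scores_a, scores_b):
    """Lowest test B score whose percentile rank reaches that of score_a on test A."""
    target = percentile_rank(score_a, scores_a)
    for candidate in sorted(set(scores_b)):
        if percentile_rank(candidate, scores_b) >= target:
            return candidate
    return max(scores_b)

# Hypothetical anchor sample: the same eight people took both tests
test_a = [10, 12, 14, 15, 17, 18, 20, 22]
test_b = [31, 35, 38, 40, 44, 47, 50, 55]

print(equipercentile_equivalent(15, test_a, test_b))
```

Here a score of 15 on test A sits at the 50th percentile, as does a score of 40 on test B, so the two are treated as equivalent. The match depends on the sample and on how ties and gaps are handled, which is one reason the equivalencies are not precise equalities.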

•Subgroup Norms

–Are created when narrowly defined groups are sampled.

•Socioeconomic status


•Education level


•Local Norms

–Are derived from the local population’s performance on a measure.

•Typically created locally (e.g., by a guidance counselor or personnel director)

Fixed Reference Group Scoring Systems

•Calculation of test scores is based on a fixed reference group that was tested in the past.


•Fixed reference group tested in 1990 (over a million testtakers) was put into use in 1995.

•Raw scores are converted through the use of anchoring.


Norm Referenced v. Criterion Referenced

•Norm-referenced tests consider the individual’s score relative to the scores of testtakers in the normative sample.

•Criterion-referenced tests consider the individual’s score relative to a specified standard or criterion (cut score).

–Licensure exams

–Proficiency tests
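The contrast between the two approaches can be sketched by interpreting the same raw score in both ways. The function names, the normative sample, and the cut score of 70 are hypothetical, and percentile rank is used here as just one possible norm-referenced interpretation.

```python
def norm_referenced(score, norm_scores):
    """Interpret a score by its standing in the normative sample (percentile rank)."""
    return 100 * sum(1 for s in norm_scores if s <= score) / len(norm_scores)

def criterion_referenced(score, cut_score):
    """Interpret a score against a fixed standard, ignoring other testtakers."""
    return "pass" if score >= cut_score else "fail"

# Hypothetical normative sample and cut score
norm_scores = [55, 60, 62, 65, 68, 70, 72, 75, 80, 85]
score = 68

print(norm_referenced(score, norm_scores))
print(criterion_referenced(score, cut_score=70))
```

The same score of 68 is average relative to this normative sample (50th percentile) yet falls short of the cut score, so the two interpretations answer different questions.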


Correlation and Regression

•To determine the strength of the relationship or association between two variables, the coefficient of correlation is utilized as an index.

–Ranges in value from -1 to +1

–Magnitude of the correlation is determined by the absolute value of the number, with a perfect correlation equal to 1.

–The direction of the correlation is determined by the sign: with a positive correlation the two variables move in the same direction; with a negative correlation they move in opposite directions.

–Relationship must be linear.

•The correlation is not an index of causation.


•Pearson r

–Coefficient of determination = r²

–Error (percentage of variance left unexplained) = 100(1 − r²)

•The Spearman Rho

–Rank-order correlation coefficient

–Used with ordinal data
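The sketch below computes the Pearson r, the coefficient of determination, the percentage of unexplained variance, and the Spearman rho for a small set of paired scores. Spearman rho is computed here as the Pearson r of the ranks, with tied values sharing the average rank. The function names and the data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (lowest); tied values share the average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman_rho(x, y):
    """Rank-order correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical paired scores for five testtakers
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2               # coefficient of determination
error = 100 * (1 - r_squared)    # percentage of variance left unexplained
rho = spearman_rho(x, y)
print(round(r, 2), round(r_squared, 2), round(error, 1), round(rho, 2))
```

For these data r is about .77, so r² is .60 and 40% of the variance is left unexplained; rho is about .74.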



•Scatterplot

–Utilized to evaluate linearity and outliers.

–Outliers are located in the extremes of the scatterplot and are atypical.

–Restriction of Range

•Ceiling effect

•Floor effect
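The effect of a restricted range on the correlation can be illustrated with a small, made-up data set: the correlation is computed once on all testtakers and once on only those scoring 7 or higher on X. The data, the cutoff, and the function name are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired scores for ten testtakers
x = list(range(1, 11))
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

r_full = pearson_r(x, y)

# Keep only testtakers who scored 7 or higher on X (a restricted range)
kept = [(a, b) for a, b in zip(x, y) if a >= 7]
r_restricted = pearson_r([a for a, _ in kept], [b for _, b in kept])

print(round(r_full, 2), round(r_restricted, 2))
```

In this example the correlation drops from about .94 in the full sample to .60 in the restricted one, even though the underlying relationship is the same.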


•Simple Regression

–Statistically equivalent to a bivariate correlation; however, one variable is designated the independent variable (predictor) and the other the dependent variable (outcome).

–Results in an equation for the line of best fit or regression line.

•Y′ = bX + a (where Y′ = predicted Y, b = slope, X = measured score on X, and a = Y intercept, the point where the line crosses the Y axis)
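A minimal sketch of how the slope and intercept of the regression line can be obtained by least squares, assuming the usual formulas b = Σ(X − Mx)(Y − My) / Σ(X − Mx)² and a = My − b·Mx. The function name and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope b and intercept a for predicting Y from X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return b, a

# Hypothetical paired scores for five testtakers
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b, a = fit_line(x, y)
predicted = b * 6 + a  # predicted Y for a new testtaker who scores 6 on X
print(round(b, 2), round(a, 2), round(predicted, 2))
```

For these data the line of best fit has a slope of 0.6 and an intercept of 2.2, so a score of 6 on X yields a predicted Y of 5.8.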



•Standard error of the estimate

–The error in the prediction of Y from X.
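One common way to compute the standard error of the estimate is the square root of the summed squared prediction errors divided by n − 2. The sketch below follows that convention, which is an assumption rather than something stated in these notes; the function names and the data are hypothetical.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares slope b and intercept a for predicting Y from X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return b, my - b * mx

def standard_error_of_estimate(x, y):
    """Typical size of the error made when predicting Y from X."""
    b, a = fit_line(x, y)
    ss_residual = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    return sqrt(ss_residual / (len(x) - 2))  # n - 2 convention (assumed)

# Hypothetical paired scores for five testtakers
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

see = standard_error_of_estimate(x, y)
print(round(see, 3))
```

For these data the standard error of the estimate is about 0.894, so predictions of Y from X are typically off by a little under one point.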

•Multiple regression

–Is utilized when more than one variable is used to predict Y.

–The correlation among predictor scores is taken into consideration and each is given a weight (referred to as b values) in the equation.
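For the two-predictor case, the weights can be sketched with closed-form least-squares formulas in which the cross-product between the predictors (s12 below) adjusts each weight for the overlap between them. The function name and the data are hypothetical; the Y values were constructed to equal 1 + 2·X1 + 3·X2 exactly so the result is easy to check.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares weights b1, b2 and intercept a for Y' = b1*X1 + b2*X2 + a."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products of deviations from the means
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))  # predictor overlap
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return b1, b2, a

# Hypothetical data built so that Y = 1 + 2*X1 + 3*X2 exactly
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [9, 8, 19, 18, 26]

b1, b2, a = fit_two_predictors(x1, x2, y)
print(b1, b2, a)
```

The fit recovers weights of 2 and 3 and an intercept of 1. With more than two predictors the same idea is usually handled with matrix methods in a statistics package.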


Inference From Measurement


•Meta-analysis

–Involves the combination of statistical information across studies to produce a single estimate of the statistic under study.
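One simple way to combine a statistic across studies is to average it with weights proportional to each study's sample size. The sketch below does this for correlation coefficients; it is only one of several possible weighting schemes, and the function name and study values are hypothetical.

```python
def weighted_mean_r(correlations, sample_sizes):
    """Sample-size-weighted average of correlations from several studies."""
    total_n = sum(sample_sizes)
    return sum(r * n for r, n in zip(correlations, sample_sizes)) / total_n

# Hypothetical results from three studies of the same relationship
studies_r = [0.30, 0.50, 0.40]
studies_n = [100, 50, 50]

combined = weighted_mean_r(studies_r, studies_n)
print(round(combined, 3))
```

The combined estimate of .375 sits closer to the largest study's value of .30 than a simple average (.40) would, because that study contributes half of the total sample.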

•Culture and Inference

–Statistics must be utilized in an informed manner and context is always important.