
A brief introduction to classical test theory

Written by Gabrielle Blackman


Science series materials are brought to you by TestGorilla’s team of assessment experts: a group of IO psychologists, data scientists, psychometricians, and IP development specialists with a deep understanding of the science behind skills-based hiring.

If you've ever sat for an exam, whether at school, during a job application, or for a professional certification, you've encountered the principles of test design firsthand. Although taking tests might be a routine experience, the underlying framework that ensures they work as intended remains a mystery to many. In this blog post, we’ll explore the foundations of classical test theory and explain how this framework continues to inform test development processes today.

What is classical test theory?
At TestGorilla, we apply classical test theory in innovative ways.
Classical test theory does have its limits.
Beyond classical test theory: TestGorilla's multi-faceted approach to test development

What is classical test theory?

Classical test theory refers to a scientific framework that explains what test scores comprise and how to increase their accuracy.

In employment testing, classical test theory helps us understand how to create, use, and interpret measures of people’s psychological traits, skills, and other characteristics. People who develop and improve tests, such as psychometricians, often apply classical test theory principles, along with related processes and best practices, to create and use tests effectively. Classical test theory underlies several key concepts, such as test validity, test reliability, and error, which we cover in other articles.

Classical test theory may sound complicated, but its framework is fairly straightforward when we break it down into components. One of its underlying principles is the notion that each test score reflects both a person’s true standing on what the test measures and error.

The underlying principle of classical test theory

True score: The person’s true standing on what is being measured is what we hope to capture when administering tests. For example, suppose Sal is an expert at using Microsoft Excel. Sal’s true score would reflect a high level of understanding on an Excel test. If a test had a range of scores from 0 to 100%, we might expect Sal’s true score to be in the upper range of those scores, such as 95%. However, when assessing skills, the true score is usually treated as a theoretical score, since we cannot verify a person’s true score with absolute certainty.
Observed score: The observed score refers to what the test-taker actually receives on the test. For example, if we give Sal a Microsoft Excel skills test, perhaps the observed test score is 85% (lower than the true score).
Error: Error is the part of the observed score that does not reflect the person’s true score, and it can come from numerous sources. Returning to our example, maybe Sal had trouble sleeping the night before the test, leading to mistakes in answering the questions. Even though Sal typically would have answered 95% of the questions correctly, the lack of sleep causes mistakes on an additional 10% of the questions. The observed or final test score is now 85% instead of 95% due to the error. As error increases, the share of the score that reflects the person’s actual standing decreases.
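
To make the relationship between these three terms concrete, here is a minimal Python sketch of the idea that an observed score is the true score plus error. It is an illustration only, under stated assumptions: the function names, the normally distributed error, the error spread of 5 points, and the idea of Sal sitting the same test 10,000 times are all hypothetical and are not drawn from TestGorilla's processes. For simplicity, it also ignores the fact that real scores are capped at 0 and 100%.

```python
import random
import statistics


def error_component(observed, true):
    """Error is whatever part of the observed score is not the true score."""
    return observed - true


def simulate_sittings(true_score, error_sd, n_sittings, seed):
    """Simulate repeated sittings where observed = true + random error.

    Assumption: error is drawn from a normal distribution centered on zero.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_sittings)]


# Sal's example from the text: true score 95, observed score 85.
sal_error = error_component(85.0, 95.0)  # -10.0

# Hypothetical: Sal sits the same test many times with random error each time.
sittings = simulate_sittings(true_score=95.0, error_sd=5.0,
                             n_sittings=10_000, seed=42)
average_observed = statistics.mean(sittings)

print(f"Sal's error on the sleepless day: {sal_error}")
print(f"Lowest and highest simulated sitting: "
      f"{min(sittings):.1f}, {max(sittings):.1f}")
print(f"Average across sittings: {average_observed:.1f}")
```

In Sal's case, the error is -10 points. In the simulation, individual sittings scatter around 95, but their average lands close to it, which is why the true score is often described as the score a person would receive on average over many repeated sittings.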

In practice, test developers apply best practices to reduce the degree of error entering test scores. Now, there’s not much a test developer can do to ensure that test takers like Sal get a good night's sleep before the big test day. However, there are plenty of other ways they can minimize the chances of error entering observed test scores. Let’s see how TestGorilla is innovatively reducing error by applying classical test theory.

At TestGorilla, we apply classical test theory in innovative ways.

By employing the following and other best practices, TestGorilla significantly reduces the error in our test scores, leading to more accurate assessments of candidates’ skills.

Standardization using technology. One technique that minimizes error is standardizing the test administration process. This means giving all test-takers the same instructions, the same amount of time, and the same testing conditions, which helps to eliminate environmental variables that could lead to error. Our online test platform enables us to provide a standardized experience for all test-takers, reducing errors that may arise from different administration approaches.
Increasing validity. Classical test theory is embedded in our test development processes. We start by working with a global network of subject-matter experts (SMEs) to establish our test structures and questions. Our internal Assessment Development team members apply test development best practices to refine our tests further before a separate third-party SME critiques them. This iterative process leads to higher-quality, more relevant, and more valid tests that increase the likelihood that candidates’ observed scores reflect their true scores.
Updating. Continually reviewing and updating test materials is essential. Over time, content can become outdated, making certain test items less relevant. Regularly updating a test ensures that it continues to accurately measure the skills or knowledge it is intended to measure. At TestGorilla, we have a comprehensive test review process to ensure our tests are up to date.
Reducing bias, a major source of error. Our psychometricians use advanced techniques, such as monitoring group differences, to assess whether our tests work equally well across candidates with different personal characteristics. By analyzing item response patterns across diverse groups, our psychometricians can detect and correct potential biases. These processes help us to reduce error and increase the likelihood that candidates’ observed scores reflect their true scores.
Preparing test scorers. Providing clear and thorough training for test scorers can also reduce scoring errors. Ensuring that those who score the test understand how to do so consistently and accurately is essential to maintaining the test's reliability. At TestGorilla, we provide our clients with guidelines for how to analyze test results and interpret scores.
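
As a rough illustration of what monitoring group differences can look like, the Python sketch below computes a standardized mean difference (often called Cohen's d) between the scores of two groups. This is a generic screening statistic chosen for illustration, not a description of TestGorilla's actual analyses; the function name, the group labels, and the scores are all hypothetical. A non-zero difference is only a prompt for closer review, not proof that a test is biased.

```python
import math
import statistics


def standardized_mean_difference(group_a, group_b):
    """Difference between group means, expressed in pooled standard deviations."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical test scores (in %) for two groups of candidates.
group_a_scores = [70, 75, 80, 85, 90]
group_b_scores = [68, 73, 78, 83, 88]

difference = standardized_mean_difference(group_a_scores, group_b_scores)
print(f"Standardized mean difference: {difference:.2f}")
```

Expressing the gap in standard deviation units, rather than raw points, makes it easier to compare differences across tests that use different score ranges.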

Classical test theory does have its limits.

While classical test theory underlies much of modern testing, it does have its limitations. It rests on some assumptions that are unlikely to hold in practice. For example, it assumes that sources of error affect all test-takers similarly, which likely is not the case.

Think of taking a test like trying to hit a bullseye on a dartboard. In this analogy, classical test theory assumes that every time you miss the bullseye, you miss it by the same amount, no matter who you are or how good you are at darts. But in real life, sometimes you might miss by a lot, and other times you might almost hit the center. One day, you might be close to the bullseye because you're feeling great, but another day, you might miss by a lot because you're tired or distracted. So, just like with darts, how much you 'miss' on a test (the 'error') can be different each time and for each person. Similarly, external sources of measurement error likely affect test-takers and their scores differently.

Since classical test theory has limitations due to the assumptions it rests upon, psychometricians often consult additional theoretical frameworks. For example, at TestGorilla, our psychometricians also draw from other test theories, such as item response theory (which we’ll cover in a later article!).
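
One way to picture this limitation is the Python sketch below, which simulates two hypothetical test-takers with the same true score but different amounts of error, then compares each person's spread of observed scores with a single pooled figure applied to both. All names and numbers are assumptions made for illustration, as is the choice of normally distributed error; none of it reflects a specific TestGorilla procedure.

```python
import math
import random
import statistics


def observed_scores(true_score, error_sd, n_sittings, rng):
    """Repeated sittings for one person: observed = true + random error."""
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_sittings)]


rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Two hypothetical people with the same true score of 70 ...
# ... but one is steady from day to day and the other is not.
steady = observed_scores(true_score=70.0, error_sd=2.0, n_sittings=5000, rng=rng)
variable = observed_scores(true_score=70.0, error_sd=8.0, n_sittings=5000, rng=rng)

steady_spread = statistics.stdev(steady)
variable_spread = statistics.stdev(variable)

# A single "one size fits all" error figure pooled across both people.
pooled_spread = math.sqrt((steady_spread ** 2 + variable_spread ** 2) / 2)

print(f"Steady test-taker spread:   {steady_spread:.1f}")
print(f"Variable test-taker spread: {variable_spread:.1f}")
print(f"Single pooled spread:       {pooled_spread:.1f}")
```

The pooled figure sits between the two individual spreads, so it overstates the error for the steady test-taker and understates it for the variable one. That gap is the kind of mismatch that additional frameworks aim to address.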

Beyond classical test theory: TestGorilla's multi-faceted approach to test development

TestGorilla harnesses classical test theory to enhance the validity of its tests, reduce biases, and ensure fairness across diverse groups of candidates. By standardizing the administration process, rigorously analyzing test items, and continuously updating test content, the team strives to minimize errors that could skew observed scores.

While classical test theory provides a solid foundation, it’s not without its limitations: error impacts test-takers differently, and external factors can influence test outcomes. Recognizing this, TestGorilla also incorporates insights from other theoretical frameworks, like item response theory, to bolster its approach.

