Where Standardized Tests Fail: Subjectivity and Imprecision Tend to Prevail
Exams are better at gauging the big picture than at evaluating individual students.
March 18, 2008
Credit: Gregory Cherin
Today's standardized assessments can be useful for spotting big trends or gauging the effectiveness of state programs overall. However, when used in high-stakes accountability, as the sole indicator of an individual student's achievement or the quality of a single school or school district, these tests can be imprecise. Creating and scoring such tests is complex. Here are some of the steps in the testing process where subjectivity prevails and inaccuracies arise:
- Content selection: If the state sets too many standards, teachers won't be able to cover them all and will have to guess which are on the test. If test makers include too few questions on any given skill, the results may not truly show how well a student can perform it.
- Ambiguous questions: Particularly for multiple-choice questions, a child may be able to make a plausible, even creative, argument for choosing one of the "incorrect" answers, but the format doesn't allow the child to explain.
- Setting the difficulty level: This determination, typically based on educators' and officials' opinions, is naturally subjective. To select final questions, test makers often try them out on students, which works only insofar as the trial-run group accurately represents the students who will ultimately take the test.
- Year-to-year comparison: To prevent cheating, states typically ask test makers to create new questions every year. Test makers must then perform the tricky business of trying to ensure that the exams are equally difficult so that scores can be compared like apples to apples.
- Test preparation: The teaching of test-taking strategies may favor some students and keep their scores from reflecting what they actually know.
- Distractions: Whether internal or external, distractions such as test anxiety, personal problems, lack of sleep, a sick classmate, or a broken air conditioner can distort students' scores.
- Mechanical or human error: Mistakes may occur in setting the answer key, feeding answer sheets into scoring machines, marking answers right or wrong, or other steps in the process.
- Cut scores: These cutoff points for passing and advanced scores are based partly on educators' and officials' judgment, so they're subjective. Also, given the natural imprecision of scores explained in this chart, a student's score may fall below the passing cutoff even if she is knowledgeable enough to pass, and vice versa.
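
To make the points about too few questions and cut scores concrete, here is a minimal sketch of how chance alone can push a capable student below a passing cutoff. It rests on assumptions that are not from the article: the function name `prob_below_cut` is hypothetical, the student is imagined to know 75 percent of the material, the passing cutoff is set at 70 percent correct, and each question is treated as an independent coin flip weighted by the student's true skill. Real tests are scored with more elaborate models, so treat this as an illustration only.

```python
from math import comb


def prob_below_cut(true_skill: float, n_questions: int, cut_score: int) -> float:
    """Chance that a student answers fewer than `cut_score` questions correctly.

    Simplifying assumption (not from the article): every question is answered
    correctly with the same probability `true_skill`, independently of the others,
    so the number of correct answers follows a binomial distribution.
    """
    return sum(
        comb(n_questions, k)
        * true_skill ** k
        * (1 - true_skill) ** (n_questions - k)
        for k in range(cut_score)  # k = 0 .. cut_score - 1 correct answers
    )


# Hypothetical student who truly knows 75% of the material.
# Each pair is (number of questions, correct answers needed to pass at 70%).
for n_questions, cut_score in [(10, 7), (40, 28), (160, 112)]:
    risk = prob_below_cut(0.75, n_questions, cut_score)
    print(f"{n_questions:>3} questions: chance of failing = {risk:.3f}")
```

Under these assumptions, the student who deserves to pass still fails a 10-question test roughly 22 percent of the time. The risk shrinks as questions are added but never reaches zero. The same calculation run for a student just under the cutoff shows the "vice versa" case, in which a weaker student passes by luck.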