George Lucas Educational Foundation

Standardized Testing Fails the Exam

If enough educators -- and noneducators -- realize there are serious flaws in how we evaluate our schools, maybe we can stop this absurdity.
By W. James Popham
Credit: Veer/James Godman

For the last four decades, students' scores on standardized tests have increasingly been regarded as the most meaningful evidence for evaluating U.S. schools. Most Americans, indeed, believe that students' standardized test performances are the only legitimate indicator of a school's instructional effectiveness.

Yet, although test-based evaluations of schools seem to occur almost as often as fire drills, in most instances these evaluations are inaccurate. That's because the standardized tests employed are flat-out wrong.

Standardized tests have been used to evaluate America's schools since 1965, when the U.S. Elementary and Secondary Education Act became law. That statute provided for the first major infusion of federal funds into local schools and required educators to produce test-based evidence that ESEA dollars were well spent.

But how, you might ask, could a practice that's been so prevalent for so long be mistaken? Just think back to the many years we forced airline attendants and nonsmoking passengers to suck in secondhand toxins because smoking on airliners was prohibited only during takeoff and landing.

Some screw-ups can linger for a long time. But mistakes, even ones we've lived with for decades, can often be corrected once they've been identified, and that's what we must do to halt today's wrongheaded school evaluations. If enough educators -- and noneducators -- realize that there are serious flaws in the way we evaluate our schools, and that those flaws erode educational quality, there's a chance we can stop this absurdity.

Instructionally Insensitive

First, some definitions:

A standardized test is any test that's administered, scored, and interpreted in a standard, predetermined manner. Standardized aptitude tests are designed to make predictions about how a test taker will perform in a subsequent setting. For example, the SAT and the ACT are used to predict the grades that high school students will earn when they get to college. By contrast, standardized achievement tests indicate how well a test taker has acquired knowledge and mastered certain skills.

Although students' scores on standardized aptitude tests are sometimes unwisely stirred into the school-evaluation stew, scores on standardized achievement tests are typically the ones used to judge a school's success. Two kinds of standardized achievement tests commonly used for school evaluations are ill suited for that measurement.

The first category consists of nationally standardized achievement tests like the Iowa Tests of Basic Skills, which employ a comparative measurement strategy. The fundamental purpose of all such tests is to compare a student's score with the scores earned by a previous group of test takers (known as the norm group). It can then be determined if Johnny scored at the 95th percentile on a given test (attaboy!) or at the tenth percentile (son, we have a problem).
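To make the comparison with a norm group concrete, here is a minimal sketch of one common way to compute a percentile rank: the share of norm-group scores that fall strictly below a student's score. The function name, the definition chosen (conventions for handling ties vary), and the 20 norm-group scores are all made up for illustration; none of this is taken from any actual test publisher.

```python
from bisect import bisect_left


def percentile_rank(score, norm_group_scores):
    """Return the percent of norm-group scores strictly below `score`.

    This is one of several conventions for percentile rank; it is an
    illustrative assumption, not a publisher's official formula.
    """
    ordered = sorted(norm_group_scores)
    below = bisect_left(ordered, score)  # count of scores < score
    return 100.0 * below / len(ordered)


# Hypothetical norm group: 20 invented raw scores.
norm_group = [12, 15, 18, 20, 21, 23, 24, 25, 26, 27,
              28, 29, 30, 31, 33, 34, 36, 38, 40, 45]

high_scorer = percentile_rank(44, norm_group)  # 19 of 20 scores are lower
low_scorer = percentile_rank(18, norm_group)   # 2 of 20 scores are lower

print(high_scorer)  # 95.0
print(low_scorer)   # 10.0
```

Note that the result says nothing about what either student was taught; it only locates each score relative to the earlier group of test takers.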

Because of the need for nationally standardized achievement tests to provide fine-grained, percentile-by-percentile comparisons, it is imperative that these tests produce a considerable degree of score spread -- in other words, plenty of differences among test takers' scores. So, producing score spread often preoccupies those who construct standardized achievement tests.

Statistically, a question that creates the most score spread on standardized achievement tests is one that only about half the students answer correctly. Over the years, developers of standardized achievement tests have learned that if they can link students' success on a question to students' socioeconomic status (SES), then about half of the test takers usually answer that item correctly. If an item is answered correctly more often by students at the upper end of the socioeconomic scale than by lower-SES kids, that question will provide plenty of score spread.

After all, SES is a delightfully spread-out variable and one that isn't quickly altered. As a result, in today's nationally standardized achievement tests, there are many SES-linked items.
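The statistical point about half-correct items can be checked with a little arithmetic. For a question scored simply right or wrong, the variance of the item score is p(1 - p), where p is the proportion of test takers answering correctly. The sketch below uses that standard formula as a stand-in for "score spread" (a simplifying assumption; test developers use richer item statistics), and the helper name and candidate values are hypothetical.

```python
def item_variance(p_correct):
    """Variance of a right/wrong (1/0) item answered correctly
    by a proportion `p_correct` of test takers."""
    return p_correct * (1.0 - p_correct)


# Candidate difficulty levels from 10% to 90% correct.
candidates = [i / 10 for i in range(1, 10)]

for p in candidates:
    print(f"{p:.1f} correct -> variance {item_variance(p):.2f}")

# The item contributing the most spread is the one half the group gets right.
most_spread = max(candidates, key=item_variance)
print(most_spread)  # 0.5
```

An item that nearly everyone answers correctly (or incorrectly) contributes almost no variance, which is why such items tend to be of little use to a test built for comparisons.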

Unfortunately, this kind of test tends to measure not what students have been taught in school but what they bring to school. That's the reason there's such a strong relationship between a school's standardized-test scores and the economic and social makeup of that school's student body.

As a consequence, most nationally standardized achievement tests end up being instructionally insensitive. That is, they're unable to detect improved instruction in a school even when it has definitely taken place. Because of this insensitivity, when students' scores on such tests are used to evaluate a school's instructional performance, that evaluation usually misses the mark.

A second kind of instructionally insensitive test is the sort of standardized achievement test that many states have developed for accountability during the past two decades. Such tests have typically been created to better assess students' mastery of the officially approved skills and knowledge. Those skills and knowledge, sometimes referred to as goals or curricular aims, are usually known these days as content standards. Thus, such state-developed standardized assessments -- like the Florida Comprehensive Assessment Test (FCAT) -- are frequently described as standards-based tests.

Because these customized standards-based tests were designed (almost always with the assistance of an external test-development contractor) to be aligned with a state's curricular aspirations, it would seem that they would be ideal for appraising a school's quality. Unfortunately, that's not the way it works out.

When a state's education officials decide to identify the skills and knowledge students should master, the typical procedure for doing so hinges on the recommendations of subject-matter specialists from that state. For example, if authorities in Ohio or New Mexico want to identify their state's official content standards for mathematics, then a group of, say, 30 math teachers, math-curriculum consultants, and university math professors are invited to form a statewide content-standards committee.

Typically, when these committees attempt to identify the skills and knowledge students should master, their recommendation -- not surprisingly -- is that they should master everything. These committees seem bent on identifying skills they fervently wish students would possess. Regrettably, the resultant litanies of committee-chosen content standards tend to resemble curricular wish lists rather than realistic targets.

Whether or not the targets make sense, there tend to be a lot of them, and the effect is counterproductive. A state's standards-based tests are intended to evaluate schools based on students' test performances, but teachers soon become overwhelmed by too many targets. Educators must guess about which of this multitude of content standards will actually be assessed on a given year's test. Moreover, because there are so many content standards to be assessed and only limited testing time, it is impossible to report any meaningful results about which content standards have and haven't been mastered.

After working with standards-based tests aimed at so many targets, teachers understandably may devote less and less attention to those tests. As a consequence, students' performances on this type of instructionally insensitive test often become dependent on the very same SES factors that compromise the utility of nationally standardized achievement tests when used for school evaluation.

Wrong Tests, Wrong Consequences

Bad things happen when schools are evaluated using either of these two types of instructionally insensitive tests. This is particularly true when the importance of a school evaluation is substantial, as it is now. All of the nation's public schools are evaluated annually under the provisions of the federal No Child Left Behind Act.

Not only are the results of the NCLB school-by-school evaluations widely disseminated, there are also penalties for schools that receive NCLB funds yet fail to make sufficient test-based progress. These schools are placed on an improvement track that can soon "improve" them into nonexistence. Educators in America's public schools obviously are under tremendous pressure to improve their students' scores on whatever NCLB tests their state has chosen.

With few exceptions, however, the assessments states have chosen to implement because of NCLB are either nationally standardized achievement tests or state-developed standards-based tests -- both of which are flawed. Here, then, are three adverse classroom consequences seen in states where instructionally insensitive NCLB tests are used:

Curricular Reductionism

In an effort to boost their students' NCLB test scores, many teachers jettison curricular content that -- albeit important -- is not apt to be covered on an upcoming test. As a result, students end up educationally shortchanged.

Excessive Drilling

Because it is essentially impossible to raise students' scores on instructionally insensitive tests, many teachers -- in desperation -- require seemingly endless practice with items similar to those on an approaching accountability test. This dreary drilling often stamps out any genuine joy students might (and should) experience while they learn.

Modeled Dishonesty

Some teachers, frustrated by being asked to raise scores on tests deliberately designed to preclude such score raising, may be tempted to adopt unethical practices during the administration or scoring of accountability tests. Students learn that whenever the stakes are high enough, the teacher thinks it's OK to cheat. This is a lesson that should never be taught.

These three negative consequences of using instructionally insensitive standardized tests as measuring tools, taken together, make it clear that today's widespread method of judging schools does more than lead to invalid evaluations. Beyond that, such tests can dramatically lower the quality of education.

An Antidote

Is it possible to build accountability tests that both supply accurate evidence of school quality and promote instructional improvement? The answer is an emphatic yes. In 2001, prior to the enactment of NCLB, an independent national study group, the Commission on Instructionally Supportive Assessment, identified three attributes an "instructionally supportive" accountability test must possess:

A Modest Number of Supersignificant Curricular Aims

To avoid overwhelming teachers and students with daunting lists of curricular targets, an instructionally supportive accountability test should measure students' mastery of only an intellectually manageable number of curricular aims, more like a half-dozen than the 50 or so a teacher may encounter today. However, because fewer curricular benchmarks are to be measured, they must be truly significant.

Lucid Descriptions of Aims

An instructionally helpful test must be accompanied by clear, concise, and teacher-palatable descriptions of each curricular aim to be assessed. With clear descriptions, teachers can direct their instruction toward promoting students' mastery of skills and knowledge rather than toward getting students to come up with correct answers to particular test items.

Instructionally Useful Reports

Because an accountability test that supports teaching is focused on only a very limited number of challenging curricular aims, a student's mastery of each subject can be meaningfully measured, letting teachers determine how effective their instruction has been. Students and their parents can also benefit from such informative reports.

These three features can produce an instructionally supportive accountability test that will accurately evaluate schools and improve instruction. The challenge before us, clearly, is how to replace today's instructionally insensitive accountability tests with better ones. Fortunately, at least one state, Wyoming, is now creating its own instructionally supportive NCLB tests. More states should do so.

What You Can Do

If you want to be part of the solution to this situation, it's imperative to learn all you can about educational testing. Then learn some more. For all its importance, educational testing really isn't particularly complicated, because its fundamentals consist of commonsense ideas, not numerical obscurities.

You'll not only understand better what's going on in the current mismeasurement of school quality, you'll also be able to explain it to others. And those others, ideally, will be school board members, legislators, and concerned citizens who might, in turn, make a difference. Simply hop on the Internet or head to your local library and hunt down an introductory book or two about educational assessment. (I've written several such books that, though not as engaging as a crackling good spy thriller, really aren't intimidating.)

With a better understanding of why it is so inane -- and destructive -- to evaluate schools using students' scores on the wrong species of standardized tests, you can persuade anyone who'll listen that policy makers need to make better choices. Our 40-year saga of unsound school evaluation needs to end. Now.

W. James Popham, who began his career in education as a high school teacher in Oregon, is professor emeritus at the University of California at Los Angeles's School of Education and Information Studies. He is the author of 25 books and a former president of the American Educational Research Association.

Comments (47)

Steve's picture
Anonymous (not verified)

I agree with Larry (the 40-year veteran teacher) about the analysis of our educational system based on our own evaluation. Any judge can put a blue ribbon on his own entry. I recently had the privilege of visiting several schools in Beijing and Baoding, China. The schools are designed to support the students who work and who are capable of achieving. They do not cater to the students who are only capable of performing on a Friday night. The Chinese run a very harsh system, with the end result being average or below-average students relegated to a life of manual labor. However, the alternative is what we seem to have in the U.S. We are not supporting our best and brightest with gifted programs, and the product is mediocrity. As Larry suggested, maybe it's time to support academics like we have supported athletics.

Stacy Gutner's picture

That's it right there, you said "proper test taking skills"! I want my child to learn, not to learn how to take a test!!!
We are having that exact problem here in Palm Beach County, Florida. This year they implemented a program at a cost of 28 million dollars and they are teaching to the test!! This is so wrong.
Everyone is different and learns differently, and they didn't take that into account. My daughter is an ESE student who is learning challenged and will never pass a standardized test; she just won't. She is very bright, but this will hold her back instead of letting her move forward.

Peter Medveczky's picture

Well, look at real scientific evidence. Read Science 330:335 (October 15, 2010 issue) by MA Pyc and KA Rawson. They present clear evidence that testing is an important part of the learning process. Some of the comments I read here are pure political correctness. Eliminating testing does not serve the obvious goal of education: learning!

Peter Medveczky, Prof Univ. S. Florida

Luis Garcia's picture
Luis Garcia
Strategy Consultant for Information Technology Sector and Government

Excellent view! This might sound odd or off, but could I ask authors to please write shorter articles? Remember how much information is fighting for our attention everywhere, and we have so many people to get up to speed. I say this because I really care! Thank you!

vjgee's picture
Business Education teacher, 8 to 12s

"Arrogance of educators." Ouch. Are (we) "lagging behind other countries," where, what I understand, is that countries just don't bother with young-for-their-age students, if "they don't want to work" then they become former students, with little formal education, and what I mean to say is that some systems might seem better when what you are doing is comparing apples to oranges--a system can call all their "gifted" their only student population, tell me if they can't, without having ANY "methods and approaches that are superior" (unless you count 'discounting people' a superior method).
"Inter-scholastic sports is a detriment to the educational system," I read something that said hiring-staff will notice sports on a CV, as an indication of a thoughtful, focused, goal-oriented, socially-aware person. "Foreign schools do not support ... sports" while, apples and oranges again, Malaysia might not have the financial resources to provide much in the way of sports, Australia and the UK play rugby, cricket, just not grid-iron, while what I mean to say is I don't think sports is a waste in education. The Netherlands does well in soccer "though you will not find such sports activities in the schools," I'll take your word for it, but when do they play it then? they go to school until 6 in the evening, or is it sport from 4 until 6? and not school? Apples and oranges.
"A great change is needed," gawd-luv-ya but does it have to all come from educators so long "ignored"? In my estimation my students love school and I can't see how chucking the whole thing for change will make them love it any more. If I could spend more time with the 24 regulars and less with the 6 special needs students in my room, there might be more people better off--but while "public school is that last meeting place of the community" we'll be fine, just don't push that envelope any further. Meantime a UBC physics teacher suggested that publicly educated students seem to fare better at university simply because they were not tutored and prodded. They have learned that so much was Tuum Est, up to you, during school, that they didn't look around for help, "just open the door and I'll get it myself."
The profession is amazing. It's difficult on the front lines, but it's not something that needs fixing while what is on offer is armchair advice. We'll do the job with support teachers, ESL specialists, salt-of-the-earth coaches, committed parents and administration, and kids that are keen and good or even otherwise. Just make sure we have got what we need and get out of the way.

James Mulhern's picture

Many realize that the testing frenzy is out of control. Student education and morale are sadly diminished. Even more disconcerting is the poor quality of some of the Training Tests for these "high stakes" assessments. Check out my post on to read an analysis of the FSA for ELA Training Tests. This testing vehicle is riddled with egregious mistakes. A sad commentary on the state of education when the vehicles for improving education have such poor editorial quality.

MarySmith's picture

One of the problems with standardized testing may be just a simple lack of experience. Students are often confused when they are faced with closely related or debatable answer choices. I think that the educational system should pay attention to preparation for tests. One way to do this is through quizzes, which let students test themselves on the material they have studied in a way that isn't boring.

tfriesen14's picture

Standardized testing has long been a part of the education system. It has been deemed the most valuable way to evaluate educators and students. The test combines basic information that students should know by the time they reach a particular grade level. Standardized testing is also used at the higher education level to help universities determine which students will have the greatest success at their institution. Standardized testing is such a large aspect of the current education system that teachers are practically forced to teach students how to take tests rather than teaching content crucial to student success. Standardized testing is an outdated form of assessment and should be eliminated. Students have so many different learning styles, and one test does not accommodate. Standardized testing does not consider English Language Learners and does not assess their growth throughout the year. Education should be intentional and offer a scaffold for learners. Educators should offer assistance to struggling students. Standardized testing does not allow an educator the resources or supports needed to aid a struggling student. Educators should not be teaching standards just to achieve the best scores; they should be teaching in order to allow students to receive the best education.

jasminealison's picture

Although students may be taking the same test, they all won't be prepared in the same exact way. Some students might be having trouble at home, which would affect their performance on the exam. Others may get stressed and nervous. Exams are longer and more comprehensive versions of tests. I believe exams should be abolished in schools because they waste time in school and they give students anxiety.
A reason schools should not require exams is that they waste students' time. In 'For and against standardized tests: Two student perspectives,' Joshua Palackal states, "... 44 percent of schools in the United States are spending more time on reading and math." This isn't good because students are spending less time on other subjects such as social studies, science, and the arts. Learning time in school is being replaced with test preparation, defeating the actual purpose of school, which is to learn.
Another reason exams should be banned in schools is that they give students anxiety. Education researcher Gregory J. Cizek, in 'Unintended Consequences of High Stakes Testing - P-12,' states, "...illustrating how testing... produces gripping anxiety in even the brightest students, and makes young children vomit or cry, or both." Students should not be facing this kind of apprehension due to a test. Anxiety in students before, during, and after testing can negatively affect their scores and grades in school. They'll be nervous to take the test, because they may feel unprepared, and will be nervous afterward while waiting for their score.
Exams in schools have negative outcomes and effects that badly impact the students taking them. I believe banning exams will decrease anxiety within students and allow them to learn more than what they are already learning.
