Standardized Testing Fails the Exam | Edutopia
WHAT WORKS IN EDUCATION The George Lucas Educational Foundation

Standardized Testing Fails the Exam

If enough educators -- and noneducators -- realize there are serious flaws in how we evaluate our schools, maybe we can stop this absurdity.
By W. James Popham
Credit: Veer/James Godman

For the last four decades, students' scores on standardized tests have increasingly been regarded as the most meaningful evidence for evaluating U.S. schools. Most Americans, indeed, believe that students' standardized test performances are the only legitimate indicator of a school's instructional effectiveness.

Yet, although test-based evaluations of schools seem to occur almost as often as fire drills, in most instances these evaluations are inaccurate. That's because the standardized tests employed are the wrong tools for the job.

Standardized tests have been used to evaluate America's schools since 1965, when the U.S. Elementary and Secondary Education Act became law. That statute provided for the first major infusion of federal funds into local schools and required educators to produce test-based evidence that ESEA dollars were well spent.

But how, you might ask, could a practice that's been so prevalent for so long be mistaken? Just think back to the many years we forced airline attendants and nonsmoking passengers to suck in secondhand toxins because smoking on airliners was prohibited only during takeoff and landing.

Some screw-ups can linger for a long time. But mistakes, even ones we've lived with for decades, can often be corrected once they've been identified, and that's what we must do to halt today's wrongheaded school evaluations. If enough educators -- and noneducators -- realize that there are serious flaws in the way we evaluate our schools, and that those flaws erode educational quality, there's a chance we can stop this absurdity.

Instructionally Insensitive

First, some definitions:

A standardized test is any test that's administered, scored, and interpreted in a standard, predetermined manner. Standardized aptitude tests are designed to make predictions about how a test taker will perform in a subsequent setting. For example, the SAT and the ACT are used to predict the grades that high school students will earn when they get to college. By contrast, standardized achievement tests indicate how well a test taker has acquired knowledge and mastered certain skills.

Although students' scores on standardized aptitude tests are sometimes unwisely stirred into the school-evaluation stew, scores on standardized achievement tests are typically the ones used to judge a school's success. Two kinds of standardized achievement tests commonly used for school evaluations are ill suited for that measurement.

The first of these categories comprises nationally standardized achievement tests, such as the Iowa Tests of Basic Skills, which employ a comparative measurement strategy. The fundamental purpose of all such tests is to compare a student's score with the scores earned by a previous group of test takers (known as the norm group). It can then be determined whether Johnny scored at the 95th percentile on a given test (attaboy!) or at the tenth percentile (son, we have a problem).
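The norm-referenced comparison can be sketched in a few lines of code. This is a purely illustrative example -- the scores and the helper function are invented here, not taken from any actual test's scoring procedure: a student's raw score is converted to a percentile rank by counting how many norm-group scores fall below it.

```python
# Illustrative sketch of norm-referenced scoring: a raw score becomes a
# percentile rank by comparison against a norm group's scores.
# All numbers below are invented for illustration.
from bisect import bisect_left

def percentile_rank(score: float, norm_group: list[float]) -> float:
    """Percentage of norm-group scores falling strictly below `score`."""
    ranked = sorted(norm_group)
    below = bisect_left(ranked, score)  # index of first score >= `score`
    return 100.0 * below / len(ranked)

norm_group = [52, 61, 64, 70, 73, 75, 78, 81, 85, 93]  # invented raw scores

print(percentile_rank(90, norm_group))  # 90.0 -- "attaboy!" territory
print(percentile_rank(55, norm_group))  # 10.0 -- "son, we have a problem"
```

The key point is that a percentile says nothing absolute about what a student knows; it only locates the student relative to the norm group.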

Because of the need for nationally standardized achievement tests to provide fine-grained, percentile-by-percentile comparisons, it is imperative that these tests produce a considerable degree of score spread -- in other words, plenty of differences among test takers' scores. So, producing score spread often preoccupies those who construct standardized achievement tests.

Statistically, a question that creates the most score spread on standardized achievement tests is one that only about half the students answer correctly. Over the years, developers of standardized achievement tests have learned that if they can link students' success on a question to students' socioeconomic status (SES), then about half of the test takers usually answer that item correctly. If an item is answered correctly more often by students at the upper end of the socioeconomic scale than by lower-SES kids, that question will provide plenty of score spread.

After all, SES is a delightfully spread-out variable and one that isn't quickly altered. As a result, in today's nationally standardized achievement tests, there are many SES-linked items.

Unfortunately, this kind of test tends to measure not what students have been taught in school but what they bring to school. That's the reason there's such a strong relationship between a school's standardized-test scores and the economic and social makeup of that school's student body.

As a consequence, most nationally standardized achievement tests end up being instructionally insensitive. That is, they're unable to detect improved instruction in a school even when it has definitely taken place. Because of this insensitivity, when students' scores on such tests are used to evaluate a school's instructional performance, that evaluation usually misses the mark.

A second kind of instructionally insensitive test is the sort of standardized achievement test that many states have developed for accountability during the past two decades. Such tests have typically been created to better assess students' mastery of the officially approved skills and knowledge. Those skills and knowledge, sometimes referred to as goals or curricular aims, are usually known these days as content standards. Thus, such state-developed standardized assessments -- like the Florida Comprehensive Assessment Test (FCAT) -- are frequently described as standards-based tests.

Because these customized standards-based tests were designed (almost always with the assistance of an external test-development contractor) to be aligned with a state's curricular aspirations, it would seem that they would be ideal for appraising a school's quality. Unfortunately, that's not the way it works out.

When a state's education officials decide to identify the skills and knowledge students should master, the typical procedure hinges on the recommendations of subject-matter specialists from that state. For example, if authorities in Ohio or New Mexico want to identify their state's official content standards for mathematics, a group of, say, 30 math teachers, math-curriculum consultants, and university math professors is invited to form a statewide content-standards committee.

Typically, when these committees attempt to identify the skills and knowledge students should master, their recommendation -- not surprisingly -- is that they should master everything. These committees seem bent on identifying skills they fervently wish students would possess. Regrettably, the resultant litanies of committee-chosen content standards tend to resemble curricular wish lists rather than realistic targets.

Whether or not the targets make sense, there tend to be a lot of them, and the effect is counterproductive. A state's standards-based tests are intended to evaluate schools based on students' test performances, but teachers soon become overwhelmed by too many targets. Educators must guess about which of this multitude of content standards will actually be assessed on a given year's test. Moreover, because there are so many content standards to be assessed and only limited testing time, it is impossible to report any meaningful results about which content standards have and haven't been mastered.

After working with standards-based tests aimed at so many targets, teachers understandably may devote less and less attention to those tests. As a consequence, students' performances on this type of instructionally insensitive test often become dependent on the very same SES factors that compromise the utility of nationally standardized achievement tests when used for school evaluation.

Wrong Tests, Wrong Consequences

Bad things happen when schools are evaluated using either of these two types of instructionally insensitive tests. This is particularly true when the importance of a school evaluation is substantial, as it is now. All of the nation's public schools are evaluated annually under the provisions of the federal No Child Left Behind Act.

Not only are the results of the NCLB school-by-school evaluations widely disseminated, but there are also penalties for schools that receive NCLB funds yet fail to make sufficient test-based progress. These schools are placed on an improvement track that can soon "improve" them into nonexistence. Educators in America's public schools are obviously under tremendous pressure to improve their students' scores on whatever NCLB tests their state has chosen.

With few exceptions, however, the assessments states have chosen to implement because of NCLB are either nationally standardized achievement tests or state-developed standards-based tests -- both of which are flawed. Here, then, are three adverse classroom consequences seen in states where instructionally insensitive NCLB tests are used:

Curricular Reductionism

In an effort to boost their students' NCLB test scores, many teachers jettison curricular content that -- albeit important -- is not apt to be covered on an upcoming test. As a result, students end up educationally shortchanged.

Excessive Drilling

Because it is essentially impossible to raise students' scores on instructionally insensitive tests, many teachers -- in desperation -- require seemingly endless practice with items similar to those on an approaching accountability test. This dreary drilling often stamps out any genuine joy students might (and should) experience while they learn.

Modeled Dishonesty

Some teachers, frustrated by being asked to raise scores on tests deliberately designed to preclude such score raising, may be tempted to adopt unethical practices during the administration or scoring of accountability tests. Students learn that whenever the stakes are high enough, the teacher thinks it's OK to cheat. This is a lesson that should never be taught.

These three negative consequences of using instructionally insensitive standardized tests as measuring tools, taken together, make it clear that today's widespread method of judging schools does more than lead to invalid evaluations. Beyond that, such tests can dramatically lower the quality of education.

An Antidote

Is it possible to build accountability tests that both supply accurate evidence of school quality and promote instructional improvement? The answer is an emphatic yes. In 2001, prior to the enactment of NCLB, an independent national study group, the Commission on Instructionally Supportive Assessment, identified three attributes an "instructionally supportive" accountability test must possess:

A Modest Number of Supersignificant Curricular Aims

To avoid overwhelming teachers and students with daunting lists of curricular targets, an instructionally supportive accountability test should measure students' mastery of only an intellectually manageable number of curricular aims, more like a half-dozen than the 50 or so a teacher may encounter today. However, because fewer curricular benchmarks are to be measured, they must be truly significant.

Lucid Descriptions of Aims

An instructionally helpful test must be accompanied by clear, concise, and teacher-palatable descriptions of each curricular aim to be assessed. With clear descriptions, teachers can direct their instruction toward promoting students' mastery of skills and knowledge rather than toward getting students to come up with correct answers to particular test items.

Instructionally Useful Reports

Because an accountability test that supports teaching is focused on only a very limited number of challenging curricular aims, a student's mastery of each assessed aim can be meaningfully measured, letting teachers determine how effective their instruction has been. Students and their parents can also benefit from such informative reports.

These three features can produce an instructionally supportive accountability test that will accurately evaluate schools and improve instruction. The challenge before us, clearly, is how to replace today's instructionally insensitive accountability tests with better ones. Fortunately, at least one state, Wyoming, is now creating its own instructionally supportive NCLB tests. More states should do so.

What You Can Do

If you want to be part of the solution to this situation, it's imperative to learn all you can about educational testing. Then learn some more. For all its importance, educational testing really isn't particularly complicated, because its fundamentals consist of commonsense ideas, not numerical obscurities.

You'll not only understand better what's going on in the current mismeasurement of school quality, you'll also be able to explain it to others. And those others, ideally, will be school board members, legislators, and concerned citizens who might, in turn, make a difference. Simply hop on the Internet or head to your local library and hunt down an introductory book or two about educational assessment. (I've written several such books that, though not as engaging as a crackling good spy thriller, really aren't intimidating.)

With a better understanding of why it is so inane -- and destructive -- to evaluate schools using students' scores on the wrong species of standardized tests, you can persuade anyone who'll listen that policy makers need to make better choices. Our 40-year saga of unsound school evaluation needs to end. Now.

W. James Popham, who began his career in education as a high school teacher in Oregon, is professor emeritus at the University of California at Los Angeles's School of Education and Information Studies. He is the author of 25 books and a former president of the American Educational Research Association.

Comments (43)

Renee H.:

I had to visit this site for a graduate class, but I am so glad I did.

The title "F for Assessment" is so accurate. I teach in the state of New Jersey, and I was transferred from 6th grade to 4th when we began state testing a decade ago. I knew this was the beginning of the end of creativity and learning communities. It was the beginning of a test-dominated curriculum.

I stated to colleagues that if my name accompanies my students on this test, then every teacher, including their first teachers, their parents, should be on it also. They thought I was trying to be humorous, but I knew there were a lot of factors that contribute to the success and failure of my students, and a test could not diagnose and measure these factors.

I am spending my precious and limited instructional time preparing for a one-week assessment. It is not fair to the students, the parents, or the teaching profession. I have had students improve their ASK 4 scores by 15-20 points, but their scores are still considered non-proficient. The state should assess individual educational growth instead of comparing our special education and ELL students to students with more academic resources and skills.

When NCLB began, many educators said it wouldn't last, but it is still here. And by 2012, all of my students are supposed to be proficient and pass our state testing. If the president can empty 100% of the jails, balance 100% of the budget, and provide healthcare and employment for 100% of Americans, then I can teach for six months and make 100% of my students pass this test.

Renee H.:

J. Roberts,

I agree we do need some form of assessment, but I feel state assessment is about politics, not student achievement.

Testing a child for a week is too long. I like your suggestion of assessing a few key concepts that are built upon each year.


Dorothea:

The success or failure of a school district should not be determined by one test. My district's scores are adversely impacted by two subgroups: ESL students and special education students. If English is not your primary language, perhaps the test should be administered in your native tongue. If a special education student is 3-4 years below grade level, how can the student score in the proficient range? My district has disproportionate numbers of each population. It is unfair to compare school districts that do not share similar dynamics. If AYP is not reached this year, a state takeover is imminent for my school system.

Brandi:

I just wanted to say that I completely agree with you. I really don't understand the purpose of focusing solely on standardized testing. It is very frustrating because everyone is forgetting that students need skills to help them succeed in life. When our students go out into the real world and get a job, they are going to lack the skills they need. I find it ridiculous that those of us who know what is best for our students are not the ones making these crazy laws like NCLB and adding to the increasing pressures of standardized testing. It would be nice for teachers to be able to help make these bigger decisions for the students we teach. After all, it seems we are the only ones who know what is really best for them.

Anonymous:

I understand the reasons behind state assessments and the need to hold schools accountable, but the process needs a complete overhaul. As a fifth grade teacher, I have seen countless students, parents, and colleagues suffer from stress and anxiety regarding the state tests. I have one hour a day with my students, and the state test is a "gradeband" test (meaning it covers 3rd, 4th, and 5th grade material). If my students don't pass, the finger is pointed at me, regardless of the education they received over the previous two years. Most frustrating is the timetable put on the test. The test is administered on one day out of the 180 we are in school, regardless of each child's individual concerns or needs. Johnny was up all night due to his parents' arguing. Suzie can't concentrate because her dog died yesterday and she can't stop crying. Mike can't focus or even sit in his chair for five minutes (let alone for a two-hour test) because his ADD meds have run out and Mom can't afford to get more until next payday. The test doesn't care. Their entire education is boiled down to how they do on one test, on one particular day.

Christine:

I do not disagree that testing could be beneficial if the test were a fair and accurate account of student knowledge. It seems that our state is trying to improve standards; however, with each state test, we have developed a pattern of "curving" the results. With the curve, the test is no longer a valuable assessment. Students think that they know more of the information than they actually do. Parents, administrators, and politicians think that we are doing a wonderful job of educating society when, in actuality, the politicians are doing a wonderful job of pulling the wool over the public's eyes.

Greg Collins:

I agree that our current testing system is flawed in several ways. I am a middle school math teacher in Georgia. The math problems on the CRCT require higher-order thinking skills. This means that most of the questions are word problems. I have had students who were competent math students but did not read on grade level. This means that they were likely to answer math problems incorrectly based on their reading abilities instead of their math abilities.
I have not seen or heard of this problem being addressed.

Sabrina:

I have been painfully aware of the issues that standardized testing has caused in the classroom. I am entering my third year teaching third grade and have spent the past two years teaching reading and math almost exclusively. Social studies and science are almost completely ignored in lieu of test prep and practice. We devote so much time to test taking skills and practice answering test questions that we really don't have much time to teach the other content areas outside of reading and math. For the longest time I thought that my school was unique in this way, but I am learning that we are just like other schools all over the country. It is a sad fact that our students are missing out on so much information because they are subjected to test prep on a regular basis. I hope that our educational priorities return to truly educating our students in all the curricular areas, and I hope this change happens sooner rather than later. I am currently pursuing my Master's degree in reading, but this article has sparked an interest in assessment. I might be investigating and re-evaluating my program choice.

Brian:

Thank you for the clear explanation of how current standard tests measure too much of the wrong things. Garbage in, garbage out.

Discussions in the media do not make clear what the problem is. They often make it sound as if teachers are afraid to have their students tested at all, or to face any accountability at all.

Tests that match real-world cases clearly make the most sense. In statistics courses I've taught, most (not all) of each exam is open book, since that's the way it's really done. (A student who has avoided learning the material won't have enough time to learn it during the exam, even with their book.)

Grades are based on material mastered, not comparison with other students. If almost everyone learns it really well, that's great.

Karl:

Precision and accuracy are part of the middle school science curriculum. Yet they are not part of the discussion about standardized testing. How can we expect our 7th graders to understand precision and accuracy when we evaluate them using tests that are neither precise nor accurate, but report the results as if they were both?

I would point to the general literature on testing reliability, which includes examples of tests with very low precision. Testing should guide teachers. Instead, most tests limit both teachers and students.
