WHAT WORKS IN EDUCATION The George Lucas Educational Foundation

Standardized Testing Fails the Exam

If enough educators -- and noneducators -- realize there are serious flaws in how we evaluate our schools, maybe we can stop this absurdity.
By W. James Popham
Credit: Veer/James Godman

For the last four decades, students' scores on standardized tests have increasingly been regarded as the most meaningful evidence for evaluating U.S. schools. Most Americans, indeed, believe that students' standardized test performances are the only legitimate indicator of a school's instructional effectiveness.

Yet, although test-based evaluations of schools seem to occur almost as often as fire drills, in most instances these evaluations are inaccurate. That's because the standardized tests employed are flat-out wrong.

Standardized tests have been used to evaluate America's schools since 1965, when the U.S. Elementary and Secondary Education Act became law. That statute provided for the first major infusion of federal funds into local schools and required educators to produce test-based evidence that ESEA dollars were well spent.

But how, you might ask, could a practice that's been so prevalent for so long be mistaken? Just think back to the many years we forced airline attendants and nonsmoking passengers to suck in secondhand toxins because smoking on airliners was prohibited only during takeoff and landing.

Some screw-ups can linger for a long time. But mistakes, even ones we've lived with for decades, can often be corrected once they've been identified, and that's what we must do to halt today's wrongheaded school evaluations. If enough educators -- and noneducators -- realize that there are serious flaws in the way we evaluate our schools, and that those flaws erode educational quality, there's a chance we can stop this absurdity.

Instructionally Insensitive

First, some definitions:

A standardized test is any test that's administered, scored, and interpreted in a standard, predetermined manner. Standardized aptitude tests are designed to make predictions about how a test taker will perform in a subsequent setting. For example, the SAT and the ACT are used to predict the grades that high school students will earn when they get to college. By contrast, standardized achievement tests indicate how well a test taker has acquired knowledge and mastered certain skills.

Although students' scores on standardized aptitude tests are sometimes unwisely stirred into the school-evaluation stew, scores on standardized achievement tests are typically the ones used to judge a school's success. Two kinds of standardized achievement tests commonly used for school evaluations are ill suited for that measurement.

The first category consists of nationally standardized achievement tests like the Iowa Tests of Basic Skills, which employ a comparative measurement strategy. The fundamental purpose of all such tests is to compare a student's score with the scores earned by a previous group of test takers (known as the norm group). It can then be determined if Johnny scored at the 95th percentile on a given test (attaboy!) or at the tenth percentile (son, we have a problem).
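
For readers who like to see the arithmetic, here is a minimal sketch of that norm-group comparison. It assumes one common convention (percentile rank as the percentage of the norm group scoring strictly below the student); test publishers may use other conventions. The function name and the 20-score norm group are invented for illustration and do not come from any real test.

```python
def percentile_rank(score, norm_group_scores):
    """Return the percentage of the norm group scoring strictly below `score`.

    Assumption: this "strictly below" convention is only one of several
    definitions of percentile rank in use.
    """
    if not norm_group_scores:
        raise ValueError("norm group must not be empty")
    below = sum(1 for s in norm_group_scores if s < score)
    return 100.0 * below / len(norm_group_scores)


# Hypothetical norm group: 20 earlier test takers with raw scores 1 through 20.
norm_group = list(range(1, 21))

johnny_high = percentile_rank(20, norm_group)  # 19 of 20 scored lower
johnny_low = percentile_rank(3, norm_group)    # 2 of 20 scored lower

print(johnny_high)  # 95.0
print(johnny_low)   # 10.0
```

Note that the result depends entirely on how the norm group performed, not on what the student was taught.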

Because of the need for nationally standardized achievement tests to provide fine-grained, percentile-by-percentile comparisons, it is imperative that these tests produce a considerable degree of score spread -- in other words, plenty of differences among test takers' scores. So, producing score spread often preoccupies those who construct standardized achievement tests.

Statistically, a question that creates the most score spread on standardized achievement tests is one that only about half the students answer correctly. Over the years, developers of standardized achievement tests have learned that if they can link students' success on a question to students' socioeconomic status (SES), then about half of the test takers usually answer that item correctly. If an item is answered correctly more often by students at the upper end of the socioeconomic scale than by lower-SES kids, that question will provide plenty of score spread.

After all, SES is a delightfully spread-out variable and one that isn't quickly altered. As a result, in today's nationally standardized achievement tests, there are many SES-linked items.
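
The claim that a half-right question produces the most score spread can be checked with a little arithmetic. For a question scored simply right (1) or wrong (0), the variance of the item score is p(1 - p), where p is the proportion of students answering correctly. The sketch below assumes that simple right/wrong scoring; the function name is illustrative and is not taken from any test developer's tools.

```python
def item_variance(p):
    """Variance of a right/wrong (1/0) item answered correctly by proportion p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return p * (1.0 - p)


# Proportions correct from 0.0 to 1.0 in steps of 0.1.
difficulties = [i / 10 for i in range(11)]

for p in difficulties:
    print(f"proportion correct {p:.1f} -> variance {item_variance(p):.2f}")

# The proportion that yields the largest spread for a single item.
best = max(difficulties, key=item_variance)
print(best)  # 0.5
```

An item nearly everyone gets right (or wrong) contributes almost nothing to score spread, which is why such items tend to be dropped from tests built for comparison.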

Unfortunately, this kind of test tends to measure not what students have been taught in school but what they bring to school. That's the reason there's such a strong relationship between a school's standardized-test scores and the economic and social makeup of that school's student body.

As a consequence, most nationally standardized achievement tests end up being instructionally insensitive. That is, they're unable to detect improved instruction in a school even when it has definitely taken place. Because of this insensitivity, when students' scores on such tests are used to evaluate a school's instructional performance, that evaluation usually misses the mark.

A second kind of instructionally insensitive test is the sort of standardized achievement test that many states have developed for accountability during the past two decades. Such tests have typically been created to better assess students' mastery of the officially approved skills and knowledge. Those skills and knowledge, sometimes referred to as goals or curricular aims, are usually known these days as content standards. Thus, such state-developed standardized assessments -- like the Florida Comprehensive Assessment Test (FCAT) -- are frequently described as standards-based tests.

Because these customized standards-based tests were designed (almost always with the assistance of an external test-development contractor) to be aligned with a state's curricular aspirations, it would seem that they would be ideal for appraising a school's quality. Unfortunately, that's not the way it works out.

When a state's education officials decide to identify the skills and knowledge students should master, the typical procedure for doing so hinges on the recommendations of subject-matter specialists from that state. For example, if authorities in Ohio or New Mexico want to identify their state's official content standards for mathematics, then a group of, say, 30 math teachers, math-curriculum consultants, and university math professors is invited to form a statewide content-standards committee.

Typically, when these committees attempt to identify the skills and knowledge students should master, their recommendation -- not surprisingly -- is that students should master everything. These committees seem bent on identifying skills they fervently wish students would possess. Regrettably, the resultant litanies of committee-chosen content standards tend to resemble curricular wish lists rather than realistic targets.

Whether or not the targets make sense, there tend to be a lot of them, and the effect is counterproductive. A state's standards-based tests are intended to evaluate schools based on students' test performances, but teachers soon become overwhelmed by too many targets. Educators must guess about which of this multitude of content standards will actually be assessed on a given year's test. Moreover, because there are so many content standards to be assessed and only limited testing time, it is impossible to report any meaningful results about which content standards have and haven't been mastered.

After working with standards-based tests aimed at so many targets, teachers understandably may devote less and less attention to those tests. As a consequence, students' performances on this type of instructionally insensitive test often become dependent on the very same SES factors that compromise the utility of nationally standardized achievement tests when used for school evaluation.

Wrong Tests, Wrong Consequences

Bad things happen when schools are evaluated using either of these two types of instructionally insensitive tests. This is particularly true when the importance of a school evaluation is substantial, as it is now. All of the nation's public schools are evaluated annually under the provisions of the federal No Child Left Behind Act.

Not only are the results of the NCLB school-by-school evaluations widely disseminated, there are also penalties for schools that receive NCLB funds yet fail to make sufficient test-based progress. These schools are placed on an improvement track that can soon "improve" them into nonexistence. Educators in America's public schools obviously are under tremendous pressure to improve their students' scores on whatever NCLB tests their state has chosen.

With few exceptions, however, the assessments states have chosen to implement because of NCLB are either nationally standardized achievement tests or state-developed standards-based tests -- both of which are flawed. Here, then, are three adverse classroom consequences seen in states where instructionally insensitive NCLB tests are used:

Curricular Reductionism

In an effort to boost their students' NCLB test scores, many teachers jettison curricular content that -- albeit important -- is not apt to be covered on an upcoming test. As a result, students end up educationally shortchanged.

Excessive Drilling

Because it is essentially impossible to raise students' scores on instructionally insensitive tests, many teachers -- in desperation -- require seemingly endless practice with items similar to those on an approaching accountability test. This dreary drilling often stamps out any genuine joy students might (and should) experience while they learn.

Modeled Dishonesty

Some teachers, frustrated by being asked to raise scores on tests deliberately designed to preclude such score raising, may be tempted to adopt unethical practices during the administration or scoring of accountability tests. Students learn that whenever the stakes are high enough, the teacher thinks it's OK to cheat. This is a lesson that should never be taught.

These three negative consequences of using instructionally insensitive standardized tests as measuring tools, taken together, make it clear that today's widespread method of judging schools does more than lead to invalid evaluations. Beyond that, such tests can dramatically lower the quality of education.

An Antidote

Is it possible to build accountability tests that both supply accurate evidence of school quality and promote instructional improvement? The answer is an emphatic yes. In 2001, prior to the enactment of NCLB, an independent national study group, the Commission on Instructionally Supportive Assessment, identified three attributes an "instructionally supportive" accountability test must possess:

A Modest Number of Supersignificant Curricular Aims

To avoid overwhelming teachers and students with daunting lists of curricular targets, an instructionally supportive accountability test should measure students' mastery of only an intellectually manageable number of curricular aims, more like a half-dozen than the 50 or so a teacher may encounter today. However, because fewer curricular benchmarks are to be measured, they must be truly significant.

Lucid Descriptions of Aims

An instructionally helpful test must be accompanied by clear, concise, and teacher-palatable descriptions of each curricular aim to be assessed. With clear descriptions, teachers can direct their instruction toward promoting students' mastery of skills and knowledge rather than toward getting students to come up with correct answers to particular test items.

Instructionally Useful Reports

Because an accountability test that supports teaching is focused on only a very limited number of challenging curricular aims, a student's mastery of each subject can be meaningfully measured, letting teachers determine how effective their instruction has been. Students and their parents can also benefit from such informative reports.

These three features can produce an instructionally supportive accountability test that will accurately evaluate schools and improve instruction. The challenge before us, clearly, is how to replace today's instructionally insensitive accountability tests with better ones. Fortunately, at least one state, Wyoming, is now creating its own instructionally supportive NCLB tests. More states should do so.

What You Can Do

If you want to be part of the solution to this situation, it's imperative to learn all you can about educational testing. Then learn some more. For all its importance, educational testing really isn't particularly complicated, because its fundamentals consist of commonsense ideas, not numerical obscurities.

You'll not only understand better what's going on in the current mismeasurement of school quality, you'll also be able to explain it to others. And those others, ideally, will be school board members, legislators, and concerned citizens who might, in turn, make a difference. Simply hop on the Internet or head to your local library and hunt down an introductory book or two about educational assessment. (I've written several such books that, though not as engaging as a crackling good spy thriller, really aren't intimidating.)

With a better understanding of why it is so inane -- and destructive -- to evaluate schools using students' scores on the wrong species of standardized tests, you can persuade anyone who'll listen that policy makers need to make better choices. Our 40-year saga of unsound school evaluation needs to end. Now.

W. James Popham, who began his career in education as a high school teacher in Oregon, is professor emeritus at the University of California at Los Angeles's School of Education and Information Studies. He is the author of 25 books and a former president of the American Educational Research Association.

Comments (42)

Anonymous (not verified)

We should have testing. It's a good thing to have in schools, so let's keep having it!!!! One reason is teachers will have a hard time putting kids in classes, so that's why I'm writing here. So WE NEED TESTING!!!!!!!!!!

Anonymous (not verified)

WE NEED TESTING!!!!!!!!!!!!!!!!!

Kids should have testing because it's easier to put kids in classes and to see how they do and if they pass whatever grade they're in. And if they can't do that, then what will they do!!!????

meagan thompson !!!

Mallory (not verified)

yo yo yo Standardized tests are great! We need them! I'm in 5th grade and I am doing a persuasive essay on "Yes, standardized tests should be used to assess students' achievements." Do you guys have any facts or opinions for me????

Anonymous (not verified)

I strongly agree with you. Some of these tests have got to go. For instance, the FCAT, I think, is a real problem. Teachers are no longer teaching their classes. They are all working on FCAT materials. It is more like a competition between schools. Not only do we have the FCAT, we also have many different types of tests. The children are stressed, and some of them even pass out during the test because the pressure is too much.

Tina (not verified)

I believe that due to the NCLB act and to school accountability, teachers are tempted to practice unethical testing procedures. They teach to the test and leave a lot of valuable life lessons out of the curriculum.
I also feel that the tests do not accurately show what the student can actually achieve. There are many people who have test anxiety and therefore do not do well on standardized tests. There needs to be a better way to decide what a student can accomplish other than standardized testing.
In my district, the third- through fifth-grade students are given three different tests during the year and a state standardized test at the end of the year. My kindergarteners were not allowed to go to the computer lab because of all the testing that was being administered.

Terry (not verified)

Hello Michell,
I share your concern that students are slowly losing the joy of learning. There isn't much joy in test-taking strategies, drills, and practice tests. I also feel that teacher morale is negatively affected by the high-stakes assessments. I have worked with beginning teachers and have seen them lose their enthusiasm within the first few years. The reality of the pressure to raise test scores is something that beginning teachers are not prepared for in college course work. I hope that revisions to "No Child Left Behind" will be made to bring back both the joy of teaching and the joy of learning.

Anonymous (not verified)

I strongly agree with James Popham's arguments both for and against standardized testing. As a classroom teacher, I know how hard it is to delve deeply into the content when you know that there are only so many days until "the big test". I do believe that assessment can be a huge asset to the classroom and to our educational system, but I also believe that it needs to be done in the right way and with a strong purpose. I look forward to further developing my own knowledge in this area of education.

Tracy (not verified)

I am an intervention specialist in first and second grades. Each year I am faced with administering achievement tests and statewide diagnostic tests to my students with special needs. It is frustrating to see them struggle with these tests. My students are allowed only certain accommodations (e.g., small group, extended time, questions read aloud). Many of my students are functioning one or two grade levels below their current grade. Therefore, they are asked to complete tests well above their academic abilities. It is like presenting someone with a recipe in a language they don't know and asking them to make it. How are these tests differentiated to meet the needs of all the students? It is so frustrating for the students - it makes them feel even more behind and unsuccessful! I believe in assessing students based on their performances and achievements.

j.roberts (not verified)

I agree that standardized tests are flawed, but I feel that there should be something to measure the learning of students. One reason that I think they are flawed is that the curriculums taught require students to learn a lot of different concepts in one year. I think curriculums should focus on 2-3 concepts a year, starting in 1st grade. By the time the students reach 6th grade, they would have mastered all basic math and reading skills.

Anonymous (not verified)

To those who think testing should be done: your reasons are ridiculous. Kids aren't tracked by ability, and even if a student scores low on an achievement test they aren't held back. Why not study a bit more and READ about assessments before you make stupid comments that support the "herd" view of testing without knowing the real facts?
