
Standardized Testing Fails the Exam

If enough educators -- and noneducators -- realize there are serious flaws in how we evaluate our schools, maybe we can stop this absurdity.
By W. James Popham

For the last four decades, students' scores on standardized tests have increasingly been regarded as the most meaningful evidence for evaluating U.S. schools. Most Americans, indeed, believe that students' standardized test performances are the only legitimate indicator of a school's instructional effectiveness.

Yet, although test-based evaluations of schools seem to occur almost as often as fire drills, in most instances these evaluations are inaccurate. That's because the standardized tests being employed are flat-out the wrong ones for the job.

Standardized tests have been used to evaluate America's schools since 1965, when the U.S. Elementary and Secondary Education Act became law. That statute provided for the first major infusion of federal funds into local schools and required educators to produce test-based evidence that ESEA dollars were well spent.

But how, you might ask, could a practice that's been so prevalent for so long be mistaken? Just think back to the many years we forced airline attendants and nonsmoking passengers to suck in secondhand toxins because smoking on airliners was prohibited only during takeoff and landing.

Some screw-ups can linger for a long time. But mistakes, even ones we've lived with for decades, can often be corrected once they've been identified, and that's what we must do to halt today's wrongheaded school evaluations. If enough educators -- and noneducators -- realize that there are serious flaws in the way we evaluate our schools, and that those flaws erode educational quality, there's a chance we can stop this absurdity.

Instructionally Insensitive

First, some definitions:

A standardized test is any test that's administered, scored, and interpreted in a standard, predetermined manner. Standardized aptitude tests are designed to make predictions about how a test taker will perform in a subsequent setting. For example, the SAT and the ACT are used to predict the grades that high school students will earn when they get to college. By contrast, standardized achievement tests indicate how well a test taker has acquired knowledge and mastered certain skills.

Although students' scores on standardized aptitude tests are sometimes unwisely stirred into the school-evaluation stew, scores on standardized achievement tests are typically the ones used to judge a school's success. Two kinds of standardized achievement tests commonly used for school evaluations are ill suited for that measurement.

The first category comprises nationally standardized achievement tests like the Iowa Tests of Basic Skills, which employ a comparative measurement strategy. The fundamental purpose of all such tests is to compare a student's score with the scores earned by a previous group of test takers (known as the norm group). It can then be determined whether Johnny scored at the 95th percentile on a given test (attaboy!) or at the tenth percentile (son, we have a problem).
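To make the comparative strategy concrete, here is a minimal sketch in Python of how a percentile rank against a norm group can be computed. The norm-group scores and the "percent scoring below" rule are illustrative assumptions, not any particular publisher's procedure.

    def percentile_rank(score, norm_group_scores):
        """Percent of norm-group scores falling below the given score.

        One common definition of percentile rank; real publishers may use
        slightly different conventions (e.g., counting ties as half).
        """
        below = sum(1 for s in norm_group_scores if s < score)
        return 100.0 * below / len(norm_group_scores)

    # Hypothetical norm group: 20 raw scores on a 50-item test.
    norm_group = [18, 22, 25, 27, 28, 30, 31, 33, 34, 35,
                  36, 37, 38, 40, 41, 42, 44, 45, 47, 49]

    print(percentile_rank(46, norm_group))  # 90.0 -> roughly Johnny's "attaboy" case
    print(percentile_rank(23, norm_group))  # 10.0 -> the tenth-percentile case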

Because of the need for nationally standardized achievement tests to provide fine-grained, percentile-by-percentile comparisons, it is imperative that these tests produce a considerable degree of score spread -- in other words, plenty of differences among test takers' scores. So, producing score spread often preoccupies those who construct standardized achievement tests.

Statistically, a question that creates the most score spread on standardized achievement tests is one that only about half the students answer correctly. Over the years, developers of standardized achievement tests have learned that if they can link students' success on a question to students' socioeconomic status (SES), then about half of the test takers usually answer that item correctly. If an item is answered correctly more often by students at the upper end of the socioeconomic scale than by lower-SES kids, that question will provide plenty of score spread.
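The statistics behind that claim are simple: for a right/wrong item, the item's contribution to score variance is p(1 - p), where p is the proportion of test takers answering correctly, and that product peaks when p is 0.5. A quick sketch of this generic item-analysis arithmetic (not any publisher's formula):

    # For a right/wrong item, score variance is p * (1 - p), where p is the
    # proportion answering correctly. It peaks at p = 0.5, which is why item
    # writers chasing score spread favor mid-difficulty items.
    for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
        print(f"p = {p:.1f}  item variance = {p * (1 - p):.2f}")

    # p = 0.1  item variance = 0.09
    # p = 0.3  item variance = 0.21
    # p = 0.5  item variance = 0.25
    # p = 0.7  item variance = 0.21
    # p = 0.9  item variance = 0.09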

After all, SES is a delightfully spread-out variable and one that isn't quickly altered. As a result, in today's nationally standardized achievement tests, there are many SES-linked items.

Unfortunately, this kind of test tends to measure not what students have been taught in school but what they bring to school. That's the reason there's such a strong relationship between a school's standardized-test scores and the economic and social makeup of that school's student body.

As a consequence, most nationally standardized achievement tests end up being instructionally insensitive. That is, they're unable to detect improved instruction in a school even when it has definitely taken place. Because of this insensitivity, when students' scores on such tests are used to evaluate a school's instructional performance, that evaluation usually misses the mark.

A second kind of instructionally insensitive test is the sort of standardized achievement test that many states have developed for accountability during the past two decades. Such tests have typically been created to better assess students' mastery of the officially approved skills and knowledge. Those skills and knowledge, sometimes referred to as goals or curricular aims, are usually known these days as content standards. Thus, such state-developed standardized assessments -- like the Florida Comprehensive Assessment Test (FCAT) -- are frequently described as standards-based tests.

Because these customized standards-based tests were designed (almost always with the assistance of an external test-development contractor) to be aligned with a state's curricular aspirations, it would seem that they would be ideal for appraising a school's quality. Unfortunately, that's not the way it works out.

When a state's education officials decide to identify the skills and knowledge students should master, the typical procedure for doing so hinges on the recommendations of subject-matter specialists from that state. For example, if authorities in Ohio or New Mexico want to identify their state's official content standards for mathematics, then a group of, say, 30 math teachers, math-curriculum consultants, and university math professors is invited to form a statewide content-standards committee.

Typically, when these committees attempt to identify the skills and knowledge students should master, their recommendation -- not surprisingly -- is that they should master everything. These committees seem bent on identifying skills they fervently wish students would possess. Regrettably, the resultant litanies of committee-chosen content standards tend to resemble curricular wish lists rather than realistic targets.

Whether or not the targets make sense, there tend to be a lot of them, and the effect is counterproductive. A state's standards-based tests are intended to evaluate schools based on students' test performances, but teachers soon become overwhelmed by too many targets. Educators must guess about which of this multitude of content standards will actually be assessed on a given year's test. Moreover, because there are so many content standards to be assessed and only limited testing time, it is impossible to report any meaningful results about which content standards have and haven't been mastered.
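Some rough arithmetic, using hypothetical numbers, shows why per-standard reporting collapses: spread a 50-item test across 60 standards and most standards get one item or none, far too few for a trustworthy mastery estimate.

    import math

    def items_per_standard(total_items, num_standards):
        """Average number of test items available to measure each standard."""
        return total_items / num_standards

    def margin_of_error(p, n):
        """Rough 95% margin of error for a proportion-correct estimate based
        on n items (simple binomial approximation)."""
        return 1.96 * math.sqrt(p * (1 - p) / n)

    # Hypothetical 50-item test versus 60 standards or 6 standards.
    print(items_per_standard(50, 60))            # ~0.83 items per standard
    print(items_per_standard(50, 6))             # ~8.3 items per standard

    # Even with 8 items per standard, a 75-percent-correct estimate carries a
    # margin of error of roughly +/- 30 percentage points; with a single item,
    # a per-standard mastery report is essentially noise.
    print(round(margin_of_error(0.75, 8), 2))    # ~0.3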

After working with standards-based tests aimed at so many targets, teachers understandably may devote less and less attention to those tests. As a consequence, students' performances on this type of instructionally insensitive test often become dependent on the very same SES factors that compromise the utility of nationally standardized achievement tests when used for school evaluation.

Wrong Tests, Wrong Consequences

Bad things happen when schools are evaluated using either of these two types of instructionally insensitive tests. This is particularly true when the importance of a school evaluation is substantial, as it is now. All of the nation's public schools are evaluated annually under the provisions of the federal No Child Left Behind Act.

Not only are the results of the NCLB school-by-school evaluations widely disseminated, but there are also penalties for schools that receive NCLB funds yet fail to make sufficient test-based progress. These schools are placed on an improvement track that can soon "improve" them into nonexistence. Educators in America's public schools are obviously under tremendous pressure to improve their students' scores on whatever NCLB tests their state has chosen.

With few exceptions, however, the assessments states have chosen to implement because of NCLB are either nationally standardized achievement tests or state-developed standards-based tests -- both of which are flawed. Here, then, are three adverse classroom consequences seen in states where instructionally insensitive NCLB tests are used:

Curricular Reductionism

In an effort to boost their students' NCLB test scores, many teachers jettison curricular content that -- albeit important -- is not apt to be covered on an upcoming test. As a result, students end up educationally shortchanged.

Excessive Drilling

Because it is essentially impossible to raise students' scores on instructionally insensitive tests, many teachers -- in desperation -- require seemingly endless practice with items similar to those on an approaching accountability test. This dreary drilling often stamps out any genuine joy students might (and should) experience while they learn.

Modeled Dishonesty

Some teachers, frustrated by being asked to raise scores on tests deliberately designed to preclude such score raising, may be tempted to adopt unethical practices during the administration or scoring of accountability tests. Students learn that whenever the stakes are high enough, the teacher thinks it's OK to cheat. This is a lesson that should never be taught.

These three negative consequences of using instructionally insensitive standardized tests as measuring tools, taken together, make it clear that today's widespread method of judging schools does more than lead to invalid evaluations. Beyond that, such tests can dramatically lower the quality of education.

An Antidote

Is it possible to build accountability tests that both supply accurate evidence of school quality and promote instructional improvement? The answer is an emphatic yes. In 2001, prior to the enactment of NCLB, an independent national study group, the Commission on Instructionally Supportive Assessment, identified three attributes an "instructionally supportive" accountability test must possess:

A Modest Number of Supersignificant Curricular Aims

To avoid overwhelming teachers and students with daunting lists of curricular targets, an instructionally supportive accountability test should measure students' mastery of only an intellectually manageable number of curricular aims, more like a half-dozen than the 50 or so a teacher may encounter today. However, because fewer curricular benchmarks are to be measured, they must be truly significant.

Lucid Descriptions of Aims

An instructionally helpful test must be accompanied by clear, concise, and teacher-palatable descriptions of each curricular aim to be assessed. With clear descriptions, teachers can direct their instruction toward promoting students' mastery of skills and knowledge rather than toward getting students to come up with correct answers to particular test items.

Instructionally Useful Reports

Because an accountability test that supports teaching is focused on only a very limited number of challenging curricular aims, a student's mastery of each subject can be meaningfully measured, letting teachers determine how effective their instruction has been. Students and their parents can also benefit from such informative reports.

These three features can produce an instructionally supportive accountability test that will accurately evaluate schools and improve instruction. The challenge before us, clearly, is how to replace today's instructionally insensitive accountability tests with better ones. Fortunately, at least one state, Wyoming, is now creating its own instructionally supportive NCLB tests. More states should do so.

What You Can Do

If you want to be part of the solution to this situation, it's imperative to learn all you can about educational testing. Then learn some more. For all its importance, educational testing really isn't particularly complicated, because its fundamentals consist of commonsense ideas, not numerical obscurities.

You'll not only understand better what's going on in the current mismeasurement of school quality, you'll also be able to explain it to others. And those others, ideally, will be school board members, legislators, and concerned citizens who might, in turn, make a difference. Simply hop on the Internet or head to your local library and hunt down an introductory book or two about educational assessment. (I've written several such books that, though not as engaging as a crackling good spy thriller, really aren't intimidating.)

With a better understanding of why it is so inane -- and destructive -- to evaluate schools using students' scores on the wrong species of standardized tests, you can persuade anyone who'll listen that policy makers need to make better choices. Our 40-year saga of unsound school evaluation needs to end. Now.

W. James Popham, who began his career in education as a high school teacher in Oregon, is professor emeritus at the University of California at Los Angeles's School of Education and Information Studies. He is the author of 25 books and a former president of the American Educational Research Association.

Comments

Karl:

Precision and accuracy are part of the middle school science curriculum, yet they are not part of the discussion about standardized testing. How can we expect our seventh graders to understand precision and accuracy when we evaluate them using tests that are neither precise nor accurate, but report the results as if they were both?

May I reference a general discussion of testing reliability, http://www.geocities.com/gordonite32/misc/testsupport.htm, and a supporting example of a test that shows very low precision, http://www.geocities.com/gordonite32/misc/MAPtest.htm. Testing should guide teachers. Instead, most tests limit both teachers and students.

julie:

I'm a student at a private high school in NY, and my school is having a debate on standardized testing: whether it is helpful or harmful to the students it is given to. I have been taking these tests since I was in second grade, and I can tell you that they do not measure a student's intelligence, and they certainly do not help students prepare for life. I have a friend who has trouble taking tests, but if you verbally test her on the material before or after the actual test, she knows it perfectly. Although I understand that there has to be an overall grading system -- a way for the government to know how well the educational system is working -- I don't believe that standardized tests are the best solution.

Lee Lee:

The problem with standardized testing is that the questions are counterintuitive. They are often phrased in a negative-positive or positive-negative way, so people who think logically can figure out the puzzle in the question, but a person like me tries to answer the question literally. I never liked tests, and I had to study very hard for exams in order to become an early childhood teacher.

Even at the second grade level, the standardized tests were tricky! In May, as a student teacher, I was helping two children understand probability problems presented on the computer as practice for their upcoming standardized tests. The screen showed a picture of five red marbles and one white marble. (The students are from a lower socioeconomic class, and both were average students in reading and math.) The questions were similar to this one: "If you select a marble without looking, how likely is it that you pick a white one?" The answer choices were Certain, Probable, Unlikely, and Impossible.

First, I asked the students if they understood the vocabulary and had them tell me what the words meant in their own words. They both knew the meanings of probable and impossible, but they could not connect the answers to the questions for the more difficult problems with more than three colored marbles. I had to explain what each of the answers meant, and I offered an analogy: "If I had a bag containing five red marbles and one white marble, how likely would it be that you would pick a red marble?" I would also ask the question with "...a white marble?" to demonstrate that there were two ways of looking at the problem. Would it be certain (definitely), probable (probably), unlikely (probably not), or impossible (never)? I understand that "probably" and "probably not" might have been confusing for the students, but I did not know a good synonym for unlikely. After I offered the analogy and more colloquial instruction, the children were able to understand the questions. I think test writers need to write questions in a simple and direct way to assess children more accurately. Also, it's silly for the test writers to suggest that young children look away, because children at that age might take the instruction literally and then would not be able to see the question or the answers. It would be better for the test writers to present the problem in a real and concrete way, as I did and as most math textbooks do.
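For reference, the arithmetic behind the marble item above is a one-line probability calculation. The sketch below uses the counts from the comment; the mapping of probabilities onto the four answer choices is only an informal reading of them, not the test's scoring rule.

    # The practice item's bag: five red marbles, one white marble.
    red, white = 5, 1
    total = red + white

    p_white = white / total   # 1/6 ~= 0.17 -> "unlikely" (possible, but probably not)
    p_red = red / total       # 5/6 ~= 0.83 -> "probable" (likely, but not certain)

    print(f"P(white) = {p_white:.2f}, P(red) = {p_red:.2f}")
    # Only a probability of exactly 1 maps to "certain", and exactly 0 to "impossible".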


Larry:

Having taught for 40 years, I have to agree with practically everything in this article. However, I find the arrogance of educators in the United States to be not only appalling but also counterproductive. I have often heard of the "best-practices" approach to education, but rarely do you find any of our foreign "competitors" included in the best-practices evaluations. Perhaps it is time for us to truly evaluate why we are lagging behind other countries in education. Perhaps they have some methods and approaches that are superior to what is happening in the United States.

There is no doubt that the amount of time and resources allocated to inter-scholastic sports is a detriment to the educational system. Although this is a very unpopular statement, it has to be considered when you realize that many foreign schools do not support or even engage in inter-scholastic sports. In countries such as the Netherlands, you will not find such sports activities in the schools, yet they consistently produce world-class soccer teams. And no matter how you look at it, you will not convince me that any of the questionable, drug-enhanced "role models" found in sports will have the positive influence on youth that one good teacher will.

Our priorities in United States education are wrong. A great change is needed, and it will not come from federal or state governments, nor will it come from parents who have no understanding of the needs or difficulties found in education. It will come from the mass of caring, results-driven educators who have so long been ignored in the educational process. It is amazing that in the United States, the one profession that has very little to say about its own practice is teaching. I hope I live long enough to see that changed in a significant way.

Steve:

I agree with Larry (the 40-year veteran teacher) that our analysis of the educational system is based on our own evaluation. Any judge can put a blue ribbon on his own entry. I recently had the privilege of visiting several schools in Beijing and Baoding, China. The schools are designed to support the students who work and who are capable of achieving. They do not cater to the students who are only capable of performing on a Friday night. The Chinese run a very harsh system, with the end result that average or below-average students are relegated to a life of manual labor. However, the alternative is what we seem to have in the U.S. We are not supporting our best and brightest with gifted programs, and the product is mediocrity. As Larry suggested, maybe it's time to support academics the way we have supported athletics.

Stacy Gutner:
That's it right there: you said "proper test-taking skills"! I want my child to learn, not to learn how to take a test!
We are having that exact problem here in Palm Beach County, Florida. This year the district implemented a program at a cost of 28 million dollars, and they are teaching to the test. This is so wrong.
Everyone is different and learns differently, and they didn't take that into account. My daughter is an ESE student with learning challenges who will never pass a standardized test; she just won't. She is very bright, but this will hold her back instead of letting her move forward.

Peter Medveczky:

Well, look at real scientific evidence. Read Science 330:335 (October 15, 2010) by M. A. Pyc and K. A. Rawson. They present clear evidence that testing is an important part of the learning process. Some of the comments I read here are pure political correctness. Eliminating testing would not serve the obvious goal of education: learning!

Peter Medveczky, Prof Univ. S. Florida

Luis Garcia, Strategy Consultant for Information Technology Sector and Government:

Excellent view! This might sound odd or off-topic, but may I ask or suggest that authors please write shorter articles? Remember how much information is fighting for our attention everywhere, and how many people we need to get up to speed. I say this because I really care! Thank you!

vjgee, Business Education teacher (8 to 12s):

"Arrogance of educators." Ouch. Are (we) "lagging behind other countries," where, what I understand, is that countries just don't bother with young-for-their-age students, if "they don't want to work" then they become former students, with little formal education, and what I mean to say is that some systems might seem better when what you are doing is comparing apples to oranges--a system can call all their "gifted" their only student population, tell me if they can't, without having ANY "methods and approaches that are superior" (unless you count 'discounting people' a superior method).
"Inter-scholastic sports is a detriment to the educational system," I read something that said hiring-staff will notice sports on a CV, as an indication of a thoughtful, focused, goal-oriented, socially-aware person. "Foreign schools do not support ... sports" while, apples and oranges again, Malaysia might not have the financial resources to provide much in the way of sports, Australia and the UK play rugby, cricket, just not grid-iron, while what I mean to say is I don't think sports is a waste in education. The Netherlands does well in soccer "though you will not find such sports activities in the schools," I'll take your word for it, but when do they play it then? they go to school until 6 in the evening, or is it sport from 4 until 6? and not school? Apples and oranges.
"A great change is needed," gawd-luv-ya but does it have to all come from educators so long "ignored"? In my estimation my students love school and I can't see how chucking the whole thing for change will make them love it any more. If I could spend more time with the 24 regulars and less with the 6 special needs students in my room, there might be more people better off--but while "public school is that last meeting place of the community" we'll be fine, just don't push that envelope any further. Meantime a UBC physics teacher suggested that publically educated students seem to fair better at university simply because they were not tutored and prodded. They have learned that so much was Tuum Est, up to you, during school, that they didn't look around for help, "just open the door and I'll get it myself."
The profession is amazing. It's difficult on the front-lines, but it's not something that needs fixing, whiles what is on offer is arm-chair advice. We'll do the job with support teachers, ESL specialists, salt-of-the-earth coaches, committed parents and administration, and kids that are keen and good or even otherwise. Just make sure we have got what we need and get out of the way.
