Standardized Testing Fails the Exam | Edutopia
WHAT WORKS IN EDUCATION The George Lucas Educational Foundation

Standardized Testing Fails the Exam

If enough educators -- and noneducators -- realize there are serious flaws in how we evaluate our schools, maybe we can stop this absurdity.
By W. James Popham
Credit: Veer/James Godman

For the last four decades, students' scores on standardized tests have increasingly been regarded as the most meaningful evidence for evaluating U.S. schools. Most Americans, indeed, believe that students' standardized test performances are the only legitimate indicator of a school's instructional effectiveness.

Yet, although test-based evaluations of schools seem to occur almost as often as fire drills, in most instances these evaluations are inaccurate. That's because the standardized tests employed are flat-out wrong.

Standardized tests have been used to evaluate America's schools since 1965, when the U.S. Elementary and Secondary Education Act became law. That statute provided for the first major infusion of federal funds into local schools and required educators to produce test-based evidence that ESEA dollars were well spent.

But how, you might ask, could a practice that's been so prevalent for so long be mistaken? Just think back to the many years we forced airline attendants and nonsmoking passengers to suck in secondhand toxins because smoking on airliners was prohibited only during takeoff and landing.

Some screw-ups can linger for a long time. But mistakes, even ones we've lived with for decades, can often be corrected once they've been identified, and that's what we must do to halt today's wrongheaded school evaluations. If enough educators -- and noneducators -- realize that there are serious flaws in the way we evaluate our schools, and that those flaws erode educational quality, there's a chance we can stop this absurdity.

Instructionally Insensitive

First, some definitions:

A standardized test is any test that's administered, scored, and interpreted in a standard, predetermined manner. Standardized aptitude tests are designed to make predictions about how a test taker will perform in a subsequent setting. For example, the SAT and the ACT are used to predict the grades that high school students will earn when they get to college. By contrast, standardized achievement tests indicate how well a test taker has acquired knowledge and mastered certain skills.

Although students' scores on standardized aptitude tests are sometimes unwisely stirred into the school-evaluation stew, scores on standardized achievement tests are typically the ones used to judge a school's success. Two kinds of standardized achievement tests commonly used for school evaluations are ill suited for that measurement.

The first category consists of nationally standardized achievement tests like the Iowa Tests of Basic Skills, which employ a comparative measurement strategy. The fundamental purpose of all such tests is to compare a student's score with the scores earned by a previous group of test takers (known as the norm group). It can then be determined if Johnny scored at the 95th percentile on a given test (attaboy!) or at the tenth percentile (son, we have a problem).
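As a rough illustration of that comparison, here is a minimal sketch of a percentile rank, assuming one common definition: the percentage of norm-group scores that fall strictly below the student's score. The function name, the norm group of scores 1 through 100, and Johnny's scores are all hypothetical; real test publishers use their own norming procedures.

```python
from bisect import bisect_left


def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores strictly below `score`.

    This is one common definition, assumed here for illustration only.
    """
    ordered = sorted(norm_scores)
    if not ordered:
        raise ValueError("norm_scores must not be empty")
    # bisect_left returns how many sorted scores are strictly below `score`.
    below = bisect_left(ordered, score)
    return 100.0 * below / len(ordered)


# Hypothetical norm group: 100 earlier test takers who scored 1 through 100.
norm_group = list(range(1, 101))

high_rank = percentile_rank(96, norm_group)  # 95 of 100 scores are lower
low_rank = percentile_rank(11, norm_group)   # 10 of 100 scores are lower

print(f"Score 96 -> percentile {high_rank:.0f}")
print(f"Score 11 -> percentile {low_rank:.0f}")
```

Note that the result says only where a score sits relative to other test takers, not what the student was taught or has learned.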

Because of the need for nationally standardized achievement tests to provide fine-grained, percentile-by-percentile comparisons, it is imperative that these tests produce a considerable degree of score spread -- in other words, plenty of differences among test takers' scores. So, producing score spread often preoccupies those who construct standardized achievement tests.

Statistically, a question that creates the most score spread on standardized achievement tests is one that only about half the students answer correctly. Over the years, developers of standardized achievement tests have learned that if they can link students' success on a question to students' socioeconomic status (SES), then about half of the test takers usually answer that item correctly. If an item is answered correctly more often by students at the upper end of the socioeconomic scale than by lower-SES kids, that question will provide plenty of score spread.
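The "about half" figure follows from basic arithmetic. A minimal sketch, assuming each question is scored simply right (1) or wrong (0): if a proportion p of students answer correctly, the variance of that item's scores is p(1 - p), which is largest at p = 0.5. The difficulty levels below are hypothetical values chosen only to show the pattern.

```python
def item_variance(p_correct):
    """Variance of a right/wrong (1/0) item that a proportion p_correct answer correctly."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be between 0 and 1")
    return p_correct * (1.0 - p_correct)


# Hypothetical item difficulties: the share of students answering correctly.
difficulties = [0.1, 0.3, 0.5, 0.7, 0.9]
variances = {p: item_variance(p) for p in difficulties}

# The item whose scores vary the most among test takers.
most_spread = max(variances, key=variances.get)

for p, v in variances.items():
    print(f"answered correctly by {p:.0%}: variance {v:.2f}")
print(f"most score spread at p = {most_spread}")
```

An item nearly everyone gets right (or wrong) contributes almost nothing to telling test takers apart, which is why such items tend to be dropped from tests built for percentile comparisons.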

After all, SES is a delightfully spread-out variable and one that isn't quickly altered. As a result, in today's nationally standardized achievement tests, there are many SES-linked items.

Unfortunately, this kind of test tends to measure not what students have been taught in school but what they bring to school. That's the reason there's such a strong relationship between a school's standardized-test scores and the economic and social makeup of that school's student body.

As a consequence, most nationally standardized achievement tests end up being instructionally insensitive. That is, they're unable to detect improved instruction in a school even when it has definitely taken place. Because of this insensitivity, when students' scores on such tests are used to evaluate a school's instructional performance, that evaluation usually misses the mark.

A second kind of instructionally insensitive test is the sort of standardized achievement test that many states have developed for accountability during the past two decades. Such tests have typically been created to better assess students' mastery of the officially approved skills and knowledge. Those skills and knowledge, sometimes referred to as goals or curricular aims, are usually known these days as content standards. Thus, such state-developed standardized assessments -- like the Florida Comprehensive Assessment Test (FCAT) -- are frequently described as standards-based tests.

Because these customized standards-based tests were designed (almost always with the assistance of an external test-development contractor) to be aligned with a state's curricular aspirations, it would seem that they would be ideal for appraising a school's quality. Unfortunately, that's not the way it works out.

When a state's education officials decide to identify the skills and knowledge students should master, the typical procedure for doing so hinges on the recommendations of subject-matter specialists from that state. For example, if authorities in Ohio or New Mexico want to identify their state's official content standards for mathematics, then a group of, say, 30 math teachers, math-curriculum consultants, and university math professors are invited to form a statewide content-standards committee.

Typically, when these committees attempt to identify the skills and knowledge students should master, their recommendation -- not surprisingly -- is that they should master everything. These committees seem bent on identifying skills they fervently wish students would possess. Regrettably, the resultant litanies of committee-chosen content standards tend to resemble curricular wish lists rather than realistic targets.

Whether or not the targets make sense, there tend to be a lot of them, and the effect is counterproductive. A state's standards-based tests are intended to evaluate schools based on students' test performances, but teachers soon become overwhelmed by too many targets. Educators must guess about which of this multitude of content standards will actually be assessed on a given year's test. Moreover, because there are so many content standards to be assessed and only limited testing time, it is impossible to report any meaningful results about which content standards have and haven't been mastered.

After working with standards-based tests aimed at so many targets, teachers understandably may devote less and less attention to those tests. As a consequence, students' performances on this type of instructionally insensitive test often become dependent on the very same SES factors that compromise the utility of nationally standardized achievement tests when used for school evaluation.

Wrong Tests, Wrong Consequences

Bad things happen when schools are evaluated using either of these two types of instructionally insensitive tests. This is particularly true when the importance of a school evaluation is substantial, as it is now. All of the nation's public schools are evaluated annually under the provisions of the federal No Child Left Behind Act.

Not only are the results of the NCLB school-by-school evaluations widely disseminated, there are also penalties for schools that receive NCLB funds yet fail to make sufficient test-based progress. These schools are placed on an improvement track that can soon "improve" them into nonexistence. Educators in America's public schools obviously are under tremendous pressure to improve their students' scores on whatever NCLB tests their state has chosen.

With few exceptions, however, the assessments states have chosen to implement because of NCLB are either nationally standardized achievement tests or state-developed standards-based tests -- both of which are flawed. Here, then, are three adverse classroom consequences seen in states where instructionally insensitive NCLB tests are used:

Curricular Reductionism

In an effort to boost their students' NCLB test scores, many teachers jettison curricular content that -- albeit important -- is not apt to be covered on an upcoming test. As a result, students end up educationally shortchanged.

Excessive Drilling

Because it is essentially impossible to raise students' scores on instructionally insensitive tests, many teachers -- in desperation -- require seemingly endless practice with items similar to those on an approaching accountability test. This dreary drilling often stamps out any genuine joy students might (and should) experience while they learn.

Modeled Dishonesty

Some teachers, frustrated by being asked to raise scores on tests deliberately designed to preclude such score raising, may be tempted to adopt unethical practices during the administration or scoring of accountability tests. Students learn that whenever the stakes are high enough, the teacher thinks it's OK to cheat. This is a lesson that should never be taught.

These three negative consequences of using instructionally insensitive standardized tests as measuring tools, taken together, make it clear that today's widespread method of judging schools does more than lead to invalid evaluations. Beyond that, such tests can dramatically lower the quality of education.

An Antidote

Is it possible to build accountability tests that both supply accurate evidence of school quality and promote instructional improvement? The answer is an emphatic yes. In 2001, prior to the enactment of NCLB, an independent national study group, the Commission on Instructionally Supportive Assessment, identified three attributes an "instructionally supportive" accountability test must possess:

A Modest Number of Supersignificant Curricular Aims

To avoid overwhelming teachers and students with daunting lists of curricular targets, an instructionally supportive accountability test should measure students' mastery of only an intellectually manageable number of curricular aims, more like a half-dozen than the 50 or so a teacher may encounter today. However, because fewer curricular benchmarks are to be measured, they must be truly significant.

Lucid Descriptions of Aims

An instructionally helpful test must be accompanied by clear, concise, and teacher-palatable descriptions of each curricular aim to be assessed. With clear descriptions, teachers can direct their instruction toward promoting students' mastery of skills and knowledge rather than toward getting students to come up with correct answers to particular test items.

Instructionally Useful Reports

Because an accountability test that supports teaching is focused on only a very limited number of challenging curricular aims, a student's mastery of each subject can be meaningfully measured, letting teachers determine how effective their instruction has been. Students and their parents can also benefit from such informative reports.

These three features can produce an instructionally supportive accountability test that will accurately evaluate schools and improve instruction. The challenge before us, clearly, is how to replace today's instructionally insensitive accountability tests with better ones. Fortunately, at least one state, Wyoming, is now creating its own instructionally supportive NCLB tests. More states should do so.

What You Can Do

If you want to be part of the solution to this situation, it's imperative to learn all you can about educational testing. Then learn some more. For all its importance, educational testing really isn't particularly complicated, because its fundamentals consist of commonsense ideas, not numerical obscurities.

You'll not only understand better what's going on in the current mismeasurement of school quality, you'll also be able to explain it to others. And those others, ideally, will be school board members, legislators, and concerned citizens who might, in turn, make a difference. Simply hop on the Internet or head to your local library and hunt down an introductory book or two about educational assessment. (I've written several such books that, though not as engaging as a crackling good spy thriller, really aren't intimidating.)

With a better understanding of why it is so inane -- and destructive -- to evaluate schools using students' scores on the wrong species of standardized tests, you can persuade anyone who'll listen that policy makers need to make better choices. Our 40-year saga of unsound school evaluation needs to end. Now.

W. James Popham, who began his career in education as a high school teacher in Oregon, is professor emeritus at the University of California at Los Angeles's School of Education and Information Studies. He is the author of 25 books and a former president of the American Educational Research Association.

Comments (43)

Anonymous (not verified)

Standardized testing and achievement tests hold an extremely negative place in my heart. I teach 5th grade in Ohio. For the past two years our school did not meet AYP, so we were put on "school improvement." We had to send a letter home to the parents of our school stating we did not make the progress intended by the state and giving them the option of changing their child's school. It made the parents of our students question our effectiveness as teachers. If we did not send the letter we would lose our Title 1 funding, making it even more difficult for struggling students to get the help they need. We went through a tough road last year: endless meetings, stress, and students who did not enjoy learning. We were able to pull ourselves out of school improvement. Now, we are under pressure to get our students to pass the tests again this year. We have no time for art or activities that will motivate and enhance student learning. My special needs students have an extremely hard time with testing. I feel I need to teach to the test.
The pressure on standardized testing needs to shift to see maximum student growth.

Brad (not verified)

I have to disagree with your statement that the tests should only be given to white, middle class students with no disabilities, etc.

Students at my school are passing state tests and a majority of them are second language and some have disabilities. With proper test taking skills and knowledge of test item analysis, gender and/or class really does not matter. My school is high-achieving based on no child left behind standards and we are located in a Hispanic community where most of the students are from Mexico.

What is one bias you have noticed on a state test? If a student is in our country, they should be learning our culture as well as celebrating their own. If I'm in a foreign country, I'm expected to abide by their rules, adjust to their culture, and learn how to pass their tests. Why is America so different? Should we abandon our culture to accommodate everyone else?

Just a thought..


Michelle (not verified)

Slowly students are losing the joy of learning. We are going to create a nation of children that hate school because it is all about the test.

Anonymous (not verified)

Across the board, teachers have stressed their concern about these assessments. Instead of them going away, every year I am forced to administer yet another one. I don't have time to teach what I was hired to teach. You would think they would listen to the teachers, especially since we are the ones closest to the students. And correct me if I am wrong, but aren't the students what we are here for?

Julie (not verified)

I, myself, am also beginning to despise the standardized testing system. My school is also under pressure to meet AYP. This is my second year at the school, teaching 7th and 8th Grade, language arts. My principal recently approached me about how the seventh graders performed this year, because their scores declined from last year. My frustration comes from the fact that I felt I was blamed for this, even though I didn't have the students last year, and that it is impossible to prepare them completely in the 3 - 4 weeks before the test in the beginning of the year. We also started a new curriculum this year, and I feel that I have to put that aside and just work on what they struggled with on the test. I also feel that our creativity is stifled by having to teach to the test.
Standardized testing cannot be seen as the sole determining factor of student success, especially because not all students are strong test-takers.

Sinim (not verified)

I am currently in my third year teaching 2nd grade in CA. I was never trained or prepared for standardized testing and all of the pressure that came along with it. I teach at a low poverty, English learning community school. These tests were not made for them whatsoever. These students in 2nd grade are only 7-8 year olds! Why should they be put in this stressful situation so early on in their schooling? I feel all we are doing is just teaching to the test. Oh yes, we do have other curriculum to teach other than Language Arts and Math, yet we have no time to get to Social Studies or Science because we're so busy teaching the kids early on how to be good "test takers". However, with wishful thinking, I do hope they make some modifications to NCLB.

Bill Murphy (not verified)

The idea of standardized testing seems backwards from what teacher education focuses on. We are taught different learning styles and the several multiple intelligences. Lessons must be differentiated and reach each unique learner. Special education students, IEPs, and ESL students must be accounted for. Bloom and the like are stressed every day. Creating a student who is also a person and who is prepared for the 'real world' is more and more our job. However, when it comes time to measure student achievement, we use a "Sit down and fill in the bubble" approach that benefits the logical-thinking, linguistic, efficient teacher-pleasers.

Stephanie (not verified)

I also teach second grade in CA. There is so much pressure on these young children to do well on all of these standardized tests. It breaks my heart that these little ones are not even going to have a chance to experience subjects such as art, social studies, and science. They are only young once. When they get older and are asked what they remember about school, they will probably say TESTING! There are many better and different approaches that can assess a child's learning!

Steph (not verified)

All we can really do is to keep our heads high for our students and keep fighting for them! This has got to change eventually.

Anonymous (not verified)

We should have testing. It's a good thing to have in schools, so let's keep having it! One reason is that teachers will have a hard time putting kids in class, so that's why I'm writing here. WE NEED TESTING!
