Community-Based Assessment Makes the Grade

In scoring student performance, top marks go to teachers rather than the testing industry.

Illustration of conveyor belt and hands stamping
Credit: Jason Lee

We're fast approaching a point in this country when the promotion or graduation of students will result not from their classroom work or the opinions of the educators who spend each day with them but from their performance on a single standardized test. Because I've spent the last 15 years inside the testing industry -- working for many of the biggest companies on many of the biggest tests -- this trend doesn't seem so smart to me.

In fact, I'd say that linking federal education funds to regional standardized test scores (as No Child Left Behind does) and tying teacher pay to student test results (the probable, but unintended, outcome of President Obama's Race to the Top program) are ideas that should be reconsidered.

My complaint with large-scale assessment does not lie with the multiple-choice tests, because those are scored electronically. The real trouble begins in the realm of open-ended tests, where students answer questions in their own words and are assessed by fallible human beings. The testing industry wants those subjective student responses to be scored as consistently as multiple-choice tests.

To do this, the industry establishes hard-and-fast rules for its short-term "professional scorers" to adhere to. In my experience, these rules -- written for recently hired temporary employees -- ultimately turn the process into a theater of the absurd. I know, because I've sat through the training sessions.

Working on a national assessment test in 2005, I helped establish scoring rules for a test question that asked students performing a hands-on science task to describe what happened when they mixed a liquid and a solid. The rubric, written by classroom teachers, said full credit should be awarded to answers showing "complete understanding."

But everyone had a different idea of "complete understanding." So the test company tried to specify exactly what that meant. I sat in on a lengthy conference call filled with test developers and science teachers as we tried to hammer out the right and wrong student responses, and I was amazed as those earnest educators considered potential responses.

"If we accept 'The liquid bubbled,'" one scientist said, "then I don't see how we can't accept 'It sizzled.'"

"But sizzled isn't the same as bubbled," another argued, and soon everyone on the phone was debating whether boiled meant the same as sizzled, fizzled the same as sizzled, fizzed the same as fizzled.

When people ask how I would reform standardized testing, I point to models that work on a smaller scale. In the current system, temporary employees must adhere to unyielding rules established to deal with tens of thousands of student responses. A reformed system would have a smaller number of scorers assessing the work of a smaller number of students. This means placing assessment back in the hands of the teacher who can make thoughtful decisions about the students he or she knows.

If small-scale assessment sounds like an expensive solution that won't fly in today's economy, consider Washington State's recent achievements. In response to a 2004 ballot initiative, the state rolled out a comprehensive classroom-based-assessment program for social studies, health, and the arts. These CBAs are written and administered on the state level, but student results are assessed by classroom teachers. This makes for a win-win: Administrators and policy makers receive standardized results across the state, and students are spared the obvious pitfalls of large-scale test scoring.

Organizations like Boston-based FairTest consider programs such as Washington's to be authentic assessments. This is because the CBAs are based on student performances or portfolios they produce over a period of time. In this scenario, assessment no longer rests on the open-ended answers that students recall on one stressful day.

It is increasingly important to change the testing industry. Race to the Top is based on national assessment criteria, and that is set to become the new gatekeeper for federal education funding. Absent reform, we are placing life-changing assessments about students in the hands of bored temps who give fleeting glances to students' work.

Todd Farley is the author of Making the Grades: My Misadventures in the Standardized Testing Industry.

This article was originally published on 12/23/2009

Comments (3)

First of all I would like to say great blog! I had a quick question which I'd like to ask if you do not mind. I was interested to know how you center yourself and clear your thoughts before writing. I have had difficulty clearing my thoughts in getting my ideas out there. I do take pleasure in writing but it just seems like the first 10 to 15 minutes are generally lost just trying to figure out how to begin. Any ideas or hints? Thank you!

High School Teacher from Sydney, Australia

Food for Thought

Depressing reading, Todd - especially as in Australia we seem to be determined to head down the path to large-scale standardized testing, with only the unions to prevent it happening.

Principal

Making the grade

Farley's assessment of assessment is spot on. Race to the Top is yet another attempt to quantify an education - we can establish standards and improve pedagogy but we cannot quantify an education. That educators and psychometricians cannot agree on descriptors from boiled to fizzled reflects the problem, and danger, in trying to quantify (specifically, determine Pass/Fail) an education. If children, along with trained scientists, educators and test-makers, interpret results of an experiment to the best of their ability, we need to accept both their (the children's) effort and interpretation. Washington State's assessment seems to be closer to what can be called authentic because whether or not a child passes or fails a state test should be less important than whether she grasps a particular concept.
