
4 Crucial Steps for Calibrating Assessments

Proficiency-based grading requires consistency from one classroom to the next when it comes to scoring assessments.

July 5, 2024

The educational landscape has seen a shift toward alternative grading, moving from the traditional 100-point scale to evidence-based grading. This approach emphasizes proficiency in skills and content standards over averaged points, providing a more accurate reflection of student knowledge and performance. When my district transitioned to evidence-based grading, teachers began a new practice: calibrating assessments. This was a game changer in promoting deeper collaboration among teachers and reflection on our assessment methods.

Importance of Calibration

Calibration is the process of ensuring consistency and fairness in grading across different teachers who give the same assessment.

During a professional learning community (PLC) session, my team calibrated a physical science test on chemical equations. The test required students to use mathematical representations to support the claim that atoms, and therefore mass, are conserved during a chemical reaction. A student had mixed up the terms “products” and “reactants” in their claim, evidence, and reasoning assessment. Although the student’s reasoning was otherwise correct, this mix-up sparked a debate within our PLC over whether the student had earned a 2 (approaching) or a 3 (proficient).

Our success criteria for the assessment showed that the student’s claim and evidence were correct. Everything in the reasoning was correct except for the mix-up of the two terms. I asked my team, “If the success criterion is accurate reasoning, and the only thing wrong is the labeling of terms but the concepts are still accurate, does this student not deserve to still get this success criterion?” After discussion, our team settled on a guiding principle: “Proficiency, not perfection.” That phrase changed our perspective on calibrating assessments.

Switching from multiple-choice to open-ended assessments required teachers to calibrate, a practice that increased our success in moving to a proficiency-based grading system. Without calibration, students might receive different grades for similar performance levels, depending on the teacher. Our calibration was enhanced by using a four-point scale with clear success criteria, which raised our inter-rater reliability, meaning the degree to which different teachers assign the same score to the same piece of student work. The higher our inter-rater reliability on a proficiency scale, the more consistent our grading became.
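For teams that want to put a number on inter-rater reliability, the sketch below shows one possible approach; it is not the method our PLC used. It assumes two teachers have each scored the same set of student responses on a four-point scale, and it computes simple percent agreement along with Cohen's kappa, a common statistic that adjusts agreement for chance. The score lists and function names are hypothetical, made up for illustration.

```python
from collections import Counter


def percent_agreement(scores_a, scores_b):
    """Share of responses on which both teachers gave the same score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("Both teachers must score the same non-empty set of responses.")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


def cohens_kappa(scores_a, scores_b):
    """Agreement between two scorers, adjusted for agreement expected by chance."""
    n = len(scores_a)
    observed = percent_agreement(scores_a, scores_b)
    counts_a = Counter(scores_a)
    counts_b = Counter(scores_b)
    # Chance agreement: probability both teachers pick the same level at random,
    # given how often each teacher uses each level.
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both teachers used a single, identical level throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores (1-4 scale) from two teachers on the same eight responses.
teacher_a = [3, 3, 2, 4, 3, 2, 1, 3]
teacher_b = [3, 2, 2, 4, 3, 3, 1, 3]

print(f"Percent agreement: {percent_agreement(teacher_a, teacher_b):.2f}")
print(f"Cohen's kappa: {cohens_kappa(teacher_a, teacher_b):.2f}")
```

A team could run a check like this on a handful of shared work samples before and after a calibration session to see whether agreement improves.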

4 Steps for Effective Calibration

1. Create common proficiency scales with clear success criteria. We developed district-wide proficiency scales that were both skill-based and content-based. These scales provided a framework for grading, ensuring that all teachers had a common understanding of what proficiency was. Having clear success criteria made it easier for students to understand what they needed to know and do to achieve proficiency.

2. Develop common assessments. To be able to calibrate, PLCs must develop and use common assessments. These assessments were collaboratively developed by the PLC and aligned to the proficiency scales. By crafting common assessments, we ensured that all students were assessed on the same skill and content standards, regardless of who they had as a teacher. The collaborative process of pulling these assessments together gave our PLC group a sense of unity and shared responsibility, enhancing the overall quality of our evaluation methods.

3. Engage in regular calibration of assessments. We committed to calibrating our assessments to maintain consistency and fairness in grading. Calibration sessions included teachers coming together to discuss and grade student work samples, ensuring that each teacher applied the proficiency scales and success criteria consistently. Calibration helped to identify any discrepancies in grading and allowed us to make adjustments where necessary to ensure uniformity. Calibration not only improved our accuracy of grading assessments but also increased our collaborative culture as teachers.

4. Analyze student data. After each assessment, we analyzed student data to identify trends, patterns, and areas for improvement. This included examining the performance of different groups of students and assessing the effectiveness of our instructional practices and assessments. This data-driven approach kept our teaching dynamic, responsive to our students’ needs, and continuously improving.

Challenges and Solutions

Calibrating can significantly increase the fairness and accuracy of grading, but it is not without its challenges. A critical example occurred when one of our team members was absent during a calibration session. The five teachers who calibrated the assessment had very similar distributions of scores, but the teacher who was absent graded significantly more harshly. This discrepancy was evident when we analyzed our student data.

After further review, we saw that the absent teacher was scoring some responses as 2s that the rest of us were scoring as 3s; the rest of us did not think the student’s error justified the loss of a point. It was an “aha” moment for us, highlighting why calibration was so crucial in grading. Following this PLC, we agreed on the importance of calibrating and of sharing the calibration notes with anyone who missed a PLC, to make sure we were uniform and fair in our grading practices.
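A discrepancy like this one can also be surfaced with a quick check of each teacher's scores against those of their colleagues. The following is a minimal sketch under stated assumptions, not a description of the tool or process we used: the teacher labels, scores, and the half-point threshold are all hypothetical, and the comparison only makes sense when teachers' classes are broadly similar.

```python
from statistics import mean


def flag_outlier_graders(scores_by_teacher, threshold=0.5):
    """Return teachers whose mean score differs from their colleagues' combined
    mean by at least `threshold` points, with the size of the gap."""
    flagged = {}
    for teacher, scores in scores_by_teacher.items():
        # Pool every score given by the other teachers.
        others = [
            score
            for name, values in scores_by_teacher.items()
            if name != teacher
            for score in values
        ]
        gap = mean(scores) - mean(others)
        if abs(gap) >= threshold:
            flagged[teacher] = round(gap, 2)
    return flagged


# Hypothetical scores (1-4 scale) on one common assessment.
scores_by_teacher = {
    "Teacher A": [3, 3, 2, 4, 3, 3],
    "Teacher B": [3, 2, 3, 3, 4, 3],
    "Teacher C": [3, 3, 3, 2, 4, 3],
    "Teacher D": [2, 3, 3, 4, 3, 3],
    "Teacher E": [3, 4, 3, 3, 2, 3],
    "Teacher F": [2, 2, 2, 3, 2, 2],  # missed the calibration session
}

flagged = flag_outlier_graders(scores_by_teacher)
print(flagged)
```

A negative gap means the teacher scored lower than colleagues on average. A flag is a prompt to look at the work samples together, not proof that anyone graded incorrectly.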

A common issue among PLC groups was the time required for collaboratively scoring and analyzing student data. When our school transitioned to an evidence-based grading model, the school-day schedule changed to a later start every day. Teachers met for PLCs twice a week, with the other three days dedicated to student interventions and time for student makeups, reteaching, or reassessing.

Despite the added time in the morning, teachers still found it difficult to find time for frequent collaboration and calibration. It was easier to meet with colleagues in our own building between periods, during off periods, or even during lunch, but getting together district-wide was nearly impossible. Recently, some of our science PLCs have been trying to calibrate district-wide using a collaborative whiteboard app. Teachers can upload different assessments and calibrate them virtually in very little time.

Calibration has strengthened our PLCs and supported our continuous improvement as a team. While team members may respectfully disagree at times, they grade with the group to ensure equity and fairness in scores among students in the same course. This process fosters a culture of learning by doing, requiring us to deeply reflect on our teaching methods, the assessments we create, and our instructional practices. Our PLCs must remain committed to ongoing improvement and to high-quality education for all of our students.