
First came high stakes tests, the educational equivalent of trying to improve children’s physical fitness by measuring their body mass index, strength and stamina, then measuring them again next year. And the next year. And the year after that.

High stakes tests yield terabytes of data, but no measurable student improvement. All we learn from the time-consuming, curriculum-distorting exercise is that test scores correlate with family income. Actually, we don’t even learn that. We knew it already.

Then came A-F school grades issued by the state based on students’ scores on the high stakes tests. In their original form, they were just a different way of presenting schools’ test scores. The only added value was that they made intuitive sense to people who want a simple way of rating schools. We all know what letter grades on report cards mean, so the system was easy to understand. Schools with an “A” or “B” grade were likely to have mostly middle-to-high income students and high academic achievement. The “C,” “D” and “F” schools were likely to have lower income students and lower academic achievement.

Lots of people complained about the grades, with good reason. They echoed the class bias of test scores, but the grades made the results even more judgmental. They lavished praise on schools with high income students (“You get an A! You get a B!”) while they labeled schools with low income students anywhere from average to failing. No matter how talented the teachers and administrators at the schools teaching low income students were, no matter how hard they worked, it was nearly impossible for them to get the top grades schools with higher income students received as a matter of course.

People at the Department of Education heard the complaints, so they decided to try to make the grading system more nuanced. Educators, statisticians and computer techies set to work to create a weighting system that made the grades more equitable.

The changes were at least a partial success. The current state grades reflect more than the students’ family income. That’s a step in the right direction, isn’t it?

Well, maybe. But the changes create a new problem. If the new, improved grading system doesn’t tell us which schools have the highest test scores, what does it tell us?

To try to figure that out, let’s take a look at four elementary schools in TUSD that all earned a “B” this year: Ochoa, Holladay Magnet, Gale and Sam Hughes.

From 2018 to 2019, Ochoa Elementary accomplished the almost-unheard-of feat of jumping three letter grades, from an “F” to a “B.” Holladay Magnet Elementary made a major leap as well, from a “D” to a “B” in 2019.

Meanwhile, Sam Hughes Elementary and Gale Elementary, which were “A” schools in 2018, dropped a grade to a “B” in 2019.

So, all four TUSD schools earned an identical “B” grade from the state in 2019. When most people see those grades, it would be reasonable for them to assume the schools are in the same ballpark when it comes to student achievement. But that assumption would be wrong.

Look at the passing rates in the Language Arts and Math portions of the 2019 AZMerit test at each school.

Ochoa Elementary. Language Arts: 22%. Math: 30%.
Holladay Magnet Elementary. Language Arts: 34%. Math: 37%.
Gale Elementary. Language Arts: 60%. Math: 55%.
Sam Hughes Elementary. Language Arts: 75%. Math: 65%.

In Language Arts, there is a 53 percentage point spread in the schools’ passing rates, from 22 percent to 75 percent. In Math, it’s a 35 point spread, from 30 percent to 65 percent.
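The arithmetic behind those spreads can be double-checked with a quick sketch (the school names and passing rates are taken from the list above):

```python
# 2019 AZMerit passing rates listed above, by school and subject.
rates = {
    "Ochoa": {"Language Arts": 22, "Math": 30},
    "Holladay Magnet": {"Language Arts": 34, "Math": 37},
    "Gale": {"Language Arts": 60, "Math": 55},
    "Sam Hughes": {"Language Arts": 75, "Math": 65},
}

for subject in ("Language Arts", "Math"):
    values = [school[subject] for school in rates.values()]
    spread = max(values) - min(values)
    # Prints, e.g., "Language Arts: 22% to 75%, a 53-point spread"
    print(f"{subject}: {min(values)}% to {max(values)}%, a {spread}-point spread")
```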

So what do we learn about the academic achievement of students at the four schools from their state grades?ย Somewhere between little and nothing.

There is a good explanation for the four schools receiving “B” grades despite the wide difference in their students’ passing rates on the AZMerit. It’s because in the current system, the single most important factor in determining school grades is student growth. The amount students’ test scores rise or fall from year to year accounts for half of a school’s grade. The other half is divided among three categories: student proficiency, English Language Learners’ growth and proficiency, and a variety of “acceleration and readiness measures.”
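A rough illustration of how that weighting plays out: the 50 percent weight on growth comes from the paragraph above, but the split of the other half and the category scores below are hypothetical, invented for the sketch, not the state’s actual rubric.

```python
# Hypothetical weights: growth is half the grade (as described above);
# the split of the remaining half among the other three categories
# is an assumption for illustration only.
WEIGHTS = {
    "growth": 0.50,
    "proficiency": 0.25,
    "ell": 0.15,
    "acceleration": 0.10,
}

def composite(scores: dict) -> float:
    """Weighted sum of category scores, each on a 0-100 scale."""
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)

# Invented example schools: one with big growth but low proficiency,
# one with high proficiency but little growth.
high_growth = {"growth": 90, "proficiency": 30, "ell": 60, "acceleration": 50}
high_proficiency = {"growth": 40, "proficiency": 80, "ell": 60, "acceleration": 70}

print(composite(high_growth), composite(high_proficiency))
```

Under these assumed numbers the low-proficiency, high-growth school outscores the high-proficiency school, which is exactly the dynamic that lets an Ochoa and a Sam Hughes land on the same letter grade.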

So let’s look at the change in passing rates at the four schools. Ochoa had an 8 point increase in Language Arts and a 13 point increase in Math. Holladay had an 11 point increase in Language Arts and a 15 point increase in Math. Those unusually high increases in the students’ passing rates are the reason the schools jumped from an “F” and a “D” in 2018 to a “B.”

The passing rates of both schools with an “A” in 2018 dropped in 2019. Sam Hughes students had the same passing rate in Language Arts, but they dropped 8 points in Math. Gale students gained 1 point in Language Arts, but they dropped 5 points in Math. That’s the primary reason each school fell a notch from an “A” to a “B.”
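The year-over-year changes described in the last two paragraphs can be tabulated the same way; the 2018 rates here are back-computed from the 2019 rates and the stated point changes, not taken from a separate source.

```python
# (2019 passing rate, change in points from 2018), as stated above.
changes = {
    "Ochoa": {"Language Arts": (22, +8), "Math": (30, +13)},
    "Holladay Magnet": {"Language Arts": (34, +11), "Math": (37, +15)},
    "Gale": {"Language Arts": (60, +1), "Math": (55, -5)},
    "Sam Hughes": {"Language Arts": (75, 0), "Math": (65, -8)},
}

for school, subjects in changes.items():
    for subject, (rate_2019, delta) in subjects.items():
        rate_2018 = rate_2019 - delta  # back-compute the 2018 rate
        print(f"{school}, {subject}: {rate_2018}% -> {rate_2019}% ({delta:+d} points)")
```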

With the new scoring system, it’s possible for lower income schools to shine because of student growth on state tests, even if their post-growth passing rates are still low, and higher income schools with high test scores can drop a grade or two if their passing rates slip. In the vast majority of cases, state grades still correlate with parental income, meaning they retain their socioeconomic bias, but it’s no longer as certain as it once was.

It’s ironic that the state grading system took a step forward in terms of equity but took a step backward in terms of clarity, which defeats the purpose of the A-F grades. To “improve” the state grading system, the state had to render it meaningless.

That’s where things stand. We’re stuck with a high stakes testing regimen that drives teachers and students crazy, eats into quality school time (No time for recess, art or music, just drill and test, drill and test, drill and test), and distorts the Language Arts and Math curricula into test performance delivery systems. And we have a state grading system that’s more equitable than the raw test scores, but there’s no way of knowing what a school’s grade means without digging into the data.

Which leaves us with two options. We can try to fix the unfixable by tweaking the high stakes tests and the state grading system. Or we can stop the madness and admit our 17-year, No Child Left Behind experiment with evaluating students and schools by assigning them a number or a grade is a failure, then do the right thing and throw the whole goddam NCLB mess onto the trash heap of educational history.

10 replies on “They’ve Improved The State’s School Grading System. Now It Tells Us Even Less Than It Used To.”

  1. Saw the movie Ford VS Ferrari over the weekend. It was chilling how the execs at Ford were acting just like the Ukrainian diplomats crying that somebody is not using the proper channels we have in place. And then we come to education, which emulates many of the failings of the past in much the same way.

    As I left the grocery store yesterday I saw a bumper sticker that said, “I believe in Public Schools.”

    But what do they believe in? Nobody can tell.

  2. “…took a step backward in terms of clarity….”

    How in the world does your brain jump from your clear narrative to that illogical conclusion?

    You make it perfectly clear, with great clarity, that the new letter grading depends much more on a school’s ability to improve academic achievement and much less on how smart the kids are on the first day of school.

    Your only descriptive flaw was not calculating proficiency rates if these differential gains were maintained for 13 years. Instead you express disappointment that the oak tree is only six ft tall after the first season and not ready to be turned into lumber.

    Two more serious problems plague the entire enterprise. First, growth in grades K thru 3 outweighs all the other grades combined and creates, even more, the motivational mindset that determines all that follows. A-F distracts the system from properly focusing on those lower grades, grades completely excluded from the calcs.

  3. .
    .
    .
    .

    Growth Rank   Growth %tile   Letter Grade   Growth Points   School
    51 96% B 48 Ochoa
    73 95% A 47 Davis
    86 94% A 47 Soleng Tom
    100 93% A 46 Carrillo
    135 90% A 45 Fruchthendler
    173 87% B 45 Manzo
    194 85% B 44 John B Wright
    199 85% A 44 Dunham
    304 77% B 42 Tolson
    317 76% B 42 Miles-Exploratory
    339 75% B 42 W Arthur Sewel
    362 73% B 41 Ida Flood
    365 73% A 41 Annie Kellond
    378 72% B 41 Howell Peter
    402 70% C 41 Irene Erickson
    406 70% B 41 Hudlow
    410 69% B 41 Henry Hank Oyama
    519 61% B 39 Maldonado Amelia
    532 60% B 39 Van Buskirk
    560 58% C 39 Cavett
    563 58% B 39 Mansfeld Middle
    646 52% B 38 John E White
    685 49% B 37 Lineweaver
    714 47% B 37 Laura N. Banks
    731 45% B 37 Holladay
    779 42% C 36 Marshall
    793 41% C 36 Lynn Urquides
    803 40% C 36 W V Whitmore
    808 39% C 36 Roskruge
    914 32% C 34 Bonillas
    930 30% C 34 Borton
    938 30% C 34 Myers-Ganoung
    948 29% C 34 Hollinger
    949 29% B 34 Gale
    955 28% C 34 Anna Henry
    960 28% C 34 McCorkle PK-8
    964 28% B 34 Ford Elementary
    991 26% C 33 Frances J Warren
    1001 25% C 33 Drachman
    1009 24% C 33 Pueblo Gardens
    1015 24% C 33 Harold Steele
    1018 24% B 33 Sam Hughes
    1031 23% D 32 Dietz
    1033 23% C 32 C E Rose
    1047 22% C 32 Roberts Naylor
    1072 20% C 32 Wheeler
    1073 20% B 32 Collier
    1079 19% B 31 Borman
    1105 17% C 31 Secrist
    1116 16% C 31 Vesey
    1122 16% D 30 Magee
    1148 14% B 30 Robins
    1159 13% D 29 Valencia
    1161 13% C 29 Cragin
    1199 10% C 28 Miller
    1209 9% C 28 Tully
    1220 9% D 28 Utterback
    1224 8% D 27 Doolen
    1226 8% D 27 Alice Vail
    1239 7% D 27 Raul Grijalva
    1241 7% F 27 Anna Lawrence
    1243 7% D 27 Morgan Maxwell
    1244 7% D 27 Pistor
    1245 7% C 27 Bloom
    1270 5% D 26 Booth-Fickett
    1281 4% D 25 Gridley
    1289 3% D 24 Blenman
    1294 3% D 24 Robison
    1302 2% D 23 Davidson
    1306 2% D 23 Mission View
    1319 1% F 20 Safford

  4. John, I’m sure you understand everything in your two comments. I have to admit, I don’t. Which tells me, a wonk who is deep into the state grade process, like yourself, can glean a great deal of information from the grades. But that’s not the purpose of the A-F system. It’s to give regular folks a way to see how schools are doing without having to understand all the data. Most people will see the B grades earned by the four schools I discussed and imagine the student achievement at the schools is more-or-less the same. Obviously that’s not true.

    I have a related question. If Ochoa and Holladay slip a few percentage points in their passing rates next year, I expect they’ll both drop to a C. I’d say that’s likely for both schools — if not next year, then in the years following. A big jump like theirs is likely to be balanced out with at least a slight dip in later years. And yet, for all intents and purposes, their students will be at close to the same level of achievement. People will say, “Looks like those schools aren’t as good as they were the year before,” even though the quality of teaching and student achievement as measured on AZMerit won’t have changed significantly.

    Neither the high stakes tests nor the state grades have fulfilled their stated purpose, yet they cost the state a great deal of money, and they cost our teachers and students a great deal of time, effort and stress. Put the tests and state grades to a cost-benefit analysis, and both have costs far higher than their benefits.

  5. I agree with you on this: the letter grading system does more damage than good. And, I am the guy who originally put it into state law. But, if I hadn’t, someone else would have.

    They mix two calculations which can’t be mixed: growth and achievement.

    People would be better off if these two components weren’t mixed, just each clearly stated.

    The latest letter grade, as you clearly state, puts much more weight on growth. It does this not only by weighting growth more but by creating separate categories of points for growth of Special Education students, Title One students, English Language Learners, etc.

    In doing so, it makes those students more important.

    Just today, I was explaining to some administrative staff how important it was to provide an opportunity to a group of very challenging, incredibly challenging, special ed students. I used the letter grade system to make my point. This was a C rated school, so they were listening carefully.

  6. Let me redo my headings on my chart.

    Growth Rank (when compared to 1,335 K-8 schools)……Percentile of all schools (higher is better)……Letter Grade……Growth Points in letter grade system……School

    51………………………………………….. 96%………………………… B………………48…………………..Ochoa
    73………………………………………….. 95%………………………… A………………47…………………..Davis
    86………………………………………….. 94%………………………… A………………47……………………Soleng
    100…………………………………………93%…………………………. A………………46…………………Carrillo
    135……………………………………….. 90%…………………………. A………………45…………………Fruchten
    173……………………………………….. 87%…………………………. B………………45…………………….Manzo
    194……………………………………….. 85%…………………………. B………………44…………….John B Wright
    199……………………………………….. 85%…………………………. A………………44…………………….Dunham
    ….
    1244…………………………………………7%…………………………..D……………..27……………………..Pistor
    1245…………………………………………7%…………………………..C……………..27………………………Bloom
    1270…………………………………………5%…………………………..D………………26…………….Booth-Fickett
    1281…………………………………………4%…………………………. D………………25…………………….Gridley
    1289…………………………………………3%…………………………..D………………24…………………..Blenman
    1294…………………………………………3%…………………………..D………………24…………………….Robison
    1302…………………………………………2%…………………………..D………………23………………….Davidson
    1306…………………………………………2%…………………………..D………………23……………..Mission View
    1319…………………………………………1%…………………………..F………………20……………………..Safford

    Save this data to see how it holds for next year. When they calculate classroom growth data from year to year, 25% of the top quartile of teachers end up in the bottom quartile the next year. In other words, there is a lot of statistical spray and froth in growth data. It’s the small n size issue. A large number of students aren’t included: kindergarten, first, second and third grade students aren’t included and their scale score growth is about 55% of the entire k through 12 total. Another 20 to 30% of students aren’t included because they aren’t full year students.

    Yet, growth data and rankings are the only meaning that you can extract from test score data. All other data is an illusion, which David mentions frequently.

    So, in the end, test score accountability is a dead end, doing more damage than good.

  7. John, it’s a rare pleasure to find us agreeing both in our analysis of the grading system and our conclusion that it is flawed. I acknowledge you were trying to make the system more equitable, something I advocated for. I think the failures of the system which you acknowledge indicate that it cannot be “fixed.” State grades, in my opinion, are a bad idea. We seem to agree on that.

    I would say the same thing about high stakes tests, and have frequently. They have caused more harm than good, and I cannot imagine a way to fix their flaws. We could learn as much from standardized tests without the “high stakes” component, administered, say, once in elementary school, middle school and high school, as a general way to assess how students are doing in their basic math and reading skills. It could be something like a statewide version of the NAEP test, which could be given to either a sampling of students or to every student in certain grades. I’m not sure I like that alternative, but it certainly beats the system we have now.

    Thanks for the redo of your table. I understand it now. It will be interesting to see how those numbers change. I would bet good money that the growth at Ochoa and Holladay will stall or reverse over the next few years. I think both Gale and Sam Hughes will go back up to where they were unless their student populations have shifted to include more struggling students.

  8. In 1996, David Garcia (he was my analyst for several years while I was chairman of Senate Education) and I set up a NAEP-style sampling system for Arizona. But, instead of pulling a sample, accurate to plus or minus 3 percentile points (your comment about Black scores in 2019 noted) and measuring once every two years, the system would have measured twice a year and had a sample size large enough to be accurate to within one percentile point.

    Unfortunately, the whole thing just mystified people. No one understood it.

  9. David,

    I came back to look at this again. The typical school, meaning the median school in that list, in TUSD ranks at the 30th percentile in statewide gains. You can excuse test scores based on poverty; you can’t excuse gain scores. Those come down to your systems. Do they measure teacher job satisfaction? If they do, do they read the suggestions for changes carefully and make the changes? Are administrators out visiting classrooms every day? Are principals frequently in the parking lot in the morning talking to parents?

    Are teachers coming to class with prepared lesson plans, ready to teach?

    For a district with such resources, that raises major questions.

    The danger is that the feedback loop just creates more pressure which most often results in lower quality, not an improvement.

    But, if I were on that board, I would be reexamining every system.
