
First came high stakes tests, the educational equivalent of trying to improve children’s physical fitness by measuring their body mass index, strength and stamina, then measuring them again next year. And the next year. And the year after that.

High stakes tests yield terabytes of data, but no measurable student improvement. All we learn from the time-consuming, curriculum-distorting exercise is that test scores correlate with family income. Actually, we don’t even learn that. We knew it already.

Then came A-F school grades issued by the state based on students’ scores on the high stakes tests. In their original form, they were just a different way of presenting schools’ test scores. The only added value was that they made intuitive sense to people who want a simple way of rating schools. We all know what letter grades on report cards mean, so the system was easy to understand. Schools with an “A” or “B” grade were likely to have mostly middle-to-high income students and high academic achievement. The “C,” “D” and “F” schools were likely to have lower income students and lower academic achievement.

Lots of people complained about the grades, with good reason. They echoed the class bias of test scores, but the grades made the results even more judgmental. They lavished praise on schools with high income students (“You get an A! You get a B!”) while they labeled schools with low income students anywhere from average to failing. No matter how talented the teachers and administrators at the schools teaching low income students were, no matter how hard they worked, it was nearly impossible for them to get the top grades schools with higher income students received as a matter of course.

People at the Department of Education heard the complaints, so they decided to try to make the grading system more nuanced. Educators, statisticians and computer techies set to work to create a weighting system that made the grades more equitable.

The changes were at least a partial success. The current state grades reflect more than the students’ family income. That’s a step in the right direction, isn’t it?

Well, maybe. But the changes create a new problem. If the new, improved grading system doesn’t tell us which schools have the highest test scores, what does it tell us?

To try to figure that out, let’s take a look at four elementary schools in TUSD that all earned a “B” this year: Ochoa, Holladay Magnet, Gale and Sam Hughes.

From 2018 to 2019, Ochoa Elementary accomplished the almost-unheard-of feat of jumping three letter grades, from an “F” to a “B.” Holladay Magnet Elementary made a major leap as well, from a “D” to a “B” in 2019.

Meanwhile, Sam Hughes Elementary and Gale Elementary, which were “A” schools in 2018, dropped a grade to a “B” in 2019.

So, all four TUSD schools earned an identical “B” grade from the state in 2019. When most people see those grades, it would be reasonable for them to assume the schools are in the same ballpark when it comes to student achievement. But that assumption would be wrong.

Look at the passing rates in the Language Arts and Math portions of the 2019 AZMerit test at each school.

Ochoa Elementary. Language Arts: 22%. Math: 30%.
Holladay Magnet Elementary. Language Arts: 34%. Math: 37%.
Gale Elementary. Language Arts: 60%. Math: 55%.
Sam Hughes Elementary. Language Arts: 75%. Math: 65%.

In Language Arts, there is a 53 percentage point spread in the schools’ passing rates, from 22 percent to 75 percent. In Math, it’s a 35 point spread, from 30 percent to 65 percent.
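The arithmetic behind those spreads can be double-checked with a quick sketch (the school names and passing rates are taken from the list above):

```python
# 2019 AZMerit passing rates listed above, by school and subject.
rates = {
    "Ochoa": {"Language Arts": 22, "Math": 30},
    "Holladay Magnet": {"Language Arts": 34, "Math": 37},
    "Gale": {"Language Arts": 60, "Math": 55},
    "Sam Hughes": {"Language Arts": 75, "Math": 65},
}

for subject in ("Language Arts", "Math"):
    values = [school[subject] for school in rates.values()]
    spread = max(values) - min(values)
    # Prints, e.g., "Language Arts: 22% to 75%, a 53-point spread"
    print(f"{subject}: {min(values)}% to {max(values)}%, a {spread}-point spread")
```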

So what do we learn about the academic achievement of students at the four schools from their state grades?ย Somewhere between little and nothing.

There is a good explanation for the four schools receiving “B” grades despite the wide difference in their students’ passing rates on the AZMerit. It’s because in the current system, the single most important factor in determining school grades is student growth. The amount students’ test scores rise or fall from year to year accounts for half of a school’s grade. The other half is divided among three categories: student proficiency, English Language Learners’ growth and proficiency, and a variety of “acceleration and readiness measures.”
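A rough illustration of how that weighting plays out: the 50 percent weight on growth comes from the paragraph above, but the split of the other half and the category scores below are hypothetical, invented for the sketch, not the state’s actual rubric.

```python
# Hypothetical weights: growth is half the grade (as described above);
# the split of the remaining half among the other three categories
# is an assumption for illustration only.
WEIGHTS = {
    "growth": 0.50,
    "proficiency": 0.25,
    "ell": 0.15,
    "acceleration": 0.10,
}

def composite(scores: dict) -> float:
    """Weighted sum of category scores, each on a 0-100 scale."""
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)

# Invented example schools: one with big growth but low proficiency,
# one with high proficiency but little growth.
high_growth = {"growth": 90, "proficiency": 30, "ell": 60, "acceleration": 50}
high_proficiency = {"growth": 40, "proficiency": 80, "ell": 60, "acceleration": 70}

print(composite(high_growth), composite(high_proficiency))
```

Under these assumed numbers the low-proficiency, high-growth school outscores the high-proficiency school, which is exactly the dynamic that lets an Ochoa and a Sam Hughes land on the same letter grade.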

So let’s look at the change in passing rates at the four schools. Ochoa had an 8 point increase in Language Arts and a 13 point increase in Math. Holladay had an 11 point increase in Language Arts and a 15 point increase in Math. Those unusually high increases in the students’ passing rates are the reason the schools jumped from an “F” and a “D” in 2018 to a “B.”

The passing rates of both schools with an “A” in 2018 dropped in 2019. Sam Hughes students had the same passing rate in Language Arts, but they dropped 8 points in Math. Gale students gained 1 point in Language Arts, but they dropped 5 points in Math. That’s the primary reason each school fell a notch from an “A” to a “B.”
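The year-over-year changes described in the last two paragraphs can be tabulated the same way; the 2018 rates here are back-computed from the 2019 rates and the stated point changes, not taken from a separate source.

```python
# (2019 passing rate, change in points from 2018), as stated above.
changes = {
    "Ochoa": {"Language Arts": (22, +8), "Math": (30, +13)},
    "Holladay Magnet": {"Language Arts": (34, +11), "Math": (37, +15)},
    "Gale": {"Language Arts": (60, +1), "Math": (55, -5)},
    "Sam Hughes": {"Language Arts": (75, 0), "Math": (65, -8)},
}

for school, subjects in changes.items():
    for subject, (rate_2019, delta) in subjects.items():
        rate_2018 = rate_2019 - delta  # back-compute the 2018 rate
        print(f"{school}, {subject}: {rate_2018}% -> {rate_2019}% ({delta:+d} points)")
```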

With the new scoring system, it’s possible for lower income schools to shine because of student growth on state tests, even if their post-growth passing rates are still low, and higher income schools with high test scores can drop a grade or two if their passing rates slip. In the vast majority of cases, state grades still correlate with parental income, meaning they retain their socioeconomic bias, but it’s no longer as certain as it once was.

It’s ironic that the state grading system took a step forward in terms of equity but took a step backward in terms of clarity, which defeats the purpose of the A-F grades. To “improve” the state grading system, the state had to render it meaningless.

That’s where things stand. We’re stuck with a high stakes testing regimen that drives teachers and students crazy, eats into quality school time (No time for recess, art or music, just drill and test, drill and test, drill and test), and distorts the Language Arts and Math curricula into test performance delivery systems. And we have a state grading system that’s more equitable than the raw test scores, but there’s no way of knowing what a school’s grade means without digging into the data.

Which leaves us with two options. We can try to fix the unfixable by tweaking the high stakes tests and the state grading system. Or we can stop the madness and admit our 17-year, No Child Left Behind experiment with evaluating students and schools by assigning them a number or a grade is a failure, then do the right thing and throw the whole goddam NCLB mess onto the trash heap of educational history.

10 replies on “They’ve Improved The State’s School Grading System. Now It Tells Us Even Less Than It Used To.”

  1. Saw the movie Ford VS Ferrari over the weekend. It was chilling how the execs at Ford were acting just like the Ukrainian diplomats crying that somebody is not using the proper channels we have in place. And then we come to education, which emulates many of the failings of the past in much the same way.

    As I left the grocery store yesterday I saw a bumper sticker that said, “I believe in Public Schools.”

    But what do they believe in? Nobody can tell.

  2. “…took a step backward in terms of clarity….”

    How in the world does your brain jump from your clear narrative to that illogical conclusion?

    You make it perfectly clear, with great clarity, that the new letter grading depends much more on a school’s ability to improve academic achievement and much less on how smart the kids are on the first day of school.

    Your only descriptive flaw was not calculating proficiency rates if these differential gains were maintained for 13 years. Instead you express disappointment that the oak tree is only six ft tall after the first season and not ready to be turned into lumber.

    Two more serious problems plague the entire enterprise. First, growth in grades K thru 3 outweighs all the other grades combined and creates, even more, the motivational mindset that determines all that follows. A-F distracts the system from properly focusing on those lower grades, grades completely excluded from the calcs.

  3. .
    .
    .
    .

    Growth Rank   Growth %tile   Letter Grade   Growth Points   School
    51 96% B 48 Ochoa
    73 95% A 47 Davis
    86 94% A 47 Soleng Tom
    100 93% A 46 Carrillo
    135 90% A 45 Fruchthendler
    173 87% B 45 Manzo
    194 85% B 44 John B Wright
    199 85% A 44 Dunham
    304 77% B 42 Tolson
    317 76% B 42 Miles-Exploratory
    339 75% B 42 W Arthur Sewel
    362 73% B 41 Ida Flood
    365 73% A 41 Annie Kellond
    378 72% B 41 Howell Peter
    402 70% C 41 Irene Erickson
    406 70% B 41 Hudlow
    410 69% B 41 Henry Hank Oyama
    519 61% B 39 Maldonado Amelia
    532 60% B 39 Van Buskirk
    560 58% C 39 Cavett
    563 58% B 39 Mansfeld Middle
    646 52% B 38 John E White
    685 49% B 37 Lineweaver
    714 47% B 37 Laura N. Banks
    731 45% B 37 Holladay
    779 42% C 36 Marshall
    793 41% C 36 Lynn Urquides
    803 40% C 36 W V Whitmore
    808 39% C 36 Roskruge
    914 32% C 34 Bonillas
    930 30% C 34 Borton
    938 30% C 34 Myers-Ganoung
    948 29% C 34 Hollinger
    949 29% B 34 Gale
    955 28% C 34 Anna Henry
    960 28% C 34 McCorkle PK-8
    964 28% B 34 Ford Elementary
    991 26% C 33 Frances J Warren
    1001 25% C 33 Drachman
    1009 24% C 33 Pueblo Gardens
    1015 24% C 33 Harold Steele
    1018 24% B 33 Sam Hughes
    1031 23% D 32 Dietz
    1033 23% C 32 C E Rose
    1047 22% C 32 Roberts Naylor
    1072 20% C 32 Wheeler
    1073 20% B 32 Collier
    1079 19% B 31 Borman
    1105 17% C 31 Secrist
    1116 16% C 31 Vesey
    1122 16% D 30 Magee
    1148 14% B 30 Robins
    1159 13% D 29 Valencia
    1161 13% C 29 Cragin
    1199 10% C 28 Miller
    1209 9% C 28 Tully
    1220 9% D 28 Utterback
    1224 8% D 27 Doolen
    1226 8% D 27 Alice Vail
    1239 7% D 27 Raul Grijalva
    1241 7% F 27 Anna Lawrence
    1243 7% D 27 Morgan Maxwell
    1244 7% D 27 Pistor
    1245 7% C 27 Bloom
    1270 5% D 26 Booth-Fickett
    1281 4% D 25 Gridley
    1289 3% D 24 Blenman
    1294 3% D 24 Robison
    1302 2% D 23 Davidson
    1306 2% D 23 Mission View
    1319 1% F 20 Safford

  4. John, I’m sure you understand everything in your two comments. I have to admit, I don’t. Which tells me, a wonk who is deep into the state grade process, like yourself, can glean a great deal of information from the grades. But that’s not the purpose of the A-F system. It’s to give regular folks a way to see how schools are doing without having to understand all the data. Most people will see the B grades earned by the four schools I discussed and imagine the student achievement at the schools is more-or-less the same. Obviously that’s not true.

    I have a related question. If Ochoa and Holladay slip a few percentage points in their passing rates next year, I expect they’ll both drop to a C. I’d say that’s likely for both schools — if not next year, then in the years following. A big jump like theirs is likely to be balanced out with at least a slight dip in later years. And yet, for all intents and purposes, their students will be at close to the same level of achievement. People will say, “Looks like those schools aren’t as good as they were the year before,” even though the quality of teaching and student achievement as measured on AZMerit won’t have changed significantly.

    Neither the high stakes tests nor the state grades have fulfilled their stated purpose, yet they cost the state a great deal of money, and they cost our teachers and students a great deal of time, effort and stress. Put the tests and state grades to a cost-benefit analysis, and both have costs far higher than their benefits.

  5. I agree with you on this: the letter grading system does more damage than good. And, I am the guy who originally put it into state law. But, if I hadn’t, someone else would have.

    They mix two calculations which can’t be mixed: growth and achievement.

    People would be better off if these two components weren’t mixed, just each clearly stated.

    The latest letter grade, as you clearly state, puts much more weight on growth. It does this not only by weighting growth more but by creating separate categories of points for growth of Special Education students, Title One students, English Language Learners, etc.

    In doing so, it makes those students more important.

    Just today, I was explaining to some administrative staff how important it was to provide an opportunity to a group of very challenging, incredibly challenging, special ed students. I used the letter grade system to make my point. This was a C rated school, so they were listening carefully.

  6. Let me redo my headings on my chart.

    Growth Rank (when compared to 1,335 K-8 schools)……Percentile of all schools (higher is better)……Letter Grade……Growth Points in letter grade system……School

    51………………………………………….. 96%………………………… B………………48…………………..Ochoa
    73………………………………………….. 95%………………………… A………………47…………………..Davis
    86………………………………………….. 94%………………………… A………………47……………………Soleng
    100…………………………………………93%…………………………. A………………46…………………Carrillo
    135……………………………………….. 90%…………………………. A………………45…………………Fruchten
    173……………………………………….. 87%…………………………. B………………45…………………….Manzo
    194……………………………………….. 85%…………………………. B………………44…………….John B Wright
    199……………………………………….. 85%…………………………. A………………44…………………….Dunham
    ….
    1244…………………………………………7%…………………………..D……………..27……………………..Pistor
    1245…………………………………………7%…………………………..C……………..27………………………Bloom
    1270…………………………………………5%…………………………..D………………26…………….Booth-Fickett
    1281…………………………………………4%…………………………. D………………25…………………….Gridley
    1289…………………………………………3%…………………………..D………………24…………………..Blenman
    1294…………………………………………3%…………………………..D………………24…………………….Robison
    1302…………………………………………2%…………………………..D………………23………………….Davidson
    1306…………………………………………2%…………………………..D………………23……………..Mission View
    1319…………………………………………1%…………………………..F………………20……………………..Safford

    Save this data to see how it holds for next year. When they calculate classroom growth data from year to year, 25% of the top quartile of teachers end up in the bottom quartile the next year. In other words, there is a lot of statistical spray and froth in growth data. It’s the small n size issue. A large number of students aren’t included: kindergarten, first, second and third grade students aren’t included and their scale score growth is about 55% of the entire k through 12 total. Another 20 to 30% of students aren’t included because they aren’t full year students.

    Yet, growth data and rankings are the only meaning that you can extract from test score data. All other data is an illusion, which David mentions frequently.

    So, in the end, test score accountability is a dead end, doing more damage than good.

  7. John, it’s a rare pleasure to find us agreeing both in our analysis of the grading system and our conclusion that it is flawed. I acknowledge you were trying to make the system more equitable, something I advocated for. I think the failures of the system which you acknowledge indicate that it cannot be “fixed.” State grades, in my opinion, are a bad idea. We seem to agree on that.

    I would say the same thing about high stakes tests, and have frequently. They have caused more harm than good, and I cannot imagine a way to fix their flaws. We could learn as much from standardized tests without the “high stakes” component, administered, say, once in elementary school, middle school and high school, as a general way to assess how students are doing in their basic math and reading skills. It could be something like a statewide version of the NAEP test, which could be given to either a sampling of students or to every student in certain grades. I’m not sure I like that alternative, but it certainly beats the system we have now.

    Thanks for the redo of your table. I understand it now. It will be interesting to see how those numbers change. I would bet good money that the growth at Ochoa and Holladay will stall or reverse over the next few years. I think both Gale and Sam Hughes will go back up to where they were unless their student populations have shifted to include more struggling students.

  8. In 1996, David Garcia (he was my analyst for several years while I was chairman of Senate Education) and I set up a NAEP-style sampling system for Arizona. But, instead of pulling a sample, accurate to plus or minus 3 percentile points (your comment about Black scores in 2019 noted) and measuring once every two years, the system would have measured twice a year and had a sample size large enough to be accurate to within one percentile point.

    Unfortunately, the whole thing just mystified people. No one understood it.

  9. David,

    I came back to look at this again. The typical school, meaning the median school in that list, in TUSD ranks at the 30th percentile in statewide gains. You can excuse test scores based on poverty; you can’t excuse gain scores. Those come down to your systems. Do they measure teacher job satisfaction? If they do, do they read the suggestions for changes carefully and make the changes? Are administrators out visiting classrooms every day? Are principals frequently in the parking lot in the morning talking to parents?

    Are teachers coming to class with prepared lesson plans, ready to teach?

    For a district with such resources, that raises major questions.

    The danger is that the feedback loop just creates more pressure which most often results in lower quality, not an improvement.

    But, if I were on that board, I would be reexamining every system.
