It’s something of an obsession with me, writing about AzMERIT scores. A new set of scores, a new use of the scores, a new news story about the scores, and there I am with another post or two or three. So here’s yet another post, a rambling discussion of why the tests, the way they’re reported on, and the way they’re used drive me nuts.
Let me start by getting something out of the way. The tests in and of themselves aren’t bad. They give a reasonably accurate reflection of students’ abilities in reading, writing and math. During my last few years teaching in a district outside of Portland, Oregon, I had to give the Oregon version of the high stakes standardized tests to my sophomore English classes. I did a pretty good job of predicting what my students’ scores would be based on what I had learned about their reading and writing abilities during the eight months before the tests, which means the test scores generally reflected the students’ skill levels. There were a significant number of exceptions, where students got higher or lower scores than I thought they would, which tells me the tests aren’t always accurate on an individual level. But when you’re looking at large numbers of students, and assuming everything is on the level—no “helpful encouragement” from teachers during the tests, no erase-and-replace of students’ answers by staff after the students hand in their tests — their average scores tell you something about their skill levels relative to other groups of students.
Now, with that out of the way, the problems. The first is, the high stakes nature of the tests distorts the schools’ curriculum and, sometimes, the test results. Since teachers, schools and school districts are judged by their students’ scores, they’re compelled to do everything they can to get the best results possible. That means teaching to the test, which means spending inordinate amounts of time and energy giving students the narrow skills needed to fill in the right bubbles. The give and take of loosely directed discussions is a luxury only to be indulged in when time allows. Creative pursuits, long term projects, even time on the playground are secondary to the central focus of the classroom: preparing students for test day. Teachers become mechanical skill-and-drill sergeants, which is not what they thought they signed up for when they decided to join the teaching profession. Students are encouraged to become robotic, learning how to be successful at performing variations of one repetitive task — answering short questions by picking the right answer from a short list of possibilities. The classroom is a different place — I would say a worse place — thanks to high stakes tests. And, sad to say, all that sweat, toil and tedium generally only adds a few points to students’ scores and even less to students’ actual skill levels, and since pretty much everyone is doing it, it’s a wash. Every class, school and district’s ranking in the state stays pretty much the same as it would have been if no one paid any attention to the test until test day.
And sometimes, the pressure to raise test scores leads individual teachers, or whole schools and districts, to cheat. Some schools and districts have been caught at it. Teachers and administrators in Atlanta went to jail for changing answers on student tests year after year. Others do it but haven’t been caught. A series of articles in USA Today a few years back described a nationwide analysis of erasures on student tests and found that in many schools, including some in Arizona, the odds that the pattern of wrong answers erased and replaced with right answers occurred by chance were about the same as the odds of the school being struck by lightning on test day. Though state departments of education rarely look deeply into suspicious scores, Arizona’s ADE found nine schools where the evidence is strong enough that it’s highly probable students’ test papers were altered. Most likely, those schools are the visible tip of a larger problem. And that’s just the most easily detectable form of cheating. There are lots of undetectable ways to boost scores without increasing students’ skill levels.
Cheating can become addictive, and additive. If a teacher cheats one year, how does he/she go back to being honest the next year without having to explain the drop in scores? If third grade teachers cheat, fourth grade teachers look bad if their students score lower than they did back in the third grade—and so on, up the grades. Educators are basically an honest, moral, but not necessarily courageous lot. If you put their salaries and/or their jobs on the line, many of them are liable to do what it takes to push those scores up.
The other part of the problem happens outside the schools. It’s the way the scores are interpreted and used. The general public sees high scores and thinks “good schools” and “good teachers.” It sees low scores and thinks “bad schools,” “failing schools” filled with “failing teachers.” If the public doesn’t come to that conclusion by itself, the privatization/”education reform” crowd is quick to assure them that’s the way it is, because they want to disgrace and dismantle the public school system, and they want to cripple the unions that support teachers. Condemning “failing schools” is a two-fer for the anti-public education crowd. Actually, it’s more like a three-fer, or maybe a four-fer if there’s such a thing, because they can use the “failing school” meme to push charters and vouchers, the two main tools in their “dismantle public schools” tool kit.
Unfortunately, the media too often plays into the hands of the anti-public school crowd, praising districts with high scores and condemning districts with low scores.
Connecting parental income and education to high scoring and low scoring schools is condemned as making excuses for bad teaching. The usual retort is, “You mean you think it’s OK for only 20 percent of the students to pass the AzMERIT test?” So the problems of poverty are pushed to the background, or, worse, they’re blamed on the schools: “Students who go to failing schools end up in poverty. We need to fix the schools so those poor children have the education they need to make something of themselves.” Voila! Society is off the hook for the scandalous level of income inequality and the inexcusable level of poverty in a country as prosperous as ours. If it’s all the schools’ fault and it has nothing to do with the way our society is structured, that means the way we as a country treat people at the bottom of the socioeconomic ladder is just fine. It’s the schools’ failure, not ours.
If you equate test scores with the quality of education and educators, the logical conclusion is, “failing schools” are filled with failing teachers and administrators, while high scoring schools must have faculties and administrations that know how to get things done. Welcome to the wonderful world of self-fulfilling prophecy. Schools in high income areas are already more attractive to teachers for a variety of reasons, but if you add the idea that teachers will be branded failures if they teach at “failing schools,” that stacks the deck against those schools even more. It becomes increasingly hard to attract teachers to those schools—meaning teacher vacancies will be concentrated there—and more and more, the teachers in those schools will be people who couldn’t get jobs at the “good schools.” And why “throw money” at those bad schools, where it will just be wasted by a staff that has no idea how to teach kids? Better to reward successful schools by giving them more computers and science labs, and working air conditioners and working toilets. Negative societal perceptions about schools with low income students lower the quality of those schools, which means they increasingly earn the label “failing schools,” which means fewer teachers want to work there and less money is spent there, which means . . . the circle goes round and round as the schools spiral downward.
You’ll find no better example of the misuse of scores to label schools as successful and failing than Arizona’s “Results-based funding” plan which goes into effect this year. The idea is to reward “successful schools” for their success by giving them more money. Those schools will be able to give their teachers raises of $2,250 or more and have lots left over to buy educational goodies the rest of Arizona’s schools can only dream of. Naturally, teachers will flock to those schools, meaning every classroom will have a certified teacher cherry-picked from multiple applicants hoping to be among the select few. The students will have the best teachers Arizona can buy, along with newer textbooks, supplemented by more state-of-the-art computers and other educational supplies than schools not making the “successful school” cut. And how will success be determined? By scores on the AzMERIT test, of course. Other factors will come into play, especially the first year of the program, but it’s clear schools filled with children from the state’s most privileged families will be very well represented on the results-based funding list.
We get very little value from the tests, but they cause a serious amount of damage to our students and our public schools. That’s why I write about those damn AzMERIT scores so often, and will continue to do so when the occasion arises. Expect two more posts, at least, in the near future.
This article appears in Sep 28 – Oct 4, 2017.

It is true that high test scores do not necessarily mean “Excelling schools!” and low test scores do not necessarily mean “Failing schools!”
This is also true: you cannot get at the correct interpretations of what any given set of test scores means, or how the education of the students who produced those test scores might be improved, simply by engaging in abstract, theoretical discussions about POVERTY and CULTURE and PRIVATIZATION.
You get at the correct interpretations of a set of test scores (and you develop an understanding of how services to the students who produced that set of scores may be improved) by tracking the details of governance, funding applications, staffing, and what specifically is happening in the classrooms, with what materials, delivered by what instructors with what kind of training, under what conditions.
Where is that kind of reporting for our largest local school district, serving more than 50,000 students? Not here, not in the Star.
http://tucson.com/news/local/work-harassment-complaint-filed-against-tusd-s-sedgwick/article_fda3f9b8-39d6-5f52-a42c-b0a9db2e98e4.html
While TUSD Governing Board members bicker and undermine one another, and while the Star wastes its time reporting on every little detail of their infighting (and David Safier philosophizes), a lot of important, practical, nuts-and-bolts questions about conditions in TUSD, conditions that affect students’ ability to learn and to get better test scores, are going unanswered:
–How many subs outsourced to ESI are still covering classrooms that should be covered with permanent teachers?
–What was done to abate the lead problems in drinking fountains? (Lead poisoning at certain levels does affect ability to concentrate and learn.)
–Have teachers at schools experiencing discipline problems (e.g. Secrist, Utterback) received effective training in behavior management?
–Has the amount of desegregation funding re-allocated to increased legal fees under the last TUSD administration been scaled back, and has some of it now been applied to secure more fully qualified teachers for schools with hard-to-fill positions?
Just a few examples of the kinds of questions that should be asked and answered when the goal is to improve, not excuse, test scores in TUSD.
There are a couple of logical questions I have for you.
1. How is the incentive to cheat any different than with grades? If a test with stakes attached teaches administrators and teachers to cheat, doesn’t issuing kids grades teach kids to cheat?
After all, grades have high stakes attached: parental judgments, grade advancement, privileges like athletics, and eventually college acceptance, scholarships, and lifelong higher income. Should we eliminate grading?
2. If having a test causes teachers to teach to the test, isn’t that a good thing? Provided the test is testing the right skills that we’ve agreed students should have and the test measures those skills effectively, then don’t we want teachers “teaching to the test”? And if it’s not testing the right skills or is unable to measure them, would you support testing if those problems were fixed?
I agree with you. But your writing uses too much narrative. TLDR.
Response to this column and bslap.
David, everything that you have said is absolutely correct, but our current education culture is defined by all the mechanisms that you decry. Every year, for almost a hundred years, the local papers have printed test scores, which people then assume the school created. Actually, as you point out, the school largely collects those test scores and, to some small degree, creates them.
There are only two ways to break this culture: 1) big system and 2) small system. I’ve tried the big system and it flopped totally. In 1999, I introduced the Finnish technique of sampling: testing 9,000 students a semester so that, with a low-stakes NAEP-style test, we would have a sample large enough to know our results to within one percentile point, enabling us to pick up and reinforce real changes. No one could comprehend what it was all about: the rationale, the idea that 9,000 tests would be more accurate than 500,000, the lack of feedback at the local level, and so on. It was like fighting a forest fire with a squirt gun.
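The sampling claim here is standard survey statistics. As a minimal sketch (my own illustration, not the commenter's actual methodology), the margin of error for a percentage estimated from a simple random sample depends on the sample size, not the population size, which is why roughly 9,000 students is enough to pin a statewide rate down to about one point:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error (in percentage points) for a proportion
    estimated from a simple random sample of size n. p=0.5 is the
    worst case, giving the widest possible interval."""
    return z * math.sqrt(p * (1 - p) / n) * 100

# A 9,000-student sample pins a statewide rate to about +/- 1 point.
print(round(margin_of_error(9000), 2))    # → 1.03
# Testing all 500,000 students narrows it only to about +/- 0.14,
# far more precision than is needed to track year-to-year trends.
print(round(margin_of_error(500000), 2))  # → 0.14
```

This is the same logic behind NAEP itself: a well-drawn sample of a few thousand beats a mandatory census of every student for the narrow purpose of measuring statewide change.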
Right now, as you point out, tests like AzMERIT are worthless for telling us whether Arizona is better than last year. These tests told every state in the nation that it improved in 2015; then NAEP came out and told us the truth: that more than half got worse. The annual tests were just an illusion.
The other way to break this culture is 2) small system. That’s what I am trying now: to set a land speed record for academic gains in one year in one classroom. My simple hypothesis is derived from the research finding that students from poverty have poor attitudes toward school and, as a result, practice the crafts of reading and math less than 2 minutes a day each.
This morning at 8 am, my all-minority all at-risk students did 12,489 math problems correctly – 97% of them without referring to their fingers, an average of 657 each. The high student did 1,414, the low student did 278. Over 50% of the class made overt expressions of joy at setting new records. I have created a culture of math where students strut around bragging about their abilities.
This afternoon, I will return to the class and, if past patterns hold, they will do another 10,500 math problems correctly, their productivity dropping about 15% at the end of the day. Thus their total will be about 23,000 math problems in one day.
The teachers on this thread react with seething hostility, assuming that I am destroying these children’s attitudes toward school with unending drudgery, instead of asking the simple question, “What is he doing differently?” The truth is that you can’t get students to do this much math if it’s drudgery. I know, because all the scientific tools I use for math fluency are not yet available to me for math comprehension. Tomorrow, as we work on math comprehension (word problems), they will do only 50 problems each, when in theory they could do as many as 500. Within three years, I will have all these problems solved.
In railing against this culture, you are not outside of it, you have chosen to be part of it. You assume that it is as good as we can get despite 40 years of no improvement.
I am determined to break its iron grip on the poor and minorities.
At least get those two-year gains by school for low SES students, rank them and go see what the very high gain schools and school districts are doing differently. Pull their erasure rates while you are at it.
Go talk to Ed Sloat of Peoria; he always did a regression analysis that washed away SES to look at real performance. See if he has a true ranking of the schools of the state for you.
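For readers unfamiliar with the technique, here is a rough sketch of what “a regression analysis that washed away SES” could look like: regress scores on a poverty proxy, then rank schools by how far they land above or below the fitted line. The school names and numbers are invented for illustration; this is not Ed Sloat’s actual method or data:

```python
from statistics import mean

def ses_adjusted_ranking(schools):
    """Rank schools by the residual from a simple linear regression of
    passing rate on an SES proxy (percent of low-income students).
    A positive residual means the school scored above what its
    student poverty level alone would predict."""
    xs = [s["pct_low_income"] for s in schools]
    ys = [s["pct_passing"] for s in schools]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    for s in schools:
        predicted = intercept + slope * s["pct_low_income"]
        s["residual"] = s["pct_passing"] - predicted
    return sorted(schools, key=lambda s: s["residual"], reverse=True)

# Hypothetical schools, made up for illustration only.
schools = [
    {"name": "A", "pct_low_income": 90, "pct_passing": 30},
    {"name": "B", "pct_low_income": 20, "pct_passing": 55},
    {"name": "C", "pct_low_income": 85, "pct_passing": 45},
    {"name": "D", "pct_low_income": 30, "pct_passing": 40},
]
ranked = ses_adjusted_ranking(schools)
print([s["name"] for s in ranked])  # → ['C', 'B', 'A', 'D']
```

Note what the adjustment does: school C, with 85% low-income students, ranks first because it far outperforms what its poverty level predicts, while school B’s higher raw passing rate largely reflects its affluent enrollment. Ranking by residual rewards schools that beat expectations instead of simply rewarding demographics.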