The state has released scores on the AzMERIT tests given this spring, meaning we can compare TUSD’s 2017 scores with its scores two years ago when students took the first AzMERIT tests, and with the state scores. I’ll lay out the numbers first, then I’ll try to figure out what they mean, and don’t mean.

But first, let me repeat my intense dislike of our obsession with high-stakes standardized tests. They only test what’s testable in a fill-in-the-bubble format. They’re susceptible to being gamed, meaning the better teachers are at teaching to the test, the better their students’ results. That makes the reliability of the results as a measure of student achievement questionable. Also, the emphasis on the tests distorts the curriculum at the same time it stifles teachers’ creativity and their ability to tailor their teaching strategies to their students’ needs. The yearly tests make the education we give our students worse, not better. Nonetheless, the tests are out there, and people will talk. So with these caveats in mind, I’ll talk too.

Here’s a summary of the AzMERIT results, without analysis or interpretation. Statewide, fewer than half the students passed the test in every grade. The passing rates range from 25 percent to 48 percent. However, the average passing rate rose about 4 percentage points since the first test was given in 2015. TUSD’s passing rate is considerably lower than the state’s, averaging 11 points lower in Language Arts and 13 points lower in Math. The district’s average passing rate didn’t change in Language Arts from 2015 to 2017 and went up one percentage point in Math, meaning TUSD’s scores showed less improvement than the state as a whole. White and Asian students scored considerably higher than Hispanic, Native American and African American students at the district and the state level.

Now, some analysis. First, the passing rates. As any teacher knows, you can create tests that are easier or harder, and you can move the grade curve up or down depending on where you set the cut scores. The old AIMS test was thought to be too easy and too many students passed it, so the state created a harder test and set the cut scores at a level where fewer students passed. So the fact that far fewer students passed AzMERIT than AIMS doesn’t mean our students know less than they did a few years ago. It just means we have a tougher curve on a tougher test.

Fewer TUSD students passed than the state average, and at both the TUSD and state levels, White and Asian students scored higher than Hispanic, Native American and African American students. That information is about as surprising and revelatory as saying the yearly temperature in Tucson is higher than it is in Seattle. Of course Tucson is warmer; that’s how the global climate is structured! Of course Whites and Asians outperform Hispanics, Native Americans and African Americans on standardized tests; that’s how household economic and educational status is structured! And of course the state outperforms TUSD on standardized test scores; the district has a lower percentage of high-scoring White and Asian students and a higher percentage of Hispanic, Native American and African American students than the state as a whole.

None of this is a judgement on any group. Far from it. It’s a judgement of our society’s shameful economic, racial and ethnic inequality. If we lower the levels of inequality, the gaps in student scores will close as a result. It’s overstating things, but not by much, to say we could learn as much about student achievement, and save ourselves a whole lot of money, by getting rid of the tests and just looking up students’ zip codes.

But do the demographics explain an 11 point passing gap in Language Arts and a 13 point gap in Math between TUSD and the state? Maybe so, maybe not. It would take a look at the complete statewide data by someone far more sophisticated in math and statistics than I to answer that question. But here is a possible explanation for a portion of the gap. An odd thing happens to TUSD enrollment from grade 5 to grade 9. The student count takes a big drop between grades 5 and 6 — 400 students last school year — then picks up much of what it lost between grades 8 and 9 — 250 students last school year. The drop isn’t spread evenly across ethnic groups. Percentage-wise, more White students leave than any other group. For whatever reason, lots of parents pull their kids out of TUSD for middle school then bring them back for high school, and it’s more likely to be White kids who leave, the ones who tend to score highest on the AzMERIT. That helps explain why the gap between TUSD and the state is greatest in the middle school years. Take those three years out of the equation, and the average gap shrinks by two percentage points in both Language Arts and Math.

The most troubling numbers for TUSD are the improvement numbers from 2015 to 2017. They are significantly lower than the state’s rate of improvement, which makes it look like the district is doing a worse job educating its students than the rest of the state. That’s a possible explanation, and if it’s true, TUSD damn well better get its act together. But here’s what I don’t know, and won’t know until the complete data is released by the state later this year. It’s very possible that statewide, the improvement numbers are better for the higher scoring White and Asian students. They may have become more test savvy than the other students over three years of testing, and they also may have improved more quickly in the specific subject matter covered on the tests. If so, districts with a higher percentage of White and Asian students would make a significantly higher jump in their scores than districts with more Hispanic, Native American and African American students. In other words, TUSD’s improvement numbers may be similar to districts with similar ethnic makeups. I don’t know if this is true, but it’s something I plan to check out when the rest of the data is made available.

Bottom line, anyone who looks at the raw results for the AzMERIT test and gasps at the state’s low passing rate and TUSD’s even lower rate is missing a lot of what’s going on. By themselves, the numbers only tell part of the tale.

3 replies on “A Look At TUSD’s AzMERIT Scores”

  1. “They only test what’s testable in a fill-in-the-bubble format” – very true, but, on average, I’d bet the students who can reason well and write well also do well on the tests. You have to measure somehow or you can never know what is improving and the state is too lazy to actually have more meaningful tests.
    You can blame high rates of immigration of poor and unskilled people from countries with little tradition of education for most of TUSD’s issues.

  2. I wondered, as I looked at the data, exactly how it is collected. That is, as we watch from year to year, aren’t we watching roughly the same kids, as they go through? 2015’s 5th graders are 2016’s sixth graders, right? What effect does that have on the scores? And if we know anything from the scores, it’s that the District’s desegregation efforts to reduce the achievement gap are not working, no matter how much window dressing top Administration puts on it. The district has been failing its black and brown kids in epic proportions, all the while paying vast sums of money to lawyers to say “it ain’t so”. Needless to say, that money would be far better spent in the classroom.

    And then from a different direction there are possible other explanations for our district-wide poor achievement. Almost the only good answer by the candidate for superintendent last night referred to the effect of Board behavior on district performance. He stated that he had read a study recently that correlated divisive, untrusting board behavior (as measured by watching videos of the Boards in action) with low-scoring districts. In other words, looking across multiple districts, and observing Board behavior on video, the more the members of the Boards trusted each other, or at least acted as if they did, the higher performing the District, as measured by grading (A-F). It seems that TUSD voters succeed in switching Board majorities, but the divisive, cruel and manipulative way the members treat each other never seems to change much. This might be an additional place to look for WHY our District cannot seem to get out of its achievement crisis. Perhaps the adults in the room need to model better behavior as they (supposedly) focus on the children in the schools.

  3. You express concern about the difference between TUSD’s improvement and Arizona’s overall improvement. That difference is both meaningless and highly informative.

    Meaningless in the sense that it is real with statistical significance.

    Highly informative in the sense that Arizona can’t be successful unless TUSD hugely increases the pace of academic growth of its students.

    At less than $15 per test, the AZMerit is a low-security test, as compared to high-security tests like Advanced Placement at more than $60 per test.

    We know from nationwide test data and comparisons with NAEP that almost all of the changes in state test scores on tests like the Iowa Test of Basic Skills, the Stanford 9, AZMerit etc. can be ascribed to security issues. When completely new versions of the test are released, test scores plunge and then steadily increase until a new version is released. Yet NAEP scores are flat by comparison.

    In this digital age, undoubtedly thousands of teachers have complete copies of last year’s AZMerit test.

    We also saw these phenomena in AZ school districts with merit pay linked to test scores. They had higher erasure rates than other school districts. This is a strong indication that the behavior went over the line to absolute cheating – erasing a wrong answer and putting in a correct answer.

    These security issues call into question everything about education culture. The new A to F system is completely built around growth models and the new performance funding model will be completely built around the A to F.

    Back in 1992, Tennessee did what Arizona is doing now: tried to build a new education culture around growth models instead of raw test scores. Didn’t work. Actually backfired. Tennessee’s NAEP scores fell relative to the rest of the nation.

    In the Urban Institute Analysis, perhaps the best ranking of states in existence right now, Tennessee ranks 41st on an apples to apples comparison of test scores. Who copies number 41?

    You indicated that the 50-plus-formula A to F model was “above your pay grade.” Not really. It emphasizes growth in duplicative ways and more heavily than the old model. The blizzard of formulas just disguises its simplistic nature. Did your students get two more problems right than they did last year? If so, your school gets an A; if not, a lower grade.

    The formulas are creative in the sense that they go back further in history to calculate not just one year gains but two and three year gains. This increases the sample size and also makes it a little bit harder for cheaters to skate by.

    I don’t fault the staff who created this. When you look at it, is it the Frankenstein of education policy or the ultimate refined expression of what you might be able to do if accountability actually worked?

    The blunt truth: all mass inspection tests like AZMerit “accountability” lead to a dead end and worse outcomes for students.

    To best improve test scores you would put an end to AZMerit and just rely on NAEP to measure outcomes, allowing education to evolve naturally under the interaction of parents making choices and schools changing to better compete for students.

    That’s what Finland did.

    Not going to happen here.
