
The high-stakes tests for Common Core are supposed to emphasize thinking more than the tests for, say, AIMS. One way to do that is to make written analysis an important component of the tests. But grading papers is a slow, expensive process. You have to hire people, train them, and make sure each essay is scored by multiple readers to reach an acceptable level of consistency in the grading.
PARCC (part of Pearson Education), which is putting together one of the tests states can choose from, has a solution. Let computers grade the essays. From Politico:
The PARCC exams are designed to challenge students to read closely, think deeply and write sophisticated analyses of complex texts. But hiring people to read all that student writing is expensive. So Pearson’s four-year contract to administer the exams bases the pricing on a phase-in of automated scoring. All student writing will be scored by real people this coming spring. The following year, the plan calls for two-thirds to be scored by computer. The year after that, all the writing is scheduled to be robo-graded, with humans giving a small sampling a second read as quality control.
Some states are having a little trouble with the robo-grading concept, so PARCC spokesman David Connerty-Marin said the states are “conducting studies” to see how well it all works . . . except, not quite.
[Connerty-Marin] later acknowledged that states aren’t doing their own studies; they’re relying on the Pearson report.
Right. And pharmaceutical companies should run the tests to decide whether their new drugs are safe and effective.
Get ready for the next wave of beat-the-test tutoring: How to fool a computer into thinking you’re writing something important even though most of it is nonsense and gibberish.
This article appears in the Nov. 13-19, 2014 issue.

How is this different than awarding tenure to a union brother/sister?
I use this terrific speach to test program and it artly ever makes many errands.
Reboots grading assays? Swill!
Maybe Pearson is on to something; next up, robot teachers. Lower costs and guaranteed alignment of the Pearson curriculum with the Pearson student assessments, without human teachers gumming up the works. The ultimate factory model of education, brought to you by the Obama administration, the education-government complex and the Bill and Melinda Gates Foundation.
One more reason to scrap the “factory model” of education that big money seems to think works the same for all children! More money for them and fewer dedicated teachers in the classroom: just give everyone a laptop and all will be good. Not!
The grading software should be made publicly available so that it can be critiqued by independent experts and so that I can try it on samples of my own writing.
I am irritated 1) that literate people think machine-grading of K-12 written responses is a new concept and practice, and 2) that many may think such machine-grading has been mainly or solely pushed by private corporations rather than public institutions (education departments and colleges).
There are publicly available articles online debating, promoting, and critiquing machine-grading, or “scoring”; they are not hard to locate.
I do agree with Mr. Warbeck above that the software used should be tried out by the public, but primarily to engage more of the population in actually helping kids and adults learn.
For Mr. Safier, after noting that I appreciate much of what you do to broadcast education issues here, an edit question: the article states “PARCC (part of Pearson Education)”. Was this a goof-up, or a belief?
I hope the former. For good and ill, the states involved in PARCC put their education department representatives to work on it. They are people belonging to both major political parties, people who have worked for years within state institutions, and people who are known (inside their states) to reporters and commentators. If their involvement is not recognized, that suggests willful ignorance by citizens and journalists. This applies to PARCC, its rival SBAC, and Common Core alike (whether to the standards’ content or the separate topic of test question design). The public dropped the ball a long time ago, then kicked it down an alley along with the kids and forgot about it.
Politico writes on the _grading_ of PARCC exams, “The following year, the plan calls for two-thirds to be scored by computer. The year after that, all the writing is scheduled to be robo-graded, with humans giving a small sampling a second read as quality control.”
What else is new? When I took the bar exam on the west coast decades back, the typical bar grader, I understood, spent two minutes per blue book!
The bar exam committee assigned each test-taker an hour to read the question and respond: about 5 to 10 minutes to read and ponder the call of the question, leaving 50 to 55 minutes to write an answer containing the critical legal buzzwords and required phrases that displayed a working knowledge of concepts the bar considered essential.
At the time, urban legend had it that these bar exam essay graders, who reputedly got paid a dollar per blue book graded, would take a pile of these blue books to the local baseball stadium and grade between innings! After all, they had been told what phrases and words to look for in a particular essay. If enough of these occurred, the paper passed high, intermediate or low, depending on how many made it onto the pages. If too few of those words and phrases made it into the essay, the student did not pass the exam…
In my case, at least, the essay robo-readers were human. I can see how a machine could be programmed to scan for particular words, terms or phrases, so that a paper with a smattering of sheer nonsense in between may still get an A, if the student has the savvy to throw them in!
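A minimal sketch of that keyword-scanning idea, in Python. The phrase list and pass thresholds below are invented for illustration only; they are not Pearson’s, or any bar examiner’s, actual criteria.

```python
# Hypothetical keyword scanner: count required buzzwords and bucket the
# score, the way the grading described above reportedly worked. The
# phrases and cut-offs are made up for illustration.

REQUIRED_PHRASES = [
    "duty of care",
    "breach",
    "proximate cause",
    "damages",
    "reasonable person",
]

def score_essay(text: str) -> str:
    """Return a pass/fail bucket based only on how many phrases appear."""
    hits = sum(1 for phrase in REQUIRED_PHRASES if phrase in text.lower())
    if hits == len(REQUIRED_PHRASES):
        return "pass (high)"
    if hits >= 4:
        return "pass (intermediate)"
    if hits >= 3:
        return "pass (low)"
    return "fail"

# Sheer nonsense stuffed with the right phrases still passes high:
nonsense = (
    "The duty of care was breach by proximate cause of the damages, "
    "as any reasonable person riding a purple trombone would agree."
)
print(score_essay(nonsense))  # -> pass (high)
```

The scorer never asks whether the sentences mean anything; it only counts matches, which is exactly the weakness that beat-the-test tutoring would exploit.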
Be well.