Confessions of a Reluctant Teacher

September 28, 2009

How Standardized Tests Punish Nonconformity

In a previous post, I linked to an article by a former standardized testing grader, Todd Farley. He has an op-ed piece in the New York Times today entitled “Reading Incomprehension,” which is also very eye-opening. He talks about the difficulty of scoring open-ended items on a standardized rubric and he gives a few examples of especially confounding student responses.

I disagree, however, with his opening claim that “the problem [with standardized testing] is not so much the tests themselves–it’s the people scoring them.” Open-ended questions are inherently subjective and therefore difficult to grade according to a rubric.  Even if the scoring were done “only by professionals who have made a commitment to education,” many of the results would still be influenced by personal bias.  For example, take the review of the X-rated “Debbie Does Dallas” that Mr. Farley encountered.  While he described it as well-written and hilarious, normally qualities that would merit a 6 (“genius”) grade, it ended up receiving a 0 because it discussed a pornographic movie.  This student obviously possessed a sense of humor, the ability to think independently and question authority, and enough writing skill to craft a “comprehensive analysis” that was “artfully written.”  Yep, sounds like that kid needed to be put in his or her place with a big, fat zero.  Would a “professional who was committed to education” have made a different call?  It would depend entirely on the individual professional.  One who appreciated the student’s wit and individualism might have given it a higher grade, while one who was offended by children and teenagers exploring their sexuality might have given it the same zero.

In another example, Mr. Farley tries to decipher whether a drawing of a child wearing a helmet while riding a bike over a flaming oil spill properly demonstrated “an understanding of bike safety.”  Since the student had just read a passage about bike safety, and most children don’t encounter walls of fire while tooling around the neighborhood, I would say, “Yes, of course.  The kid in the picture is wearing a helmet.  He’s just trying to inject some humor into the lifeless, eternally dull process of taking a standardized test.”  Mr. Farley, however, was stumped as to how many points he should assign.  The experience showed him that “the score any student would earn mostly depended on which temporary employee viewed his response.”  I think that the student’s score would depend, not on the scorer’s background or “commitment to education,” but on the scorer’s “commitment” to enforcing conformity and punishing those who color outside the lines.  The scorers are hired to judge, not to educate.  Because of the way standardized tests are set up, we can’t just ask the student about their intentions with the drawing.  That would require a sense of humor and respect for a child’s thought process.  I’m sure the test designers thought that having the students draw a picture in response to a reading passage would be a creative learning exercise.  I hope that child wasn’t penalized too harshly for being creative in the wrong way.

Ultimately, Mr. Farley and his colleagues had trouble with some of these responses because the rubric doesn’t work if the student refuses to conform, not necessarily because they were only “temporary employees.”  The problem is still with the test and what it demands of students, not with the people who score it.

Like this post? Keep in touch: follow me on Twitter!



  1. And ultimately punishing the unusual results in people who can’t be creative. If the only acceptable answer is the one the majority thinks of, then what happens when we need inventors?

    Comment by Uninvoked — September 28, 2009 @ 11:00 pm | Reply

    • So true! As the saying goes, “The definition of insanity is doing the same thing over and over again, while expecting different results.” When we come to a point where we need different results, we will have a huge population trained, basically, in doing the same thing over and over again. Thanks for your comment!

      The Reluctant Teacher

      Comment by christinag503 — October 9, 2009 @ 9:34 pm | Reply

  2. I had a similar experience on the SAT writing test. I forget exactly what the essay prompt was, but I do remember that it was something vomit-inducingly corny like “discuss the value of hard work and determination in determining success.” As a seasoned cynic, I remember putting most of my mental effort while writing into finding ways to mock the vapidity of the prompt without being too blatant about it. I ended up getting an 8 out of 12 on the essay, which seems a bit incongruous with my 80 out of 81 on the multiple choice portion of the writing test and my 2300 on the overall exam. Now I’m studying music at a major conservatory and I couldn’t be happier to have escaped the intellectually stifling American K-12 academic system.

    The number one tip for standardized essay tests is: no matter the prompt, find a way to write about coping with the loss of a relative and you’ll get the top score, guaranteed. I’m dead serious — I know someone from elementary school, not a terribly good writer, who wrote about her grandmother dying on the California Integrated Writing Assessment and not only got a 6 out of 6, but was contacted by the testing company to ask if they could use her essay as an example of a 6 in their test prep materials.

    Comment by proud conservatory student — October 17, 2009 @ 7:24 pm | Reply

    • Ha! Thanks for the tip! I will have to pass that along to my students. Thanks for commenting!

      Comment by christinag503 — October 17, 2009 @ 8:32 pm | Reply

  3. The problem with tests is neither the test itself nor the individuals scoring the test, but those teachers who spend weeks, months and years sitting on committees, battling through the peeing contests in interminable committee meetings, to compose test prompts and the rubrics that test scorers must follow to score them. Do they receive special training to do this job? Are they experts in educational measurement, large-scale assessment, evaluation, and policy issues? Do they have degrees in these fields? Could schools afford to pay those types of experts, even if they wanted them? Probably not.

    I have always said that if those teachers and administrators had to spend one testing season sitting in front of a monitor scoring thousands and thousands of test prompts for each item, instead of the mere hundreds they might see in their own school, their own school district, they would change the way they write prompts and the rubrics they write to score their prompts. These educrats simply cannot imagine the wide range of variations that enter students’ minds when students read and reply to these prompts. So their rubrics reflect their own limitations, their inability to think the way their students think.

    Scorers must confine their scoring to the limitations of these rubrics, often burdened with the ethnic, gender, social, cultural and linguistic biases of the teachers who write them. In our mobile society, with children moving from school to school, district to district, state to state, even country to country, these biases severely limit the scorer’s ability to score the items appropriately.

Do I have an answer to this “accountability” problem? No. But as long as states choose to rely on standardized testing, imperfect though it may be, as the preferred method of determining learning accountability in our schools, flawed prompts and equally flawed rubrics will handicap those who struggle to score student responses fairly. Some of those people are my good friends, and they really care about the students and want them to succeed. But, stymied by the inadequacies of some prompts and some rubrics, scorers strain to read the handwriting of thousands of students, struggle to decipher what each student has written and to comprehend what each student is trying to say, and work very hard to score these responses fairly, even when handcuffed by those who write the prompts and the rubrics.

    Comment by N. Curry — January 15, 2010 @ 4:08 pm | Reply

    • Thank you for adding your insight to the discussion! I hadn’t taken into consideration the role played by the creators of the tests. I think what you said about the rubrics reflecting “their inability to think the way their students think” is spot-on.

      I do wonder, though, if it is even possible to create prompts and rubrics that are unbiased. My only experience with standardized tests is the SAT, and it is far more limited than yours, so I don’t know what other types of prompts and rubrics are like. I had thought that the problem was with the structure of these tests, the very “standardization” of them. Do you know of any examples of unbiased rubrics? I’d love to explore other resources on this topic!


      Comment by christinag503 — January 15, 2010 @ 10:22 pm | Reply

      • An example of a culturally and sexually biased prompt might be:

        Students read a passage about a Hawaiian teen who lives on the mainland and visits her aunt in Honolulu. Her aunt shows her several traditional Hawaiian outfits, explaining about the styles, their designs and who would traditionally wear each. Some were traditionally for unmarried women and some were for married women, and her aunt explained why. Then her aunt allowed the teen to choose one for herself. The prompt asked the student to decide which outfit the teen would most likely have chosen and then to explain why, based on information in the passage.

        Sounds like a pretty good prompt, right? It would be one of the rare ones that actually appeals to female students, or those interested in fashion. But in itself, that is a sexual bias. Sounds like it would allow for just about any answer as long as the student supported his or her choice, right? Except that the rubric might say that the teen would choose the one that was for an unmarried woman, since the teen was an unmarried woman, period. And any other answer would then be wrong, get zero points according to the rubric. That is blatant cultural and sexual bias. A student might respond to this type of prompt suggesting that the teen girl would choose the outfit that she liked best, because of its colors or its design, not because she was a single woman. The teen girl in the passage was, after all, a modern Hawaiian girl living on the mainland, not in traditional Hawaii.

        Or here’s another:

        The student might read a passage about a young student who babysits for a couple who are musicians in a symphony orchestra. The reading passage explains that this was a good job for the teen student because she earned a certain amount every week when the couple went to rehearsals and even more on days when the orchestra was performing in concert. Then the prompt might ask the student to explain something about why the student would earn more for concert babysitting than for rehearsal babysitting.

        Again, sounds good, right? Nothing too complicated. Except that the prompt operates on the assumption that the student is familiar with symphony orchestras, understands what an orchestral rehearsal is as compared to an orchestral concert. Many students who live in rural areas and some in the inner city areas have never seen a symphony orchestra unless their parents watch it on PBS. Many have had no involvement in anything that required rehearsals or concerts. Many of the terms used in this reading passage would be culturally unfamiliar to those students. The very concept of a paid babysitter might be foreign to those in certain cultural or rural settings. And if the prompt only offers one correct response many students would not be able to answer this and get the full score.

        I am sure that with a little imagination most can understand that it is trickier than one might think to offer good reading passages that hold the students’ interest so that the student will read the whole thing, and then to write good prompts that allow the student to guess the right answer, supposedly based on the reading, and to reply appropriately.

        Since these types of reading comprehension prompts are pretty standard at all grade levels in almost all states, this is an area where the teachers and administrators who write these prompts need to use a little common sense. But somehow, even after years of working on a prompt and rubric, that quality is often lacking in test prompts. Teachers tend to think like teachers and fail to comprehend that students come from such diverse backgrounds, perhaps significantly different from that of their college educated teacher, that the student might have no idea why their response would not be right. Many test scorers will perceive things the same way as the students, or at least will understand the wider variety of student responses one could reasonably expect. The test scorers don’t have their egos all involved in whose reading passage suggestion gets selected, or whose prompt questions and accompanying acceptable rubrics end up on the test.

        Remember that the only items test scorers are scoring are those items that require a hand-written response. Other items are machine scored, generally multiple choice. Hopefully you can understand that after struggling to read the printing, cursive handwriting, or scrawl of thousands of students, trying to decipher the spelling and decode whatever the student has attempted to write, that these other issues create unnecessary roadblocks to good testing and good scoring. Sometimes just the imaging alone can stump a scorer who simply can’t read the student’s response because the image on the monitor is so poor. In such cases the scorer can “request the original,” but that often takes time and is seldom a good option when scoring projects don’t last that long to begin with.

        I hope that this shines a little light on a few of the problems of standardized testing. As I said before, I don’t have answers. I know that back in the day, when teachers wrote all their own tests and every student grade came from the classroom teacher alone, favoritism and entitlement issues, student behavior histories, and cultural biases, even sexual biases (boys get more attention than girls in most classrooms), often tipped the scales on grading. So that’s not the answer, either. But if testing could start off with good reading selections and good prompts, and then follow up with a good scoring rubric for each prompt, scoring would be easier, and I hope that would lead to more meaningful data from these tests. Can anyone else offer a better answer, a better solution for learning accountability?

        For the record, I am an adamant unschooler, or as I prefer, a proponent of self-directed learning. But if a student decides that he or she wants to attend a college or university, wants to earn a college degree, then that student must prepare for testing, must learn that skill through self-directed mastery, too. All this is good preparation for most jobs today, in which the same kinds of comparisons and assessments emerge in job evaluations and promotions. So, unless the student has selected a totally self-directed path, it doesn’t hurt to understand and master the skills required to do well on tests. What’s great about self-directed learning is that the student understands that he or she can choose.

        Comment by N. Curry — January 16, 2010 @ 6:29 pm

      • Thanks so much for those examples…I confess, at first glance, I didn’t really see what could be considered biased about those, but your explanation makes a lot of sense. I was actually wondering what completely unbiased questions would look like, but I see your point: it can be tricky for an individual to write a question that takes into account every student’s cultural/gender/socioeconomic/ethnic background, but even if we had teachers writing questions for just their own classes, biases would still be present (and that would make the whole process somewhat less standardized, it seems).

        I agree with you that going to college to earn a degree will require some exposure to testing, standardized or otherwise. But I loved John Taylor Gatto’s vision of students destroying the academic influence of the SAT by putting down their pencils en masse and saying “I’d prefer not to.” I’ll have to post his essay on that up here soon. I also recommend Paul Graham’s essay, where he talks about credentialism and why it will soon be out of fashion in our economy. But definitely, preparing for standardized tests because it is a requirement of your own chosen path is a very different experience than taking them just because everybody else is and it’s what you’re “supposed” to do.

        And just as an afterthought, I took an Industrial/Organizational Psychology class at PSU, and according to my professor, who worked in that field, those comparisons and assessments used in job evaluations are largely used by corporations to protect themselves from lawsuits from disgruntled employees who might have been passed up for a promotion or fired. They do have their helpful aspects, but it seems to me they also function like standardized tests: they give the illusion of merit-based advancement, but hidden biases can hurt certain minorities.


        Comment by christinag503 — January 20, 2010 @ 7:29 pm
