Four links.
1. A NYT article on computerized grading of essays. I highlight the response of the Luddites:
“My first and greatest objection to the research is that they did not have any valid statistical test comparing the software directly to human graders,” said Mr. Perelman, a retired director of writing and a current researcher at M.I.T.
He is among a group of educators who last month began circulating a petition opposing automated assessment software. The group, which calls itself Professionals Against Machine Scoring of Student Essays in High-Stakes Assessment, has collected nearly 2,000 signatures, including some from luminaries like Noam Chomsky.
“Let’s face the realities of automatic essay scoring,” the group’s statement reads in part. “Computers cannot ‘read.’ They cannot measure the essentials of effective written communication: accuracy, reasoning, adequacy of evidence, good sense, ethical stance, convincing argument, meaningful organization, clarity, and veracity, among others.”
Suppose, for the sake of argument, that the software does poorly now and can be fooled easily. My bet is that within five years there will be software that can pass a Turing test of the following sort.
a. Assign 100 essays to be graded by four humans and the computer.
b. Show the graded essays to professors, without telling them which set was computer-graded, and have them rank the five sets of essays in terms of how well they were graded.
c. See if the computer’s grading comes in higher than 5th.
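One way to score steps a through c is sketched below. The post does not say how the professors' rankings should be combined, so averaging each grader's rank position is an assumption here, and the grader labels and sample rankings are hypothetical.

```python
from statistics import mean

def aggregate_ranks(rankings):
    """Average each grader's position (1 = best) across all professors."""
    graders = rankings[0]
    return {g: mean(r.index(g) + 1 for r in rankings) for g in graders}

def computer_passes(rankings, computer="computer"):
    """True if at least one human grader has a worse average rank than the computer."""
    avg = aggregate_ranks(rankings)
    return any(avg[g] > avg[computer] for g in avg if g != computer)

# Hypothetical rankings from three professors, best-graded set first.
# In the real test the labels would be hidden from the professors.
sample_rankings = [
    ["human_a", "computer", "human_b", "human_c", "human_d"],
    ["human_b", "human_a", "computer", "human_d", "human_c"],
    ["human_a", "human_b", "human_c", "computer", "human_d"],
]

print(aggregate_ranks(sample_rankings))
print("Computer passes:", computer_passes(sample_rankings))
```

Under this scoring rule the computer only has to beat one human grader on average, which matches the "higher than 5th" bar in step c.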
While we are waiting for this test, the NYT article points to a nice paper by Mark D. Shermis summarizing results of a comparison of various software essay-grading systems.
2. Isegoria points to Bloom’s 2-Sigma Problem:
The two-sigma part refers to average performance of ordinary students going up by two standard deviations when they received one-to-one tutoring and worked on material until they mastered it, and the problem part refers to the fact that such tutoring doesn’t come cheap.
I am skeptical. It is possible that this educational intervention is so radically different from anything else that has ever been tried that it works much better than other interventions. But I would bet that if another set of researchers were to attempt to replicate this study, they would fail to find similar results. In social science in general, we do too little replication. This is particularly important when someone claims to have made a striking finding.
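To get a feel for how large a two-standard-deviation gain is, here is a rough conversion from effect size to percentile. It assumes test scores are normally distributed, which the post does not claim, and the function name is just for illustration.

```python
from statistics import NormalDist

def percentile_after_shift(effect_size_sd):
    """Percentile, within the untreated group, reached by an average
    student whose score moves up by effect_size_sd standard deviations.
    Assumes normally distributed scores."""
    return 100 * NormalDist().cdf(effect_size_sd)

for effect in (0.5, 1.0, 2.0):
    print(f"{effect} sigma -> {percentile_after_shift(effect):.1f}th percentile")
```

On that assumption, a two-sigma effect would move the average student to roughly the 98th percentile of the comparison group, which helps explain why the claim invites skepticism.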
3. In the comments on this post, I found this one particularly interesting and articulate:
I think K-12 public schools are about warehousing children, giving parents childcare, whether they are at work or simply want a break from being around their kids (the quality of parenting going on is incredibly wide-ranging).
…why the current system is still in place: Cost, Convenience, Comfortability and Childcare. Unfortunately, the one-size-fits-all approach is ineffective, makes young people passionately hate school (which breeds some serious anti-intellectual pathologies) and is becoming even more centralized in curriculum and control. (See Common Core curriculum adopted by 48 states.)
I think that the Childcare aspect deserves more notice. When President Obama supports universal pre-school, the “scientific” case is based almost entirely on taking kids out of homes of low-functioning parents. But what affluent parents hear is “Obama is going to pay for my child care,” and that is what makes the policy popular.
More generally, assume that as a parent you believe that your comparative advantage is to work, rather than spend the entire day with your child. Then ask yourself why as a parent you would prefer to have your child in school rather than home without supervision. Even if the child learns less at school than they would at home, you still might prefer the school, as long as you are convinced that it reduces the risk of your child getting into really bad trouble.
4. From Michael Strong, in a long comment pushing back on my post last week.
No one doubts that if one compares one group that receives significant practice in an activity against another group with no exposure to the activity at all, that a treatment effect exists.
Why then are so many people skeptical that interventions in education make a difference? Largely because the comparisons exist between idiotic variations within a government-dominated industry.
As a rejoinder, I might start by changing “receives significant practice” to “engages in significant practice.” “Learning a skill” and “engaging in significant practice” are so closely related that I would say that, to a first approximation, they are the same thing.
This leads me to the following restatement of the null hypothesis.
The null hypothesis is that when you attempt an educational intervention, such as a new teaching method, the overall economic value of the skills that an individual acquires from age 5 to 20 is not affected by that intervention. I will grant that if you take two equivalent groups of young people and give one group daily violin lessons and the other group daily clarinet lessons then the first group is more likely to end up better violinists on average.
But when economists measure educational outcomes, they usually look at earnings, which result from the market value of skills acquired. To affect that, you have to affect the ability and willingness of a person to engage in practice in a combination of generally applicable fields and fields that are that person’s comparative advantage.
Aptitude and determination matter. Consider Malcolm Gladwell’s “10,000 hour rule” for becoming an expert at something. There is a huge selection bias going on in that rule. How many people who have little aptitude for shooting a basketball are going to keep practicing basketball for 10,000 hours?
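The selection effect can be illustrated with a toy simulation. Everything in it is an assumption made for illustration: aptitude is drawn from a standard normal distribution, practice is split into ten checkpoints, and the chance of continuing past each checkpoint rises with aptitude according to a logistic curve.

```python
import math
import random
from statistics import mean

def simulate(n_people=20_000, checkpoints=10, seed=0):
    """Return aptitudes of everyone and of those who finish all checkpoints."""
    rng = random.Random(seed)
    population, finishers = [], []
    for _ in range(n_people):
        aptitude = rng.gauss(0.0, 1.0)
        population.append(aptitude)
        # Hypothetical rule: higher aptitude makes quitting less likely.
        p_continue = 1.0 / (1.0 + math.exp(-(aptitude + 1.0)))
        if all(rng.random() < p_continue for _ in range(checkpoints)):
            finishers.append(aptitude)
    return population, finishers

population, finishers = simulate()
print(f"Share who finish: {len(finishers) / len(population):.1%}")
print(f"Mean aptitude, everyone:  {mean(population):.2f}")
print(f"Mean aptitude, finishers: {mean(finishers):.2f}")
```

Practice has no effect on anything in this sketch, yet the people who complete all the hours still come out with above-average aptitude. Comparing them with everyone else would credit the practice with an advantage that was there from the start.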
When you consider how hard it is to move the needle half a standard deviation on a fourth-grade reading comprehension exam, the chances are slim that you are going to come up with something that affects long-term overall outcomes. Until we get the Young Lady’s Illustrated Primer.
I’ve done a little digging, and the only problem I’ve found with Bloom’s Two-Sigma problem is that he used two different levels of mastery for his two mastery-based teaching methods.
So, instead of showing a one-sigma leap from moving from ordinary classroom instruction to mastery-based classroom instruction and another one-sigma leap for moving from mastery-based classroom instruction to mastery-based one-on-one instruction, we see a one-sigma leap from moving from ordinary instruction to 80%-mastery-based instruction and another one-sigma leap for moving from 80%-mastery-based classroom instruction to 90%-mastery-based one-on-one instruction (where those percent scores reflect a passing “mastery” score on a unit test).
I shouldn’t have said “only problem.” The other, bigger problem is that the effect isn’t two-sigma when other people try to replicate it, but just under one sigma.
The good news is that computer tutoring seems to achieve the same results as human tutoring.
There are other constraints or “pins”: for example, it seems to me that it is illegal to leave a child under 14 or 16 home alone for any sustained length of time (there are various news stories about CPS cases around this). (A friend pointed out that it is statistically safer to leave your child at home when you drive to a store, but you can’t do that, so the child is perforce exposed to the hazards of road travel.)
There’s another aspect of education that seems to be missed. We’ve talked about the value to children, and the value to parents, but many people miss the dominating value to *the rest of society*. That is, everybody else wants every child to grow up to be employable, understand how to vote in elections, see the terrible errors in kiting checks or destroying mailboxes, and generally *fit into society*.
So to some extent, “we” don’t care what “treatment effects on income” there are, but “we” care a lot that people “behave”.
I think any analysis of the effects of education that fails to see it as a kind of indoctrination in how to behave in our society is incomplete.
One-on-one tutoring serves as a substitute for conscientiousness. It might help you pass the exam. You might even retain the particular skills, but you won’t gain the conscientiousness.