Tuesday, June 13, 2017

Thoughts on Education Research

My friends know I have problems with education research. I think there are a number of reasons for this and I'll return to those reasons at the end of the post. First, I'd like to share a few articles I've read as part of my elective coursework this summer.

Long & Grove's (2003) "How engagement strategies and literature circles promote critical response in a fourth-grade, urban classroom" is a good example of the kind of research that upsets me. The "study" involves the authors observing a classroom as several engagement strategies were used. These aren't the authors' strategies; rather, they pull from other research to assess those strategies. The three strategies are:
1. Ask open-ended questions; listen to, honor, and respond to students; and encourage students to read between the lines of the text.
2. Invite students to investigate and find out about explicit or implicit text information - to dig a little deeper into the text's meaning.
3. Encourage students to pose and solve problems about important text events.
They also set out to evaluate literature circles - a fancy way of saying that kids discuss stories in small groups. The key difference between the engagement strategies and the literature circles is that the strategies involve the teacher more and tend to include the reading as part of the group's activities whereas literature circles typically assume the reading is done beforehand.

And that's it, really. There's no treatment. There's no breakdown of larger impacts on the students' abilities. There's no comparison to other strategies. The total number of observations was four. The total number of students observed was 27 - one class four times, in other words. It's simply a report of their observations with some pointed language about how they already knew this was the best way to teach reading. The article isn't really even an argument. Long & Grove begin with a quote from a student displaying enthusiasm for reading after participating in these engagement strategies and end with a discussion of how much these engagement strategies are, well, engaging. The stories they picked featured themes of social justice and were aimed at being relevant to the young readers in their class. The students, unsurprisingly, picked up on the relevance and the themes, which promoted engagement. Indeed, the only part that seemed to resemble analysis of these strategies compared them with the same class before the authors became involved.
When we observed Belinda Sweet's class at other times, we didn't see the students critically respond in a fluent, automatic way. By integrating engagement strategies and literature circles over a short time in this classroom we positioned students so that they could critically respond to the issues at hand. 
I have many questions: how do we know this is the case? What evidence do we have that the engagement strategies and literature circles were the crucial difference? How often were these students observed prior to beginning the "study"? What strategies were used before? Did the participation of the "observers" change the outcome, or would the same results have occurred had only the teacher delivered these strategies?

You get the point. The study accomplishes nothing because it wasn't really a study. It's just a description of something the authors already assume works from their own prior experiences and from the research they cite. (Also, much of the research they cite is similar in form to this article. Observations of small classes. The "right" strategies are assumed and implemented and turn out to be "right".)

I'm upset for two reasons. First, aren't these strategies widespread and obvious by now? Student-centered approaches to teaching have been commonplace for 30 years. Group work and discussions have been features of the classroom since I was a child and they were central to my own training in secondary English education. Even the literature circles method cited in the "study" is from Harvey Daniels' 1994 Literature Circles: Voice and choice in one student-centered classroom. (See! ONE classroom.) Second, there is no way for this "study" to participate in the larger dialogue about education. Without broader contexts for the students and broader evaluation of the outcomes of such approaches, schools and districts are unlikely to care about engagement strategies. Educational research needs to push policy and that means it needs to come back to outcomes. Show that these engagement strategies make students better readers. Show that they build comprehension. Because, the thing is, I do think these kinds of strategies are effective in those regards. Kids reading and responding is always better than kids not reading or responding. I just want to see good indicators of that show up in the research.

Next up is Diane Santori's "Search for the answers" or "Talk about the story"?: School-based Literacy Participation Structures (2011). Santori "explores" how five students respond to three different literacy participation structures. (Yes, only five students.) Shared reading is the most "traditional" of the structures with teachers and students in small groups together reading a story out loud and discussing aspects of the text. Santori notes that this approach is teacher centered when it comes to the discussion and the discussion focuses on particular aspects of the curriculum as directed by Common Core. The kids need to learn about, say, personification so the teacher has them find an example in the text after she gives them the definition of personification.

Guided reading is the middle strategy. Guided reading lets the kids have more freedom in their discussions but still falls short of the author's preferred methods. The kids would, among other things, make predictions, note unknown words for later vocabulary work, and create questions from their readings to ask their groups or the class. Despite this more student-centered approach, the author states she observed many teachers focusing discussion of the text in one direction. While this approach was good for locating features of the text and summarizing it, the kids didn't interact with it at a deeper level. Because the instructor was still guiding the discussion, it also tended to focus on curricular goals.

As you might expect, the third and final strategy is the be-all-end-all of literacy participation structures. Shared Evaluation Pedagogy (infuriatingly abbreviated as SHEP, where does the H come from?) begins the same as guided reading. Students read independently, writing questions and difficult words on sticky notes. The discussion, however, is left mostly open to the students. The author says her role was minimal and that the students directed themselves. According to Santori's reporting, they engaged deeply with the texts and collaborated to construct meaning. If they reached a dead end or the students began talking about the text in a way that didn't relate to the story, Santori would "ask them to provide evidence for their theories." The discussion focused on what the students cared about and what they engaged with instead of on the goals of the curriculum.

Interestingly, this comprehensive study of five students features a quantitative analysis of nine "text moves" by the students. A text move is when the student does something to improve comprehension during a discussion of a text. If a student makes a connection from the text to personal experience, that is the "connect" text move. The other text moves are hypothesize, recall, genre, clarify, summarize, synthesize, vocabulary, and other. We learn from Table 2 that students used a total of 293 moves during shared reading, 291 moves during guided reading, and 403 moves during SHEP. Of the total times the students hypothesized, 57.5% were in SHEP. "Other" accounted for 15.8% of moves in shared reading and 10.9% of moves in guided reading but only 2% in SHEP.

Including this quantitative piece seems like an exercise in parody. I'm not really sure how it helps the author make her case. Santori barely discusses the numbers and does not use the numbers to draw any conclusions about each approach. Of course, the wisdom of doing quantitative analysis on a sample size of 5 is laughable on its own. I'm not even sure a sample of 5 can be representative of a class of 30 students, much less a school or the whole student population of the nation, yet we get this semi-statistical breakdown thrown in here for some reason. I think the argument is something like "more moves = more gooder learning" but even that isn't apparent from the data and there's no indication that doing more hypothesizing (which is where 232 of that 403 comes from) is any better than doing more vocabulary or recall (which happened more in shared reading and guided reading). It's as if the author were being forced to include more numerical analysis so she just made something up and tacked it on.
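Just to show how low the bar is, here's roughly what a minimal statistical check of those numbers could look like. To be clear about what's real and what isn't: the condition totals (293, 291, 403) and SHEP's 232 hypothesize moves come from the article; the shared/guided split of the remaining hypothesize moves is my own invention, and the deeper problem is flagged in the comments.

```python
# Back-of-the-envelope check of whether one "text move" is distributed
# differently across the three participation structures. Counts marked
# HYPOTHETICAL do not appear in the article.
from scipy.stats import chi2_contingency

totals = {"shared": 293, "guided": 291, "shep": 403}     # from Table 2
hypothesize = {"shared": 86, "guided": 85, "shep": 232}  # 86/85 HYPOTHETICAL, chosen so SHEP's share is ~57.5%

# 2x3 contingency table: hypothesize moves vs. all other moves, per condition
observed = [
    [hypothesize[c] for c in totals],
    [totals[c] - hypothesize[c] for c in totals],
]
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.4f}")

# Even this is too generous: the 987 "observations" come from only five
# students, so the moves are not independent and the test's assumptions
# fail regardless of what p-value it prints.
```

Santori's article does none of this; the percentages just sit there.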

Much like Long & Grove, Santori assumes from the outset that SHEP is the superior approach. She begins the article with the argument that students need to assert their textual agency and that by doing so they learn better. (At least she makes that point because Long & Grove don't even get that far.) Moreover, Santori includes alternative approaches and discusses their implementation. Importantly, she identifies the ways in which policy is cramping classroom activities. Teachers pursue strategies that fit district and federal guidelines, but those guidelines view literacy and comprehension in a very limited way. With instruction limited that way, students don't engage and don't learn as much. Additionally, Santori argues students should be able to go off script because actually addressing the issues and discussing them with each other makes them more well-rounded. (My language, not hers.)

I see this as a much better study when compared with Long & Grove but there's still not much done to assess improvement to the students. So they had some good discussions and connected well with the texts. What next? How did that work out for them? Did they stay engaged? Were they better readers? Outcomes matter and I want research to address that. Merely participating in rich textual experiences is not enough. If our researchers aren't interested in going for longer studies and looking at outcomes, they're going to keep losing ground to other fields doing more robust research. Economists and corporate statisticians are driving education policy at the federal, state, and even district level. Teachers and the colleges that train them are not. Good research is a way to get back into the conversation.

I know this is a long post but I want to look at one more study: "Using word study instruction with developmental college students." I think this is kind of the best-case scenario for small-sample education research. In this study Atkinson et al. look at a single intervention, a word study curriculum, and assess it quantitatively and qualitatively. They use two classes with one acting as the control group. The idea is you set up a pretest, apply the treatment (the word study curriculum) to the experimental group over several weeks, and then give the posttest. It's not a pure experiment as that would require the study to be blinded and ideally to have multiple control and treatment groups. But, as I've noted before, it is difficult to get fully experimental models in the social sciences, so quasi-experimental designs are more common.
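For anyone who hasn't seen one, the skeleton of that design is easy to sketch. Everything below is made up for illustration (fake gain scores, a plain t-test on gains); it is not Atkinson et al.'s actual data, instrument, or analysis.

```python
# Minimal sketch of a pretest/posttest quasi-experiment with a control
# class, in the spirit of Atkinson et al.'s design. All numbers are
# INVENTED for illustration.
from scipy.stats import ttest_ind

# gain = posttest score - pretest score, one value per student
control_gains   = [1, 0, 2, -1, 1, 0, 2, 1, 0, 1]
treatment_gains = [3, 4, 2, 5, 3, 2, 4, 3, 5, 2]  # word study class

t, p = ttest_ind(treatment_gains, control_gains)
print(f"t={t:.2f}, p={p:.4f}")
# A small p-value here is evidence that the word study class improved
# more than the control class -- exactly the kind of outcome measure
# this post keeps asking education research to report.
```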

The word study curriculum is something I'm very familiar with because that's the curriculum I was using in my 9th and 10th grade special education classes. The idea is that you take students who are far behind in their literacy development and give them a crash course in linguistics. I know that sounds odd as linguistics has a very ivory-tower feel to it but it's not as complex as you might think. By breaking the language down to simple parts and showing students how they fit together to make sounds, words, and meaning, you can help them think explicitly about what they're reading and writing. From there you can work up to complex vocabulary, syntax, or any number of other areas.

Atkinson et al. take five weeks to do comprehensive word study with one remedial college class. They only look at a single measure but that measure shows significant growth in the experimental group when compared with the control group. They also survey the students and apply qualitative analysis to determine how the students felt about the intervention. Lastly, Atkinson et al. end with a call for further research and indicate they are conducting a more longitudinal study across more classes.

While not without flaws (possible treatment bias because the study is not double-blinded, small sample size, questions of sample selection, instrument bias), this study does what I think good education research should do. It makes the case that following a particular strategy improves student learning and outcomes. It measures the students to accomplish this but it also features some qualitative aspects to help the research feel grounded. It participates in a discussion of the issues facing education, offers evidence of a solution, and pushes for further study. That's something that all education research should endeavor to accomplish.

Why doesn't it?

I don't have a super firm answer to that yet but let me propose a few I've come up with.

  1. Education is heavily siloed and researchers tend to publish the way their particular field has always published. English and Literacy seem to promote "action research" and qualitative or observational studies. I should look at the math education literature to see if they are more quantitative. I will also note that I'm quickly learning how much elite institutions like to toot their own horn. For example, the first two articles are from (in part) CTC alumni and the author of the second is actively involved in my Literacy MA program. Public relations seems to be an activity that every faculty member and some students are expected to perform. Research may follow this pattern.
  2. Education, in general, dislikes quantitative research. Maybe this is a response to NCLB/RTTT and the current incarnation of the reform movement? Standardized testing is widely misused by states and districts and is a tool used by politicians to break the political power of teachers unions. This has bred distrust of any quantitative measures. In turn, any data which relies on high-stakes testing is seen as illegitimate by many educators whether the use of that data is accurate/valid or not. There are elements of social justice here because testing is seen as a proxy for race, socioeconomic status, and other factors. Any quantitative measure is potentially racist, sexist, classist, or otherwise heavily biased. 
  3. There are vastly different worldviews on education. Some people don't take a problem-solution view of schooling. They don't look at a kid who can't read or write and see that as something which needs to be addressed. Instead they see a kid who might have other ways of learning the concepts meant to be taught. They see a kid who has a story and intrinsic value regardless of their literacy. So while I lament that there's only a focus on having rich experiences, they see those experiences as an end in themselves. 
  4. Education doesn't lend itself to "good" research. While it's easy to scrape a ton of testing data from all over the country, there isn't a good way to design and deliver experiments. There are ethical concerns, for example, about setting kids back in their learning. If a treatment proves disastrous, is a whole class of kids going to have to repeat a grade? If the intervention is so much better than the status quo, shouldn't we end the experiment and apply the intervention across the board?
  5. There is more quantitative research in education than I realize. This is probably the most likely. I am doing a big literature review as a course final project so I expect to have more perspective on this in a few weeks.

There may be more but that's all I could come up with this morning. I don't want to weigh too heavily on one side or the other, though, because I think I suffer many biases of my own. The application of quantitative research to education has proved highly problematic in the last two decades and my love of good testing stems from the low-stakes environment of my classroom. What I care about is meaningful achievement from students, not some state exam. But I worry that without more quantitative measures coming from colleges of education, educators will be shut out of the national conversation about schools.

Here's my nightmare scenario: The world of education is going to replace teachers with centrally planned, algorithmically delivered instruction. Kids are going to show up to school for free lunch and spend 8 hours staring at a screen which pretests a set of skills, queues up the lessons that the kids score lowest on, and then post-tests those lessons to see if the kid got enough questions correct before moving on to the next. Teachers are going to be unnecessary in this world because the labor of teaching is being removed. No teacher will have to plan a lesson ever again. No teacher will have to grade an assignment ever again. Teachers will simply sit at their desks and help the kids troubleshoot the educational software or help them interpret some instructions.
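Spelled out as code, that loop is trivially simple, which is exactly what makes it cheap to scale. Everything in this sketch is invented (the skill names, the 80% mastery cutoff, the toy learning model); it's a caricature of the scenario above, not a description of any real product.

```python
import random

# Caricature of the pretest -> weakest-skill lessons -> posttest loop.
MASTERY = 0.8                                 # invented cutoff
SKILLS = ["fractions", "decimals", "ratios"]  # invented skills

def take_test(ability):
    # toy model: a test score is just the student's ability plus noise
    return min(1.0, max(0.0, ability + random.uniform(-0.1, 0.1)))

def run_student(abilities):
    scores = {s: take_test(abilities[s]) for s in SKILLS}  # pretest
    for skill in sorted(SKILLS, key=lambda s: scores[s]):  # weakest skills first
        while take_test(abilities[skill]) < MASTERY:       # posttest gate
            abilities[skill] += 0.05                       # a "lesson" bumps ability
    return abilities

print(run_student({"fractions": 0.4, "decimals": 0.7, "ratios": 0.6}))
```

No judgment, no relationship, no teacher anywhere in the loop.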

This is the world of pure constructivism. If all that matters is that students are given access to the raw materials of learning, then teachers can and will be replaced. Looking back to Long & Grove and Santori, we see that same constructivism at work. As the teacher, you should step away, let the kids discuss the carefully curated materials you've brought them, and the learning will happen. The teacher present in the room seems superfluous. Indeed, the teacher could have been in an office six years ago designing the curriculum, and the adult in the room is just a babysitter. Pure constructivism is a world in which every student is an autodidact.

If you don't believe me, take a look at this.
DreamBox Learning tracks a student’s every click, correct answer, hesitation and error — collecting about 50,000 data points per student per hour — and uses those details to adjust the math lessons it shows. And it uses data to help teachers pinpoint which math concepts students may be struggling with.
Mr. Hastings [CEO of Netflix] described DreamBox as a tool teachers could use to gain greater insights into their students, much the way that physicians use medical scans to treat individual patients. “A doctor without an X-ray machine is not as good a doctor,” Mr. Hastings said.
And how's that X-ray machine going to help diagnose diabetes, liver disease, heart disease, neurological disorders, or any number of diseases unrelated to bones? OB/GYNs rarely use X-rays because, you know, irradiating fetuses isn't great. Should we discuss the role overuse of medical imaging plays in inflating healthcare costs? No, let's not pick at the metaphor. I have doom and gloom to discuss.

Reed Hastings and other Silicon Valley types are actively pursuing the constructivist vision of education. They're gathering large amounts of data and performing sophisticated analysis of that data to drive further development of their products. Moreover, their money and resources are sorely needed by cash-strapped schools. Policymakers aren't going to be listening to Santori's observational study of 5 third grade students. They're going to listen to Hastings' study of thousands of students with hundreds of thousands of points of data. Whether that's the best approach or not, the numbers are more persuasive. So as the kids plug into DreamBox for their math lessons, what's their teacher doing? Is he grading? Why? DreamBox does the grades. Clearly he isn't designing lessons because those are delivered by DreamBox's algorithms. The teacher who made those lessons is miles away in an office. So why have a classroom teacher at all? 

If we can't prove why, Reed Hastings and others like him will prove why not.