It can be really complicated to judge the quality of research studies. You can get deep into concerns about how a survey question was worded, how the timing of the evaluation or the repetition of testing affected the responses people gave, how appropriate the data management and statistical analysis methods were, and other nuanced, difficult questions. These questions are important, and are often addressed in reviews of multiple studies, but they’re not necessarily the kinds of questions I ask the first time I read a study.
And even reviews of studies look for certain components that don’t necessarily have to do with the quality of the research. The CDC’s review of sexual violence prevention was looking for programs that were rigorously evaluated and demonstrated effectiveness in reducing violent behavior. Across the entire fields of sexual and intimate partner violence prevention, only three programs and one federal policy fit these criteria, and we don’t even know exactly which aspects of those programs reduced violent behavior. And of course, these three programs are not necessarily appropriate for all communities.
We have a lot to learn from studies that don’t fit the criteria of the CDC’s review, or of other reviews. For instance, we can learn about risk and protective factors that are important for certain communities, strategies that are associated with reducing risk factors of perpetration, or interesting approaches or underlying principles for prevention programming. To do that, we need to be able to recognize which studies are stronger and which are weaker – which we should trust more, and which we should take with a grain (or a handful) of salt. It’s good to be able to read a study and get a general feel for whether you trust its conclusions or not. That’s what I do when I look through research studies and decide what to write about and what to say on PreventConnect.
I want to share what I’m looking for when I read through a study. These are some of the more important questions I ask myself to decide how much I trust the study’s conclusions.
- How did they choose which people in the target population would be participants? Did they get an enrollment roster at a college and email a random selection of students? Or are they a professor who only did the program with their class? The first case is definitely better if there is a large enough sample and if the program participants are meant to be representative of the broader campus population. But the second case could be useful, too—maybe that professor could have done in-depth, qualitative interviews with those students that reveal a lot of nuance about their attitudes, barriers to performing the desired behaviors, what worked and what didn’t about a program, etc.—it’s just not likely to represent the whole student body.
- How are the program participants similar to and different from my community? What are their ages? Linguistic background? Gender makeup? Educational status? Race? Do they live in urban or rural areas? Income level? Community values? Could a program done with middle school students also be done with high school students? Maybe, but maybe not. Maybe a program done in San Francisco could work in Dallas, Texas. But maybe not. Importantly, this question isn’t just about how different or similar the program population is from my community, but about how those differences matter for implementing the program. Many times I’ve read an abstract that seems interesting, only to find out that the study was done with 12 adolescents in Estonia or somewhere else I’m equally unfamiliar with. And it could be that the study findings are relevant in some ways to parts of the US, but I just don’t know enough about the similarities and differences to be confident one way or the other.
- What are the research questions asking? What are they not asking? Did the study ask about actual violent behaviors or behavioral intentions, or just knowledge, attitudes, and skills? Which risk and protective factors did the study address, and which did it leave out? How does that match up with the risk and protective factors that are important in my community, or with the goals and objectives of my program? Do the research questions fit my philosophy and understanding of the problems of sexual and/or intimate partner violence?
- If I’m trying to determine whether I trust a claim that a program was effective at creating some change…
- How many of the chosen/invited participants actually completed the program? If only 40% of the people invited participated, that can be a problem. What if the people who participated are those who are already interested in ending sexual and intimate partner violence? Their results from the program are probably going to be different from the results that would have been seen for the other 60% of invited people. Also, if there was a comparison or control group, did a very different number of people drop out of the program group than the control group? Again, if a lot of people dropped out of the program group, I’m less likely to trust that the program was effective, even if the study says it was compared to the control group. This is because if we had been able to measure change in the people who dropped out, we might find that they didn’t change. So when we consider everyone who did some part of the program, it could very well turn out that the program didn’t produce any more change than the control condition did.
- What was the program compared to? Maybe there was no comparison or control group. If not, how do I know that any changes that occurred happened because of the program, and not because of some big local news story? (This might not be a problem for me. If I’m looking for innovative program activities or ways to adapt a program to a certain cultural context, I might be more interested in the content of the program than I am in being sure that the program caused certain changes.) If there is a comparison or control group, what activities did they do? Did they get a generic health education class, or did they just do the study without getting any sort of program? Are there different groups that got different combinations of program components (like Shifting Boundaries, where one group got a curriculum and a building intervention, one group got just a building intervention, one group got just a curriculum, and one group got nothing)? Having a comparison or control group gives me more confidence that a program that says it was effective actually was.
- How was it decided which participants got which program components? This is getting nit-picky, but it is stronger if participants are randomly assigned to get the program or a control, rather than assigned based on logistics or other factors. But the reality is that most programs are given based on logistics, and it can get very expensive to have random assignment to different program groups, so I go easy on this one.
- How long after the program were post-tests administered (if there were pre- and post-tests)? The longer, the better. It’s not as important to me that a participant can answer a question correctly right after they participate in a class. I’m usually more interested in whether the knowledge, attitudes, intentions, etc. stick with them over time.
- If I’m trying to learn in-depth about program implementation, participants’ experience of a program, underlying principles for prevention, etc. in a qualitative (as opposed to quantitative) way…
- What are the researcher’s probable biases? Any good researcher will try to identify their biases and avoid letting them drive their interpretation. But it’s good to think about it and check. Is this researcher trying to promote a certain approach over others? If so, how might this affect the way they interpret what the participants say?
- What is the researcher not saying? Qualitative studies almost always have a huge amount of data that researchers have to sift through and pick out the most profound, important pieces. What kinds of things does it seem like they might have glossed over?
- Do you agree with the researcher’s interpretation of the data? Ten different people can listen to the same set of interviews and draw ten different conclusions. You will probably only be able to see a little bit of the raw data, if any, but does what you see support what the researcher has claimed?
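The question above about how many invited participants actually completed the program comes down to simple arithmetic, and a toy calculation can show why attrition matters. The numbers below are entirely hypothetical, just to illustrate how counting only the people who finished can inflate an apparent effect:

```python
# Hypothetical numbers: 100 students invited, 40 complete the program, 60 drop out.
invited = 100
completers = 40

completers_improved = 30   # assume 75% of completers show improvement
dropouts_improved = 12     # assume only 20% of dropouts would have shown improvement

# Looking only at the people who finished the program:
completers_rate = completers_improved / completers

# Counting everyone who was invited (an intention-to-treat style view):
everyone_rate = (completers_improved + dropouts_improved) / invited

print(f"Completers only: {completers_rate:.0%}")   # 75%
print(f"Everyone invited: {everyone_rate:.0%}")    # 42%
```

The same (made-up) program looks dramatically more effective when the 60 people who dropped out are ignored, which is why heavy dropout, or very uneven dropout between the program and control groups, makes a claim of effectiveness harder to trust.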