Clearing up the Clearinghouse: Making “what works” more comprehensive and accessible

Atsuko Muroga

The pandemic’s devastating impacts on student learning have lent new urgency to using research-based strategies to support students at every stage of their education. Yet, when a teacher or education leader needs to know which interventions to target, how can they sort through the massive amounts of research to decide, for example, which reading activities to implement, what classroom management strategies to try, or what graduation interventions to introduce? Millions of education-related studies have been published in the past five decades, but not all of that research points to what works with the same level of certainty. Further, this explosion of research has not necessarily found its way into the hands of education decision-makers.

The What Works Clearinghouse (WWC), an initiative of the Institute of Education Sciences (IES) in the U.S. Department of Education, aims to solve this problem. This online database of research evidence on education programs and practices from preschool through adult education allows users to search for interventions to learn about their effectiveness. For instance, if a district leader wonders which strategies have been effective in improving students’ reading skills in elementary classrooms, the WWC repository can provide a list of relevant interventions, their impact on student outcomes, details about the rigor of the research evidence, and recommendations for replication.

However, in the midst of the education research boom, the WWC has continued to grapple with critical questions: Are certain topics or populations being studied in the field, yet not making it into the clearinghouse at similar rates? Is the data in the clearinghouse labeled well enough to find its way into the hands of practitioners? Where does the evidence in the WWC come from, and how can the WWC’s human reviewers keep up with the massive influx of research released each year?

To answer these questions, the IES’s Knowledge Use (KU) division worked with Strategic Data Project (SDP) Federal Postdoctoral Fellow Atsuko Muroga. From March 2022 to May 2023, Muroga leveraged her data expertise to help the WWC continue to fulfill its mission of making research accessible to education practitioners nationwide.

Identifying areas where the WWC lags behind the field

Since 2002, the WWC has grown through extensive human effort. Review teams conduct literature reviews about topics in education, then read individual studies to determine the rigor of their research designs and whether they should go through a full review process. Given the nature of this process, IES’s team wondered: Was research evidence in any particular topic area, study population, or geographical area lagging behind in getting reviewed?

To address this question, Muroga performed an exploratory analysis of the WWC repository’s underlying data on which studies were reviewed over the past 20 years, producing key statistics on the topic areas (e.g., literacy, math, teacher effectiveness), sample characteristics (e.g., ELL, racial minority, students with disabilities), geographical locations (e.g., urban, rural, suburban), and grade levels (e.g., pre-K, elementary, secondary) covered in the review process, broken down by review outcome.[1]
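An analysis of this kind can be sketched as a simple cross-tabulation. The sketch below uses a hypothetical extract of the repository’s review data; the column names, categories, and rows are illustrative stand-ins, not the WWC’s actual schema:

```python
import pandas as pd

# Hypothetical extract of WWC review records (illustrative only).
reviews = pd.DataFrame({
    "topic_area": ["Literacy", "Math", "Literacy", "Teacher Excellence"],
    "locale": ["Urban", "Urban", "Rural", "Suburban"],
    "review_outcome": [
        "Meets standards without reservation",
        "Meets standards with reservation",
        "Does not meet standards",
        "Meets standards without reservation",
    ],
})

# Count reviewed studies by topic area, broken down by review outcome.
coverage = pd.crosstab(reviews["topic_area"], reviews["review_outcome"])
print(coverage)

# Spot potentially under-reviewed areas, e.g., rural studies per topic.
rural_counts = reviews[reviews["locale"] == "Rural"].groupby("topic_area").size()
print(rural_counts)
```

Re-running a tabulation like this against each repository refresh is what makes ongoing coverage monitoring cheap once the code exists.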

Muroga found that research on literacy, math, and students with disabilities (SWD) is featured heavily in the WWC. The clearinghouse also covers all grade levels comprehensively. However, her analysis suggests there is room to greatly expand review of other topic areas and populations, such as English Language Learners (ELL) and teachers and school leaders. There is also a large disparity in the geographical focus of studies: studies that used samples from large urban districts were reviewed far more often than studies that used samples from small, rural districts, creating a major informational disadvantage. Rural practitioners may not be able to leverage the WWC to solve their problems or decide whether evidence related to existing interventions will translate in their very different contexts.[2]

The code Muroga used for this initial analysis can be used to monitor the clearinghouse’s coverage on an ongoing basis and ensure that all topics, populations, and areas studied by researchers are being reviewed and added. Ensuring the WWC is up-to-date may also allow researchers to identify understudied topics for which we need greater insight.

The Missing Data Problem

In performing these analyses, Muroga flagged a problem: more than 20% of the studies that met the WWC’s scientific standards, and were therefore listed in the online repository, had no assigned “topic tags,” making it impossible for users to find and connect them with the clearinghouse’s twelve thematic domains.[3] As a result, some of the evidence that had been reviewed and added to the clearinghouse was not finding its way into the hands of the users for whom it was intended.


As the WWC website and interface were undergoing a redesign, Muroga created a process and code for filling in missing topic tags based on other existing data, using advanced Natural Language Processing (NLP) techniques.
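The article does not specify which NLP technique was used. One common approach to this kind of multi-label tag prediction is a TF-IDF text representation with a one-vs-rest classifier; the sketch below is a minimal illustration under that assumption, with invented abstracts and tags rather than real WWC data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical training data: study abstracts that already carry topic tags.
abstracts = [
    "A randomized trial of a phonics curriculum on early reading fluency",
    "Effects of small-group tutoring on middle school algebra achievement",
    "A reading intervention for struggling adolescent readers",
    "Impact of a fractions program on elementary math scores",
]
tags = [["Literacy"], ["Mathematics"], ["Literacy"], ["Mathematics"]]

# Vectorize the text and fit one binary classifier per topic tag,
# so a study can receive zero, one, or several tags.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(abstracts)
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(tags)
model = OneVsRestClassifier(LogisticRegression()).fit(X, Y)

# Suggest tags for an untagged study in the repository.
new_study = ["A study of guided oral reading and comprehension outcomes"]
pred = binarizer.inverse_transform(model.predict(vectorizer.transform(new_study)))
print(pred)
```

In practice, predictions like these would be reviewed by a human before being written back to the repository, which is consistent with the comparison of human- and machine-classified tags described below.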

Additionally, aware of the need to improve data quality, the WWC made “relevant topic area” a mandatory field for study reviews, ensuring that every new review in the database is tagged with at least one topic.

Muroga is currently completing work that compares the human-classified topic tags and machine-classified topic tags. Her work will inform the methods used to address missing topic tags going forward.

Keeping up with the Field

Keeping pace with the volume and speed of education research while reviewing and synthesizing it is challenging. The existing process entails a detailed, time- and labor-intensive formal literature review. Historically, this review was conducted one specific theme at a time. However, in the near future, the WWC plans to create hubs on broad topic areas where studies within those areas can be reviewed on a rolling basis.

Given the anticipated need to frequently monitor and review content, IES’s team sought greater insight into key questions. Where did the studies being reviewed and entered into the clearinghouse originate? What publications and journals should they keep an eye on? Which topics in education research are being increasingly or decreasingly studied? Has there been a fieldwide shift in methodological focus (i.e., an increase in evaluation of the effectiveness of education programs, policies, and practices)?

To help answer these questions, Muroga began several analyses. Looking at sources over the past twenty years, she found that journal articles and publications by research institutes have played an important part in establishing the clearinghouse, as compared to other sources, such as theses, dissertations, and conference papers.

Separately, with the help of SDP Faculty Advisors Sebastian Munoz-Najar Galvez (Bluhm Family Assistant Professor of Data Science and Education) and Luke Miratrix (Associate Professor of Education), both of the Harvard Graduate School of Education, Muroga is probing one of the largest databases of education research, IES’s Education Resources Information Center (ERIC), which hosts over 2 million records of education research papers, reports, and other written materials.

Using underlying data from the ERIC online library, Muroga has leveraged an NLP method called Latent Dirichlet Allocation (LDA) to explore trends in published education research. One trend that became clear: since the early 2000s, around the time when the IES was established, the amount of causal research in education has grown considerably.[4]
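A minimal LDA sketch in the spirit of this analysis might look like the following; the documents, topic count, and parameters here are illustrative stand-ins, not the actual ERIC corpus or model:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Illustrative stand-in for ERIC abstracts; the real analysis runs
# over millions of records with many more topics.
docs = [
    "randomized controlled trial of a tutoring program effect on achievement",
    "regression discontinuity estimate of a policy effect on graduation",
    "case study of teacher beliefs about classroom assessment practices",
    "qualitative interviews on teacher professional development experiences",
]

# Bag-of-words counts, then fit a two-topic LDA model.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Per-document topic proportions; tracking these proportions by
# publication year is how a vocabulary trend (e.g., the growth of
# causal-inference terms) would surface.
doc_topics = lda.transform(counts)
print(doc_topics.round(2))
```

The choice of two topics here is purely for illustration; a real fit would tune the topic count and inspect the top words per topic before labeling anything “causal.”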

The need for evidence-based decision making in education is as important as ever. As the federal government endeavors to make what works in education more accessible, Muroga’s work will help the KU team monitor emerging topics on an ongoing basis and increase the visibility of high-quality research in the WWC evidence base.

* A GitHub repository that aims to help other researchers leverage the ERIC data is underway. If you would like access to the GitHub repository, sign up for SDP notifications here.

SDP is grateful to the Bill and Melinda Gates Foundation for enabling the improved use of federal data to strengthen public education, as well as the Institute of Education Sciences for the collaboration and support of the SDP fellow.


[1] Fully reviewed studies receive one of the following ratings: (1) meets the WWC standards without reservation (i.e., highest rating), (2) meets the WWC standards with reservation (i.e., second highest rating), and (3) does not meet the standards.

[2] The Center for Education Policy Research’s rural research center, the National Center for Rural Education Research Networks (NCRERN), was founded to partner with rural districts to fill this evidence gap.

[3] The WWC has since consolidated these categories, but at the time of Muroga’s analyses, these were: literacy, mathematics, science, behavior, children and youth with disabilities, English language learners, teacher excellence, charter schools, early childhood education, path to graduation, and postsecondary education.

[4] A limitation of this analysis stems from the fact that researchers are required to submit manuscripts and reports to ERIC as a condition of funding; the way in which research papers are collected for the database may lead to a bias in favor of causal methods.