Fulton County Schools Follows the Evidence

How SDP Agency Fellow Moira Johnson helped leaders in one Georgia district make an evidence-based decision between competing literacy programs.

In 2017, Fulton County School District in Georgia set a goal to increase literacy among third graders. To achieve this goal, district leaders wanted to implement a new district-wide, cost-effective professional development program in phonics that would produce results for students in grades K-2. During the 2018-19 school year, the district launched a pilot in 10 schools to compare two competing programs over a three-year period. Piloting was itself largely new to the district, and the pilot marked another shift in leaders’ thinking about data by including rigorous, quantitative evaluation.

“At the time, the district had done a number of qualitative perception studies with surveys and focus groups, but we were really trying to expand into more advanced statistical modeling and quasi-experimental design,” said SDP Fellow Moira Johnson. Johnson, one of two new quantitative analysts hired by the district to fill this evaluative need, found a unique opportunity for empirical evaluation in both the SDP Fellowship and the district’s phonics pilot.

“We had access to data such as pre- and post-test scores and program implementation costs. The hope was that the project would provide leaders the empirical evidence they needed for interim decision-making regarding pilot program participants and serve as an example for future rigorous analyses.” In year two of the pilot, Johnson conducted one of the district’s first quantitative program evaluations: interim analyses of both programs in the study and an accompanying cost-effectiveness analysis.

ROI tiebreaker

Johnson applied a difference-in-differences analysis to phonics assessment data and found that the two programs were similarly effective midway through the pilot. “There were no significant differences in student growth from the beginning of 1st grade to mid-year in 2nd grade,” Johnson offered. “Though school-level data showed that one of the programs (Program A) did produce an increase of student ability over time, that increase was only in two of the five schools participating.” From a purely academic standpoint, the two programs could be considered equally successful, but Johnson was then able to look at program costs to uncover more insights for the district’s leaders in curriculum and instruction.
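
For readers unfamiliar with the technique, the sketch below shows the simplest two-group, two-period form of a difference-in-differences comparison. The article does not describe the model Johnson actually fit, so this is only an illustration: the function name `did_estimate` and all scores are hypothetical, and a real evaluation would also test whether the estimate is statistically significant.

```python
from statistics import mean


def did_estimate(pre_a, post_a, pre_b, post_b):
    """Return (average gain in group A) minus (average gain in group B).

    Each argument is a list of assessment scores. A value near zero
    suggests the two groups grew by about the same amount.
    """
    gain_a = mean(post_a) - mean(pre_a)
    gain_b = mean(post_b) - mean(pre_b)
    return gain_a - gain_b


# Hypothetical scores, invented purely for illustration (not district data).
program_a_pre = [40, 42, 38]
program_a_post = [55, 57, 53]
program_b_pre = [41, 39, 40]
program_b_post = [54, 53, 55]

estimate = did_estimate(program_a_pre, program_a_post,
                        program_b_pre, program_b_post)
```

Comparing gains rather than raw end-of-period scores is the point of the method: it removes any difference in where the two groups started.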

“We were really lucky to have the pricing data for this project,” reflected Johnson. Based on this added financial information, one of the programs emerged as the clear leader, costing half as much as the other program while providing a similar impact on student improvement. Further cementing its lead, the less expensive program (Program B) showed consistent results in all five of the schools that implemented it, whereas improvements resulting from Program A were concentrated in only a few schools. “This cost component is critical when the project is viewed in a greater context of budget constraints, particularly in the wake of COVID. This helped leaders choose the program that would lead to similar academic gains at much lower cost per student.”
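
The cost comparison can be pictured as a cost-effectiveness ratio: dollars spent per student for each point of average growth. The sketch below is a minimal illustration under that assumption. The function name `cost_per_point` and the dollar and growth figures are hypothetical; only the "half the cost, similar impact" relationship comes from the article.

```python
def cost_per_point(cost_per_student, average_gain):
    """Return the cost per student for each point of average score gain.

    Lower is better. The ratio is undefined when there is no positive gain.
    """
    if average_gain <= 0:
        raise ValueError("average_gain must be positive")
    return cost_per_student / average_gain


# Hypothetical figures, invented for illustration (not district data):
# Program B costs half as much as Program A with the same average gain.
ratio_a = cost_per_point(cost_per_student=100.0, average_gain=15.0)
ratio_b = cost_per_point(cost_per_student=50.0, average_gain=15.0)

# True when Program B delivers each point of growth more cheaply.
b_is_more_cost_effective = ratio_b < ratio_a
```

When two programs produce similar gains, the ratio reduces to a direct comparison of cost, which is why price could serve as the tiebreaker here.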

Education’s evaluative middle ground

While the education sector has become more comfortable using descriptive and predictive data to make decisions, programs too often move straight to a full-scale rollout without being tested. Strategic projects like Johnson’s provide a model for a shift toward advanced statistical modeling and quasi-experimental designs, so that districts can make important, and sometimes expensive, decisions based on more rigorous insights.

“When it comes to piloting, there are often concerns about equity,” said Johnson. “People want to give all students access to great programs, and there’s a perception that students might be disadvantaged if not exposed to a great program in the pilot stage.” This notion of inequity also explains why full-scale randomization isn’t often desirable in an education setting. “These are real students, and there are implications for making choices that may not benefit groups equally. Students can’t go back and re-do their education after the fact.”

Johnson’s insights further advance the case for middle-ground approaches that build in quantitative elements while keeping an ear to the more qualitative perceptions of implementers on the ground. “If people are unwilling or unable to implement a program with fidelity, for example, or if they like or dislike a certain program for any number of reasons, we still have to take that into consideration. We have to account for both sides of the equation: the numbers and the people.”

Since buy-in is a regular necessity at all levels of program implementation, Johnson learned the value of communicating complicated empirical findings to non-technical stakeholders. “Decision-makers needed short and actionable findings presented via executive summaries and slide decks to enable optimal decision-making. Providing short summaries of the major implications gleaned from statistical models helps decision-makers act based on analytical findings, and I learned this is better than presenting findings within complex tables and model equations.”

Considerations for conducting quantitative program evaluation

If you are contemplating evaluating the effectiveness of instructional programs, consider the following questions:

  1. What environmental criteria lead to the optimal implementation of the program? The results of Johnson’s analysis showed that student growth may vary across schools participating in the same program. Conducting a qualitative implementation fidelity or perception study in addition to a quantitative impact evaluation could shed further light on a program’s viability.
  2. How can research departments balance the need for long-term rigorous studies with senior leadership’s desire for speedy results? In Johnson’s case, interim progress reports using non-technical language balanced both needs and boosted enthusiasm for the long-term project at hand.
  3. Line-item implementation costs are not always as readily available for other programs as they were in this case. How can your organization’s accounting and research divisions work together to collect and monitor data on program costs for incorporation into future evaluations?