
The Association Between Standards-Based Grading and Standardized Test Scores in a High School Reform Model

by Marty Pollio & Craig Hochbein, 2015

Background/Context: From two decades of research on the grading practices of teachers in secondary schools, researchers discovered that teachers evaluated students on numerous factors that do not validly assess a student's achievement level in a specific content area. These consistent findings suggested that traditional grading practices evolved to meet the variety of educational stakeholder expectations for schools, teachers, and students.

Purpose/Objective: The purpose of this study was to examine the role of standards-based grading in a high school reform by assessing the relationships between differing grading approaches and standardized test achievement.

Setting: The study examined student performance from 11 high schools operating in a large metropolitan school district.

Population/Participants: The sample of students included two cohorts of 1,163 and 1,256 11th-grade students who completed an Algebra 2 course and the state standardized test.

Intervention/Program: Each of the high schools implemented a locally designed reform known as Project Proficiency. A key component of the reform included utilizing standards-based grading to assess student proficiency of the content.

Research Design: This study utilized a nonequivalent control group design and quantitative analyses to compare the association between classroom grades and standardized test scores.

Data Collection and Analysis: The data for the study included the students' final grades, standardized test scores, and basic demographic information.

Findings/Results: Results indicated that the rate of students earning an A or B in a course and passing the state test approximately doubled when utilizing standards-based grading practices.
In addition, results indicated that standards-based grading practices provided a more predictive and valid assessment of at-risk students' attainment of subject knowledge.

Conclusions/Recommendations: The article demonstrates the benefits of using standards-based grading in reforms attempting to improve the academic performance of secondary schools, but also notes how restricting grades to mastery of standards will challenge educators' perceptions of their abilities and of students' efforts. The article also notes the methodological limitations of prior grading research and suggests the need for more robust studies assessing grading practices, student achievement, and school performance.

INTRODUCTION

Classroom grades play a critical role in the experience of secondary students in the United States. Although evidence has suggested that primary school grades provide important information about future academic achievement (Balfanz, Herzog, & MacIver, 2007), students' grades in secondary school can have immediate and weighty consequences (Hiss & Franks, 2014). Honor societies, scholarship programs, and postsecondary institutions often utilize grades and grade point averages to inform admission and financial decisions. In addition, grades often serve as the gatekeeper for students' participation in extracurricular activities. The National Collegiate Athletic Association (n.d.) even utilizes a sliding scale that incorporates grade point average and standardized college admission test scores to determine the eligibility of potential student-athletes. For students, an insufficient slate of grades could preclude participation, admission, or award. Although grades have served as a common and important measure for assessing students, grades have lacked a uniform or standard meaning.
According to a wide array of research, secondary teachers relied on a variety of factors to determine students' grades (Brookhart, 1993; Cross & Frary, 1999; Guskey, 2009; McMillan, 2001; Stiggins, Frisbie, & Griswold, 1989). For example, teachers utilized assessment of processes such as effort, behavior, class participation, homework completion, ability level, and growth (Brookhart, 1993; Cross & Frary, 1999; Guskey, 2009). Cizek, Fitzgerald, and Rachor (1996) observed, "It seems that classroom assessment practices may be a weak link in the drive toward improving American education" (p. 162). From both the importance and subjectivity of grades emerged a movement in secondary education to grade students solely on achievement of key academic standards within a curriculum (Guskey, 2009; Marzano, 2010). A shift to standards-based grading requires deep and systematic changes to long-standing educational traditions. To facilitate these changes, reformers first need to demonstrate the benefits of standards-based grading to educators. The purpose of this study was to examine the role of standards-based grading in a high school reform by assessing the relationships between differing grading approaches and standardized test achievement. Specifically, this study was designed to answer the following research questions:

1. Does a stronger association exist between standards-based grading and standardized test scores than with traditional grading practices?

2. Does a stronger association exist between standards-based grading and minority or disadvantaged students' standardized test scores than with traditional grading practices?

BACKGROUND

The concept of standards-based grading entails associations with a broad array of research topics and policy debates, including but not limited to accountability, standardized testing, teaching practices, and common curricula.
Conducting extensive reviews of the empirical investigations and theoretical discourse in these tangential areas would divert focus from the processes and outcomes of classroom grading practices. Furthermore, this study did not evaluate the policies that currently govern public education, but rather focused on the success of an initiative operating within those parameters. However, given the complex endeavors of teaching and learning, as well as the complex associations of standards-based grading, we explicated our assumptions prior to reviewing the literature related to grading.

ASSUMPTIONS OF SCHOOLS, TEACHERS, AND GRADES

For decades, schools have performed a variety of functions. In addition to educating students about specific content knowledge, schools have offered numerous extracurricular activities, such as athletic teams, artistic performances, social clubs, and other nonacademic endeavors (Duke, 1995). To meet the needs of students from economically disadvantaged backgrounds, many schools participate in subsidized student meal programs (Harwell & LeBeau, 2010), with a growing number of schools providing additional well-being services, such as health, dental, and counseling clinics (Moore, 2014). Furthermore, schools have grappled with greater societal issues, such as gender and racial integration (Grant, 2009; Ogbu, 2003; Reese, 2005; Tyack & Hansot, 1990). Although schools have attempted to achieve an array of important and meaningful objectives, we assumed student attainment of subject knowledge, such as literacy or numeracy, to be the primary responsibility of a school. Similarly, educational stakeholders, including taxpayers, parents, and administrators, have expected teachers to achieve multiple objectives. In addition to the content of core subjects like reading, writing, arithmetic, history, and the sciences, teachers have been expected to provide instruction in nonacademic skills.
For instance, discussing report cards from New York City between 1920 and 1940, Cuban (1993) noted, "Space was provided for grades on effort, conduct, and personal habits" (p. 58). In the daily activity of schools and classrooms, teachers deliver instruction and guidance on critical skills like responsibility, creativity, and resiliency (Labaree, 2012). Although teachers often provide critical instruction in nonacademic skills, we assumed students' mastery of subject knowledge to be the primary responsibility of teachers. For schools and teachers, grades have operated as the primary form of communicating the performance of students to educational stakeholders. Despite this ubiquitous practice of educational communication, grades have not carried a standard meaning. Grades might communicate student growth or diligence to external stakeholders (McMillan & Nash, 2000), as well as serve as rewards or sanctions for students (Brookhart, 1993). Such practices often relay beneficial or meaningful information (Bowers, 2009), but do not necessarily communicate students' academic achievement. To align with our expectations about the primary objectives of schools and teachers, we also assumed that grades validly and reliably represented students' mastery of subject knowledge. Regardless of individual views on the purpose of schooling or responsibilities of teachers, federal and state governments currently hold schools and teachers accountable for student learning of specific standards. To demonstrate proficiency of these standards, students complete a battery of standardized state accountability assessments. One critical component of this educational reform may be the implementation of sound grading practices that directly measure student attainment of required standards. Without valid grading practices, students are likely to enter their exam sessions with an inaccurate understanding of their own knowledge and abilities (Rosenbaum, 1997).
Similarly, without valid grading practices, teachers might misjudge and therefore mismanage precious instructional time. To ensure a high-quality education for all students, grading reform must become pervasive throughout secondary education in America. As Guskey (2009) states:

If grades are to represent information about the adequacy of students' performance with respect to clear learning standards, then the evidence used in determining grades must denote what students have learned and are able to do. To allow other factors to influence students' grades or to maintain policies that detract from that purpose misrepresents students' learning attainment. (p. 22)

TRADITIONS OF GRADING

To make systemic change within secondary education, measurement researchers stated that grades need to be based solely on levels of achievement within a class (Allen, 2005; Cross & Frary, 1999; Guskey, 2009). The vast majority of prior research on grading in secondary education indicated that most teachers do not focus grading on achievement. Brookhart (1991) initially described the grading process in secondary schools as a "hodgepodge of attitude, effort, and achievement" (p. 36). Most of these research studies involved surveying teachers on the various factors that they considered when assigning a student's grade. For instance, Brookhart (1993) found that 84 surveyed teachers used the image of grades as currency to encourage student effort, participation, and appropriate behavior within the classroom. Cross and Frary (1999) further explored Brookhart's findings on the variety of factors used in the grading of secondary students.
On the basis of their survey of 307 teachers, the researchers confirmed teachers' use of many nonachievement factors in grading students when they concluded:

Because of the importance placed on academic grades at the secondary level, either for educational or occupational decisions, grades should communicate as objectively as possible the levels of educational attainment in the subject. To encourage anything less, in our opinion, is to distort the meaning of grades as measures of academic achievement, at a time when the need for clarity of meaning is greatest. (Cross & Frary, 1999, p. 56)

McMillan and Nash (2000) further investigated the influences on teacher decision-making with respect to grading and the justifications that teachers gave when assigning grades. The authors surveyed 700 teachers and then interviewed a sample of these teachers. From the teacher responses, the researchers identified various classroom factors involved in grading. Although achievement, as defined by student understanding, was one of the primary categories, several other categories emerged. These categories included the teachers' philosophy of teaching and learning, their desire to "pull for students," their accommodations for individual differences among students, and finally student engagement and motivation. Supporting Brookhart's (1991) assertions, McMillan and Nash (2000) concluded that teachers used grades as the main tool to encourage and monitor student engagement. Although teachers verbalized the need to measure student achievement through grading, "most teachers used a variety of assessments . . . including homework, quizzes, tests, performance assessments and participation" (McMillan & Nash, 2000, p. 26). To better understand the various factors used in grading, McMillan (2001) surveyed 1,483 teachers and identified four distinct factors most often seen in secondary grading practices.
These factors included academic achievement, external benchmarks, academic enablers, and extra credit. In addition, McMillan discovered that teachers assessed higher-ability students in a motivating and engaging environment by measuring higher cognitive skills, while the same teachers gave lower-ability students more rote learning assessments, more extra credit, and less emphasis on academic achievement. Discovery of such differential grading suggested that grading practices in secondary schools maintained or possibly increased achievement gaps between student subgroups. Whereas teachers graded higher-ability students based upon achievement, they graded many at-risk students utilizing a wider range of factors. This wider range of factors potentially inflated students' grades, making them less valid indicators of achievement of the standards, which subsequently obscured the students' needs for additional instruction, practice, or remediation.

VALIDITY OF GRADES

The results of the survey research on secondary teachers' grading practices showed that teachers used a variety of factors to grade students. Student achievement emerged as only one of the factors used by teachers to assess student work. Therefore, grades are not necessarily a valid measure of students' level of achievement in secondary education. Despite this lack of validity, educators utilize grades to make critical decisions about students' futures, such as entry into elite clubs and organizations, access to scholarships, and admission into college. If grades measure several factors, including a student's ability to navigate the social processes of school, and not just academic achievement, the validity of grades becomes a major concern in American education. For grades to be a valid measure of student achievement, teachers must assess students on their achievement of required curriculum standards.
As a result of the variety of factors used by teachers to grade students, Marzano (2000) contended that in terms of measuring student achievement "grades are so imprecise that they are almost meaningless" (p. 1). Allen (2005) summarized the critical importance of ensuring validity in the grading process when measuring academic achievement:

Also, since many of these factors such as effort, motivation, and student attitude are subjective measures made by a teacher, their inclusion in a grade related to academic achievement increases the chance for the grade to be biased or unreliable, and thus invalid. The purpose of an academic report is to communicate the level of academic achievement that a student has developed over a course of study. Therefore, the sole purpose of a grade on an academic report, if it is to be a valid source of information, is to communicate the academic achievement of a student. (p. 220)

Guskey (2007) explored the perceived validity of teacher grades by surveying 314 educators in three different states. He asked educators to rank, from 1 to 15, sources of evidence of student learning that "you trust to best show what students know and can do" (p. 21). The sources of evidence included standardized tests, various assessments, teacher observations, quizzes, homework completion, portfolios, students' grades, class involvement, and behavior and attitude. Statistical analyses of the data indicated that the participants gave a relatively low ranking to grades as an accurate indicator of student learning. Guskey (2007) concluded that the educators' low ranking of grades as indicators of academic achievement resulted from both teachers' and administrators' recognition "that a variety of nonacademic factors, such as effort, attitude, participation, and class behavior, typically influence grades" (p. 22). Such factors also help explain the discrepancies between student grades and standardized test scores (Allen, 2005).
In another study on the validity of grades, Bowers (2009) explored the relationship between teacher-assigned grades and standardized assessments. He found that schools used standardized test scores, in place of grades, to make data-driven decisions. Administrators have consistently sought to remediate and intervene for low performance on standardized tests, when student grades should also be used to inform these decisions. Conceding that grades are not a valid measure of a student's academic achievement, Bowers (2009) suggested how schools could better use grades as a basis to provide critical safety nets to support student success:

The hypothesis here is that rather than cast this hodgepodge nature of grades in pejorative light as data that is useful to schools because grades only moderately correlate to test scores, the theory presented here . . . points to the idea that grades appear to assess both academic knowledge . . . as well as a student's ability to perform well at the social tasks of the schooling process, such as behavior, participation, and attendance. (p. 622)

RELATIONSHIPS BETWEEN GRADES, TEST SCORES, AND STUDENTS

Few empirical research studies have rigorously investigated the relationship between grades and standardized test scores. Little evidence exists on the impact of standards-based grading on standardized test scores. Welsh, D'Agostino, and Kaniskan (2013) even stated, "However, as far as we know, the linkage between SBPR (standards-based progress reports) and standards-based assessment scores has not been explored in the academic literature" (p. 26). Yet, as a result of schools' increased accountability for improving standardized test scores, several research studies have attempted to determine the relationship between grades and test scores. If recent increases in school accountability have led to changes in teacher grading practices, then an association should exist between grades and standardized test scores.
If the use of standards-based grading methods decreased the use of a medley of factors to assess student learning, and grades thus became a more valid indicator of student achievement, then a strong correlation should exist between grades and test scores. Conley (2000) first examined the relationship between the grades teachers gave their students and proficiency scores assigned to the same students by external raters. Conley found little correlation between teachers' grading systems and student proficiency. He specifically noted that students judged proficient through an analysis of their work by external raters were not necessarily the students with high grades: "The stepwise regression analysis examines teacher grading systems and student proficiency scores and found very little relationship between the grading system a teacher used and whether or not a student was proficient" (Conley, 2000, p. 18). Conley surmised that the low correlation suggested that separate constructs besides standards-based achievement were used in grading. Specifically, he noted that homework in mathematics classes and in-class assignments in English classes comprised a significant portion of a student's grade, although these assignments might not measure proficiency on mandated standards. This relationship between test scores and grades was the topic of several research studies over the past decade. Lekholm and Cliffordson (2008) studied the grades of nearly 100,000 students from Sweden and their association with students' scores on national tests. Although results from their analysis indicated that the greatest variance in grades came from actual achievement levels in the subject area, other factors outside of achievement influenced the grades given to students. One of the most significant findings of their research revealed that schools with students from lower socioeconomic levels assigned grades that were higher than the students' standardized test scores.
Therefore, the at-risk students in these schools evinced a lower correlation between grades and test scores. Two other studies examined the correlation between grades and standardized tests, as well as the differences in the association between grades and test scores for minority, low socioeconomic, and non-minority students. Brennan, Kim, Wenz-Gross, and Sipperstein (2001) and Haptonstall (2010) discovered modest correlations between teacher-assigned grades and standardized state assessments. However, both studies found a lower correlation between grades and standardized test scores for minority students, English language learners, and low socioeconomic students than for their counterparts. The findings suggested not only that grades did not strongly correlate with achievement scores on standardized tests, but also that minority students and low socioeconomic students were possibly given higher grades than their achievement levels warranted. Together, the findings of Brennan et al. (2001), Haptonstall (2010), Lekholm and Cliffordson (2008), and McMillan (2001) supported a theory of grade inflation for minority and disadvantaged students. As a result of teachers including factors such as effort, behavior, and attendance, minority and disadvantaged students earned grades that overestimated academic attainment. According to Brennan et al. (2001) and Haptonstall (2010), the practice of grading at-risk students on factors other than achievement level supports the existence of a significant achievement gap between minority students and their White counterparts. Despite intense focus on the elimination of the achievement gap in American secondary schools, few education leaders have examined grading policies as a potential source of the problem.
Based on two decades of research on the grading practices of teachers in secondary schools, researchers found that teachers evaluated students on numerous factors that do not validly assess a student's achievement level in a specific content area. These consistent findings suggested that traditional grading practices evolved to meet the variety of educational stakeholder expectations for schools, teachers, and students. However, research data are limited on the correlation between grades and achievement scores on standardized tests at the secondary level. Even less research is available on the impact of standards-based grading on the correlation between grades and achievement scores. To address these data and research deficits, this study examined the association between standards-based grading and achievement as measured by standardized tests, for students in general and for minority and low socioeconomic students in particular.

METHODOLOGY

PROJECT PROFICIENCY

In 2010, the Kentucky Department of Education (KDE) enacted Race to the Top legislation that forced the state to identify its lowest-performing schools and implement a federally approved turnaround model. Within Jefferson County Public Schools (JCPS), KDE identified 18 schools as persistently low-achieving (PLA) because of their low mathematics and reading scores on state accountability assessments. In an effort to improve reading and mathematics scores, JCPS educators implemented Project Proficiency (PP), a districtwide instructional program to standardize both mathematics and reading curricula and instruction at the secondary level (JCPS, 2011a). During the 2010–2012 school years, 11 JCPS secondary schools implemented PP to improve academic achievement. Although we limit our description of PP to aspects of the reform with direct influence on standards-based grading, the Project Proficiency Guide (JCPS, 2011a) provides more detailed information about the specific design of the initiative.
In addition, Baete and Hochbein (2014) and Burks and Hochbein (2013) provide supplemental information about the implementation of PP across the district and within the participating schools. A central strategy of PP, to guarantee student competency of subject knowledge, required mathematics instruction to focus on student attainment of key course standards. The curriculum identified three key standards within each grading period for high school mathematics courses. Through intentional dissection of the key standards, measurable learning targets guided daily instruction across each of the classrooms and schools. The key standards and learning targets focused instruction and aligned curriculum for each classroom within a school and across each of the 11 district high schools. Through a unified process of instruction and interventions measured by district-designed common assessments, teachers attempted to guarantee the competency of every student. Within the PP initiative, teachers graded students solely on their proficiency level for each of the key standards within the grading period. Students took both a diagnostic assessment in the middle of the grading period and a proficiency assessment at the end of the grading period. Each of these assessments accounted for 40% of a student's final grade. Student reflection on their proficiency within each standard accounted for the final 20% (JCPS, 2011a). Finally, for students who did not reach proficiency, PP required schools to make accommodations to remediate after the grading period in order to meet the key standards and retake proficiency assessments. Teachers relied on grades to identify and implement interventions for students who did not demonstrate proficiency of the key standards. All 11 high schools participating in PP implemented the initiative in every Algebra 1, Geometry, and Algebra 2 classroom.
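The 40/40/20 grade composition described above amounts to a simple weighted sum. The sketch below illustrates that arithmetic; the function name and sample scores are invented for illustration and are not drawn from the JCPS guide:

```python
# Sketch of the Project Proficiency grade weighting: 40% diagnostic
# assessment, 40% proficiency assessment, 20% student reflection
# (JCPS, 2011a). Function name and inputs are illustrative.

def pp_final_grade(diagnostic: float, proficiency: float, reflection: float) -> float:
    """Combine the three PP components (each scored 0-100) into a
    final percentage using the 40/40/20 weighting."""
    return (40 * diagnostic + 40 * proficiency + 20 * reflection) / 100

# Example: a hypothetical student scoring 70 on the diagnostic,
# 85 on the proficiency assessment, and 90 on the reflection.
print(pp_final_grade(70, 85, 90))  # 80.0
```

The resulting percentage would then map to a letter grade under the district's grading scale (Table 3).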
Within each of these mathematics classrooms, teachers attempted to ensure overall competency by focusing on every student's proficiency in the key standards through a standards-based grading approach. As a result, grades became the key indicator of which students needed interventions and the additional instructional support required to reach competency in the key standards. The design and implementation of PP also fostered teacher collaboration to develop and implement successful instructional strategies, as well as interventions for students performing below competency benchmarks.

PARTICIPANTS

This study included participants from 11 high schools that implemented PP for the 2010–2011 school year. Educators implemented PP in an effort to improve the rate of proficient scores earned by students on the Kentucky Core Content Test (KCCT) in mathematics. All 11 high schools operated as part of JCPS in Louisville, Kentucky. As the 26th-largest school district in the nation, JCPS served 100,474 students in 150 schools. The demographic composition of the district consisted of 51% White, 37% Black, and 12% Other students. Nearly 62% of the student population qualified for the federal free/reduced lunch program (Table 1). From the high school population, we identified two separate cohort groups (Table 1). In Kentucky, students take KCCT assessments in mathematics (11Math) and science (11Science) during their junior year of high school. Students included in this study completed an Algebra 2 course during the 2010 or 2011 school years and had grade 11 KCCT results in mathematics (11Math) and science (11Science) during the same year. One cohort consisted of 11th-grade students from the 11 high schools during the 2011 school year. Each of these students completed an Algebra 2 course and received PP within the Algebra 2 course.
The second cohort consisted of 11th graders during the 2010 school year from the same 11 high schools, who did not receive PP within their Algebra 2 course. As a result of testing and district reform, juniors from the 11 high schools experienced PP in mathematics, but not in science. Therefore, we analyzed science scores as a nonequivalent control group (Shadish, Cook, & Campbell, 2002) to compare the effects of PP between the same students in two different courses.

Table 1. Demographic Characteristics of Cohort Participants (N = 2,419)
For purposes of this study, we defined Algebra 2 as a course in the Kentucky Program of Studies that met the Kentucky Algebra 2 graduation requirement. In JCPS and the 11 high schools, these courses included Algebra 2, Algebra 2 Honors, and Algebra 2 Advanced. Algebra 2 classrooms were identified in each of the study's schools through an evaluation of each master schedule. The 2010 cohort sample contained 1,163 (n = 1,163) students across the 11 high schools. The 2011 cohort sample contained 1,256 (n = 1,256) students across the same 11 high schools. All three course types included students who qualified for special education services. However, students who did not complete an Algebra 2 course or who did not have KCCT results in mathematics and science during the same academic year were excluded from the study.

MEASURES

We obtained student data for each of the cohorts from the JCPS Data Warehouse. For security reasons, when Kentucky administered the mathematics and science tests, proctors distributed multiple versions of the tests. The KCCT Test Administration Guide identified average Cronbach's alpha measures of .89 and .84, respectively, for the six versions of the mathematics and science tests. Item and description indices were identified by the Kentucky Department of Education for each test version and converted to mean scale scores (MSS) ranging from 0 to 80. Kentucky mean scale scores corresponded to four performance-level descriptors: novice, apprentice, proficient, and distinguished (Table 2).

Table 2: Grade 11 Kentucky Core Content Test (KCCT) Mean Scale Score Range and Performance Descriptors
Students' results on KCCT mathematics and science assessments served as outcome measures. We analyzed 11Math as the primary dependent variable, but utilized 11Science as an additional dependent variable to compare the effects of PP within the same treatment group. Specifically, the effect analyzed within this study was the association between standards-based grades within PP and 11Math scores. Standards-based grading within PP was used only with the 2011 cohort in Algebra 2 courses. In contrast, the 2010 cohort in mathematics and the 2011 cohort in science experienced a traditional grading approach, which was outlined in The Jefferson County Public Schools Student Progression, Promotion and Grading (SPP&G) (JCPS, 2011b) (Table 3). The SPP&G defined district policy concerning the components of an academic grade. The policy stated, "Academic grades must include a minimum of three of the following: portfolios, projects, discussion/problem solving, group work, classroom assignments, homework/journals/logs, quizzes, tests, participation, and teacher observation" (p. 8). Finally, district policy also mandated that "one component may not count for more than 40 percent of the total academic grade" (p. 8).

Table 3: Classroom Letter Grade Values and Ranges
Note. Excerpted from JCPS "Student Progression, Promotion and Grading" (2011)

The independent variable for standards-based grading was the implementation of PP. Instead of using the traditional grading method, PP assessed students with a standards-based grading approach. As part of the standards-based grading process in 2011, teachers required their Algebra 2 students to become proficient in three key standards during each six-week grading period. As a result, grades for mathematics in the 2011 cohort were based solely on a student's proficiency level in the three key standards for the six weeks.

DESIGN AND ANALYSES TO EVALUATE GRADING

A quasi-experimental nonequivalent control group design was implemented to analyze the association between standards-based grades and 11Math scores during the 2011 school year. Comparison of the association between grades and KCCT scores relied on two control groups. The first control group consisted of students who completed an Algebra 2 course and received an 11Math score in 2010, but did not experience the standards-based grading of PP. This control group provided a comparison of the grade-score association across two years of 11Math results. The second control group comprised the same students as the treatment group: these students received standards-based grading as part of PP in mathematics, but not in science. Therefore, we also analyzed the association between their science grades and 11Science scores and compared results to the association found in mathematics. Inclusion of the nonequivalent control group, 11Science, reduced some of the most likely or greatest threats to validity. A cross-sectional comparison of only the 2010 and 2011 results would be susceptible to several validity threats, including historical, selection, and compensatory issues (Shadish et al., 2002).
For example, one administrator suggested that any improvement in 11Math resulted because of teachers “getting fire in their bellies.” However, comparison of different students in the same subject, and the same students in different subjects, reduced the likelihood of such threats biasing results. Of course, not all threats could be eliminated, but as Shadish et al. (2002) stated, “Therefore, possible threats to validity are not always plausible ones” (p. 139).

EVALUATION AND ANALYSES OF GRADES

To analyze the influence of PP on the association between grades and test scores, we first tabulated basic descriptive statistics to determine the percentage of students in each cohort who scored proficient or above and received an A or a B in the corresponding content course. On the KCCT assessment, proficient or above is the level students must reach for schools to avoid state and federal sanctions. Second, we evaluated students who received above-average grades within the content course for each of the three cohorts. Students who received an A or a B in the specific content course were considered above average in standard attainment. An analysis of variance (ANOVA) determined whether students who experienced standards-based grading and scored above average in their class scored higher on the corresponding KCCT assessment than students who experienced traditional grading. Third, we determined the correlation coefficient (r) for all students in each of the three groups. By analyzing the correlation coefficient for each of the groups, we determined whether the treatment group that received standards-based grading showed a stronger correlation with KCCT mathematics scores than the control groups that did not. A coefficient of determination (r²) was determined for each of the groups, as well as for both minority students and students receiving free/reduced lunch (FRL) aid.
Finally, a regression analysis measured the association of grades and the corresponding KCCT score between the groups. (1) Prior achievement as measured by eighth-grade mean scale scores, (2) FRL status, and (3) grades within the specific content course were all included as control variables to create a robust, yet parsimonious model.

RESULTS

CROSS-TABULATION OF GRADES AND TEST SCORES

We analyzed descriptive statistics of student grades and test scores to determine the percentage of students who received an above-average grade in their mathematics or science course (A/B) and also scored proficient or distinguished on the corresponding KCCT test (Table 4). If students’ grades were a valid indicator of their learning of subject content, then students who earned an A or a B in their content class should have scored proficient or above on the state accountability assessment. For the students who experienced traditional grading methods, in both mathematics and science, this assumption did not hold. In the non-PP Math cohort, 466 students (40%) received an A or a B in their Algebra 2 class, yet only 26% of them scored proficient or distinguished on the 2010 KCCT mathematics assessment. Within the PP Science group, 514 students (40%) received an A or a B in their science class, of which 28% scored proficient or distinguished on the 2011 KCCT science assessment. Within the two traditional grading cohorts, success in the classroom as defined by grades did not translate into success on the KCCT assessment. For students who experienced standards-based grading in PP Math, 568 (45%) received an A or a B in their Algebra 2 class, with 55% of them scoring proficient or distinguished on the 2011 KCCT mathematics assessment.
When teachers utilized standards-based grading methods, not only did the number of As and Bs increase, but the rate of passing the state assessment among students who earned these grades approximately doubled as compared to the two traditional grading cohorts. Despite this increase in proficiency among students who experienced standards-based grading, educators should be concerned that 45% of the students in the PP Math cohort who achieved an A or a B in their Algebra 2 class still did not meet KCCT mathematics assessment proficiency.

Table 4. KCCT Performance Description as a Function of Subject Grade
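The cross-tabulation described above can be sketched in a few lines. This is an illustrative sketch only: the student records, grade labels, and pass rate below are invented, not the study’s data, and serve merely to show the shape of the analysis (A/B grade versus proficient-or-above score).

```python
# Hypothetical sketch of the study's cross-tabulation; data are invented.
import pandas as pd

students = pd.DataFrame({
    "grade": ["A", "B", "C", "A", "F", "B", "D", "A"],
    "kcct":  ["Proficient", "Apprentice", "Novice", "Distinguished",
              "Novice", "Proficient", "Apprentice", "Proficient"],
})

# Dichotomize both measures, as in Table 4: above-average grade (A/B)
# versus proficient-or-distinguished test performance.
students["above_avg_grade"] = students["grade"].isin(["A", "B"])
students["proficient_plus"] = students["kcct"].isin(["Proficient", "Distinguished"])

table = pd.crosstab(students["above_avg_grade"], students["proficient_plus"])

# Rate of proficient-or-above scores among A/B students, the key quantity
# contrasted across the PP and non-PP cohorts.
ab = students[students["above_avg_grade"]]
pass_rate = ab["proficient_plus"].mean()
```

In the study's terms, `pass_rate` is the figure that rose from roughly 26–28% under traditional grading to 55% under standards-based grading.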
RELATIONSHIP BETWEEN GRADES AND TEST SCORES

Further data analysis assessed the association between grades and test scores. A Pearson correlation coefficient was calculated for the relationship between participants’ grades and KCCT test scores in each of the three groups (Table 5). A weak but positive correlation was found in both PP Math and non-PP Math, although the magnitude for PP Math was greater. Overall, the correlation results indicated that students with higher grades tended to score higher on the KCCT mathematics test. However, for PP Science little if any correlation between grades and test scores existed (Hinkle, Wiersma, & Jurs, 2003). Supplementary analysis revealed that correlations between grades and test scores among the minority and FRL student subgroups mirrored patterns found among the samples of all students. However, for each cohort the correlation magnitudes for minority students were consistently lower than those for both the FRL and All samples.

Table 5. Cohort Correlations Between Grades and Test Scores
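The correlation analysis above can be sketched as follows. The grade points and scale scores are invented for illustration; the study's actual coefficients appear in Table 5.

```python
# Hypothetical sketch of the correlation analysis: Pearson r between numeric
# course grades (A = 4 ... F = 0) and test scale scores, plus r², the
# proportion of variance shared. All values are invented.
from scipy.stats import pearsonr

grade_points = [4, 4, 3, 3, 2, 2, 1, 0, 0, 4]            # hypothetical grades
kcct_scores  = [48, 40, 39, 33, 31, 35, 27, 22, 25, 45]  # hypothetical scores

r, p_value = pearsonr(grade_points, kcct_scores)
r_squared = r ** 2

# Interpretation: a correlation of r = .44, for example, gives r² ≈ .19,
# i.e., grades account for roughly 19% of the variance in test scores.
```

The coefficient of determination computed this way is the statistic the authors compare across cohorts and subgroups.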
Correlational analyses suggested that students who experienced standards-based grading had both stronger correlations between grades and assessment scores and larger coefficients of determination than students who experienced traditional grading. Minority and FRL students who experienced standards-based grading in PP Math demonstrated greater correlations between grades and test scores as compared to the cohorts that used traditional grading methods. However, educators should again be concerned that regardless of subject or cohort, at best grades demonstrated weak positive correlations with achievement scores. Even with standards-based grading in PP Math, the association between grades and test scores was weak.

ANALYSIS OF VARIANCE IN CONTRAST OF GRADES

Further analysis of the data explored the mean KCCT test scores for each grade in the three groups. A one-way ANOVA compared KCCT test results based upon the earned grades of participants in each cohort. This contrast examined whether groupings by subject grade revealed differences in KCCT test scores. For PP Math, non-PP Math, and PP Science, results indicated statistically significant differences in KCCT scores across grades (Table 6). Although correlational analysis revealed weak positive correlations between grades and test scores, ANOVA results indicated that students with higher grades earned statistically significantly higher KCCT test scores. For example, students in PP Math who received an A (M = 47.31, SD = 13.77) scored higher on the KCCT assessment than students who received a B (M = 37.48, SD = 15.11). Further tests revealed that students continued to score lower on the KCCT assessment as their course grade decreased.

Table 6. Mean KCCT Scores by Classroom Grade
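The one-way ANOVA described above can be sketched with invented score groups. The values below are hypothetical; the study's actual group means appear in Table 6.

```python
# Hypothetical sketch of the one-way ANOVA contrasting mean KCCT scores
# across letter-grade groups; scores are invented for illustration.
from scipy.stats import f_oneway

scores_a = [47, 50, 44, 49, 46]   # hypothetical scores of A students
scores_b = [37, 39, 35, 40, 36]   # hypothetical scores of B students
scores_f = [22, 25, 20, 24, 23]   # hypothetical scores of failing students

f_stat, p_value = f_oneway(scores_a, scores_b, scores_f)
# A small p-value indicates that the mean scores of the grade groups differ
# by more than chance variation would predict.
```

In practice, a significant omnibus F would be followed by pairwise contrasts of the kind the authors report (A versus B, and so on down the grade scale).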
Beyond statistically significant differences, ANOVA results also identified several differences of practical and pragmatic importance. First, only the average KCCT score of students who earned an A in PP Math qualified for designation as proficient by KCCT standards. Second, on average, students from PP Math who achieved an A or a B earned a KCCT mathematics score greater than the state mean of 37.00. In contrast, students from non-PP Math and PP Science who earned an A in their content course scored, on average, less than the state mean on the corresponding KCCT assessment. Third, mean student scores from the three groupings demonstrated that PP Math students not only achieved greater test scores per grade, but also greater distinction between grades. For instance, PP Math students who earned an A averaged nearly 25 points higher on the KCCT mathematics assessment than students who failed the course, whereas students who achieved an A in PP Science averaged 12 points higher on the KCCT science assessment than students who failed the course. Similar to the descriptive and correlation results, these analyses indicated a greater association between grades and KCCT scores under standards-based grading than under traditional grading methods.

LINEAR REGRESSION RESULTS

Finally, a multiple linear regression estimated the variance accounted for in test scores utilizing several predictors, including grades. Prior academic achievement was the strongest predictor of KCCT achievement score in mathematics or science for 11th grade students (Table 7). Grades within the specific content course were also significant predictors of achievement on the corresponding KCCT assessment. Although the grade within the course was significant in all three groups, it was a stronger predictor among students who experienced standards-based grading in PP Math than among students who experienced traditional grading in the non-PP groups.
The PP Math regression model estimated the largest standardized coefficient for grades (β = .25, p < .001), although the non-PP Math cohort estimate (β = .22, p < .001) was similar. Finally, unlike in the non-PP Math and PP Science models, student FRL status was not a statistically significant predictor of test scores, above and beyond grades and prior achievement, in the PP Math model.

Table 7. Factors Accounting for KCCT Test Scores
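The form of the regression model above can be sketched with ordinary least squares. The data below are invented and deliberately constructed to follow a known linear rule, so the fit is exact; the predictor names mirror the study's three control variables, not its actual data.

```python
# Hypothetical sketch of the study's OLS model: KCCT score predicted from
# prior achievement, FRL status, and course grade. Data are invented and
# constructed so that kcct follows a known linear rule exactly.
import numpy as np

prior = np.array([60, 55, 70, 40, 50, 65, 45, 58, 62, 48], dtype=float)
frl   = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 1], dtype=float)
grade = np.array([3, 2, 4, 1, 2, 4, 1, 3, 3, 2], dtype=float)
kcct  = 0.5 * prior + 4.0 * grade - 2.0 * frl + 5.0   # constructed outcome

# Design matrix with an intercept column; fit by least squares.
X = np.column_stack([np.ones_like(prior), prior, frl, grade])
coefs, *_ = np.linalg.lstsq(X, kcct, rcond=None)

# Proportion of variance in test scores accounted for by the model.
pred = X @ coefs
r2 = 1 - ((kcct - pred) ** 2).sum() / ((kcct - kcct.mean()) ** 2).sum()

# Standardized coefficient (beta) for grade, comparable across predictors;
# this is the kind of estimate reported as beta = .25 for PP Math.
beta_grade = coefs[3] * grade.std() / kcct.std()
```

Standardizing the raw coefficient by the ratio of predictor to outcome standard deviations is what makes the betas for grades, prior achievement, and FRL status comparable within a model.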
ANECDOTAL OBSERVATIONS

Unfortunately, a variety of constraints related to time and resources precluded rigorous analyses using qualitative methods. However, one of the authors participated in PP as the principal of an implementing school. Although he did not systematically interview or conduct focus groups, he actively observed the influence of PP on teachers and students in his school. In addition, his interactions with other district principals provided information about the broad perceptions of PP from the 10 other implementing schools. As a principal who continued to utilize and expand the use of standards-based grading, his observations entail some bias. Thus, we limited the anecdotal observations to two primary concerns about standards-based grading: narrowed curriculum and teacher reactions. The identification and emphasis of key standards holds the potential to narrow curricula and instruction to just “teaching to the test.” However, hundreds of hours spent observing classrooms and interacting with teachers, students, and parents did not appear to substantiate this concern. Instead, teachers noted that the limited number of standards provided them with more planning and instructional time, in which they developed and delivered lessons superior to those of previous years. In addition, teachers positively noted that the identification of specific standards enabled them to focus on depth of content instead of breadth. This time and focus likely contributed to teachers commenting that they felt their students gained a deeper understanding of the key standards when using PP and standards-based grading. Moreover, parent, teacher, and student discussions about grades appeared more meaningful and thoughtful. Instead of debating the number of points a student should have been awarded, more and more of these conversations focused on how the student demonstrated proficiency in a specific standard.
Although standards-based grading enabled teachers to focus instruction and conversations on subject matter attainment, this emphasis on attainment also challenged them when assessing diligent but underperforming students. Teachers struggled with the notion of assigning low grades to students they perceived as “good.” Teachers continued to fear that grades that did not match students’ efforts would discourage and deter such students. Interestingly, the converse of this logic did not occur. For students who were less diligent or presented classroom disruptions, but demonstrated proficiency of the key standards, teachers did not begrudge high grades. This anecdotal evidence cannot conclusively refute or substantiate the quantitative results. Yet the insights gleaned from extensive work with educators and students experiencing standards-based grading add some descriptive nuance to the findings. In answering our research questions we did not consider this anecdotal evidence. However, our considerations of implications for policy, practice, and research relied on our review of prior literature, the quantitative evidence, and the experiences of implementing PP.

SYNTHESIS AND CONCLUSIONS

Attributing the observed improvement in student achievement solely to the implementation of standards-based grading practices would be an inaccurate interpretation of this study’s results. Standards-based grading was one component of the comprehensive PP reform. As demonstrated by Baete and Hochbein (2014) and Burks and Hochbein (2013), the package of curricular and instructional changes resulting from PP, which included standards-based grading, contributed to increases in student achievement. However, these prior studies did not specifically evaluate the influence of standards-based grading practices as part of PP.
The methodology of the present study isolated the association between student grades and standardized test scores to compare standards-based with traditional grading practices. With regard to the first research question, a stronger association existed between course grades and standardized test scores among students who experienced standards-based grading than among students who experienced traditional grading methods. First, descriptive statistics showed that more students who achieved an A or a B in their class scored proficient or above on state accountability testing when they experienced standards-based grading as opposed to traditional grading. Second, the magnitude of the Pearson correlation coefficient between grades and test scores was greater for the PP Math cohort. Third, the analysis of variance found that students who achieved higher grades in their mathematics class also achieved higher scores on the KCCT assessment when they experienced standards-based grading. Finally, regression estimates indicated that grades under standards-based grading, as compared to traditional grading, accounted for a greater amount of variance in student test scores. In terms of the second research question, there was a stronger association between grades and test scores for minority or disadvantaged students under standards-based grading. The correlation coefficients of the standards-based grading cohort for minority and FRL students exceeded those of the traditional grading models. Statistics in Table 5 show that, among FRL students, the proportion of variance shared between grades and test scores was over twice as high for PP Math students (19%), who experienced standards-based grading, as for PP Science students (7%), who experienced traditional grading. The exact same at-risk students had higher correlation statistics when evaluated on standards-based grading as opposed to traditional grading.
In addition, FRL status was not a statistically significant predictor in the PP Math model, but was statistically significant in the non-PP Math and PP Science models. This difference in models suggests that standards-based grading weakened the negative association between socioeconomic status and student achievement. Together, these data support prior research, which suggested that teachers grade minority students less on achievement levels and more on a variety of additional factors (Brennan et al., 2001; Haptonstall, 2010). Although the consistency of the findings supports the benefits of standards-based grading over traditional models, two limitations require discussion and potentially temper conclusions. First, the level of implementation of standards-based grading within each school and classroom was not explored in this study. Teachers in each of the 11 schools implemented PP within their mathematics classrooms, and this implementation required a certain level of fidelity to standards-based grading. The tenets of PP required that teachers “guarantee competency” of each of their students on three key standards for each six-week grading period. Without implementing a standards-based grading approach, teachers could not ensure that each student had met the three key standards. Schools and classrooms, however, could vary in their level of implementation of standards-based grading. This study did not take into account the fidelity of implementation of standards-based grading. Although science classes were used as a comparison group to measure the differences between a traditional grading approach and a standards-based grading approach with the exact same students, teachers most likely varied in their fidelity of implementation of PP and specifically standards-based grading. As implementation data were not available, this study could not account for these differences.
A second limitation of the study was the lack of correspondence between the KCCT assessments and the tested course. The KCCT assessments in mathematics and science assess content from three courses spanning a student’s high school career. In mathematics, Algebra 1, Geometry, and Algebra 2 content are all part of the KCCT mathematics assessment. Within this study, the grades students received under a standards-based grading approach evaluated students on Algebra 2 content only. Therefore, a student could have successfully mastered the content in Algebra 2, but the student’s standardized assessment score could have suffered because of a deficiency in a previous mathematics class. To truly assess the association between grades and test scores, the standardized assessment should cover only content taught in the specific course. Despite these limitations, the results of this study indicated that the use of standards-based grading within PP classrooms increased the association between grades and standardized test scores among students within the 11 high schools that implemented the program. Students who were more successful in the content class that used standards-based grading were more likely to score proficient on the KCCT assessment than students evaluated on traditional grading practices. The most significant finding to refute traditional grading methods derived from the approximately 75% of students who received above-average traditional grades in their specific content class, yet scored below proficient on the corresponding KCCT assessment. When evaluated by standards-based grading, nearly twice as many students scored proficient when successful in their core content class. These findings provide strong evidence to suggest that standards-based grading approaches should be central to educational reform movements.
IMPLICATIONS FOR POLICY, PRACTICE, AND FUTURE RESEARCH

In an age of increased accountability and high-stakes testing, the implications for practitioners are important to consider. Educational stakeholders expect schools and educators to accomplish a multitude of tasks beyond teaching subject content (Labaree, 2012). Although critics question the validity of many accountability measures (Downey, von Hippel, & Hughes, 2008), current federal and state policies hold schools accountable for every student’s proficiency in core content areas. In schools identified as persistently low-achieving, educators face relocation or termination (Hochbein, Mitchell, & Pollio, 2013). Even in schools not so identified, state and district initiatives have measured and evaluated the added value of teachers (Guarino, Reckase, & Wooldridge, 2012), as well as rewarded or sanctioned teachers based upon their students’ performance on state accountability assessments (Podgursky & Springer, 2007). For schools and educators seeking to ensure that all students learn the state standards, our evidence suggests that a standards-based grading approach may offer a more valid method than traditional grading practices. This research demonstrated that standards-based grading had a stronger association with state accountability test results than traditional grading practices. Furthermore, results indicated that students who performed above average in their class, and were evaluated with a standards-based grading approach as part of the PP reform, performed higher on state accountability assessments than similar students assessed by traditional grading. The results support the reasoning that the grades students receive in a core content class using standards-based grading actually reflect what the students know and can demonstrate on state proficiency assessments.
This suggests that grades in a standards-based assessment system more validly reflect student learning (Allen, 2005). Policymakers have worked to legislate a reduction in the achievement gap. This gap between minority and nonminority students and between at-risk and not-at-risk students has been widely discussed. Little, if any, of the effort to reduce the achievement gap has centered on changing grading practices (Welsh et al., 2013). Our evidence from this study suggests that student performance on standardized tests is associated with the use of standards-based grading. As part of PP, nearly twice as many students scored proficient on the mathematics assessment when they experienced standards-based grading in their Algebra 2 class. Furthermore, correlations between the grades and standardized test scores of minority and disadvantaged students were greater in standards-based grading classrooms than in traditional grading classrooms. As suggested by prior researchers, standards-based grading practices might be a necessary, but insufficient, initiative to reduce the achievement gap in American education (Brennan et al., 2001; Haptonstall, 2010; Lekholm & Cliffordson, 2008; McMillan, 2001). Most importantly, both practitioners and policymakers must grapple with how to respond to the diligent student who is unable to master the key standards needed to attain a passing grade. In traditional grading systems, a student who complies with a teacher’s policies and requests, completes all assigned tasks in a timely manner, and has a good attendance and behavior record almost always passes a core content class. In contrast, with standards-based grading, such additional factors have little influence on a student’s grade. If teachers grade solely on standard attainment, then the student who does not attain the standard must be given a failing grade.
Standards-based grading highlights the issue of the diligent but failing student, yet implementation of standards-based grading policies also offers some beneficial resolutions. Schools might return to policies that reported multiple grades (Cuban, 1993). Grades for diligence and conduct might supplement grades for standard attainment. Moreover, grades for fundamental skills like literacy or simple computation could help educators focus remediation efforts. For example, a student’s inability to comprehend complex text might contribute to the failure of a history standard. Understanding this contributing factor could help tailor a student’s remediation plan. Similarly, schools might alter policies that require wholesale retaking of a failed course. By identifying the specific standard deficiencies, schools might again provide targeted and specific remediation efforts. Such targeted remediation would enable students not only to master the material, but also to return to the prior academic pace without having to miss an entire semester or year. With the emphasis on having all students be college ready based on benchmark ACT scores, schools must be willing to use standards-based grading approaches to ensure that students are truly college ready. At the same time, schools are expected to graduate all students and decrease retention rates. Policymakers and practitioners must find ways to measure students on attainment of key academic standards while still providing necessary safety nets for students unable to achieve those standards. Finally, researchers should turn their attention toward the effectiveness of standards-based grading practices in schools. Most prior research has centered on overall grading practices, and little empirical research exists to support a movement toward standards-based grading. Researchers must build on the data within this study to establish a strong empirical research base for the widespread implementation of standards-based grading.
This research requires both quantitative and qualitative analyses of the influence of standards-based grading practices on instruction and student achievement. Future research can use new end-of-course assessments to measure the association between grades and test scores within this new format. This research should also focus on the impact of standards-based grading on the instruction and achievement of minority and at-risk students. As the standards-based grading movement continues to grow in secondary schools, researchers should explore its potential influence on the achievement gap.

References

Allen, J. D. (2005). Grades as valid measures of academic achievement of classroom learning. The Clearing House, 78(5), 218–223. Baete, G. S., & Hochbein, C. (2014). Project Proficiency: A quasi-experimental assessment of high school reform in an urban district. Journal of Educational Research. Retrieved from http://www.tandfonline.com/doi/abs/10.1080/00220671.2013.823371#.U5G0bvldW7o. Balfanz, R., Herzog, L., & MacIver, D. (2007). Preventing student disengagement and keeping students on the graduation path in urban middle-grades schools: Early identification and effective interventions. Educational Psychologist, 42(4), 223–235. Bowers, A. J. (2009). Reconsidering grades as data for decision making: More than just academic knowledge. Journal of Educational Administration, 47(5), 609–629. Brennan, R. T., Kim, T., Wenz-Gross, M., & Sipperstein, G. N. (2001). The relative equitability of high-stakes testing versus teacher-assigned grades: An analysis of the Massachusetts Comprehensive Assessment System. Harvard Education Review, 71(2), 173–216. Brookhart, S. M. (1991). Grading practices and validity. Educational Measurement: Issues and Practice, 10(1), 35–36. Brookhart, S. M. (1993). Teachers’ grading practices: Meaning and values. Journal of Educational Measurement, 30(2), 123–142. Brookhart, S. M. (1994).
Teachers’ grading: Practice and theory. Applied Measurement in Education, 7(4), 279–301. Burks, J. C., & Hochbein, C. (2013). The students in front of us: Reform for the current generation of urban high school students. Urban Education. Retrieved from http://uex.sagepub.com/content/early/2013/10/22/0042085913507060.abstract. Cizek, G. J., Fitzgerald, S. M., & Rachor, R. E. (1996). Teachers’ assessment practices: Preparation, isolation and the kitchen sink. Educational Assessment, 3(2), 159–179. Coleman, J. S., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A. M., Weinfeld, F. D., & York, R. L. (1966). Equality of educational opportunity. Washington, DC: Government Printing Office. Conley, D. T. (2000, April). Who is proficient: The relationship between proficiency scores and grades. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA. Cross, C. H., & Frary, R. B. (1999). Hodgepodge grading: Endorsed by students and teachers alike. Applied Measurement in Education, 12(1), 53–72. Cuban, L. (1993). How teachers taught: Constancy and change in American classrooms, 1890–1990 (2nd ed.). New York, NY: Teachers College Press. Downey, D. B., von Hippel, P. T., & Hughes, M. (2008). Are “failing” schools really failing? Using seasonal comparison to evaluate school effectiveness. Sociology of Education, 81(3), 242–270. Duke, D. L. (1995). The school that refused to die. Albany, NY: State University of New York Press. Grant, G. (2009). Hope and despair in the American city: Why there are no bad schools in Raleigh. Cambridge, MA: Harvard University Press. Guarino, C., Reckase, M. D., & Wooldridge, J. M. (2012). Can value-added measures of teacher performance be trusted? (Discussion Paper Series, Forschungsinstitut zur Zukunft der Arbeit, No. 6602). Bonn, Germany: Institute for the Study of Labor. Guskey, T. R. (2007).
Multiple sources of evidence: An analysis of stakeholders’ perceptions of various indicators of student learning. Educational Measurement: Issues and Practice, 26(1), 19–27. Guskey, T. R. (2009). Practical solutions for serious problems in standards-based grading. Thousand Oaks, CA: Corwin Press. Haptonstall, K. (2010). An analysis of the correlation between standards-based, non-standards-based grading systems and achievement as measured by the Colorado Student Assessment Program (CSAP) (Unpublished doctoral dissertation). Capella University, Minneapolis, MN. Harwell, M., & LeBeau, B. (2010). Student eligibility for a free lunch as an SES measure in education research. Educational Researcher, 39(2), 120–131. Hinkle, D. E., Wiersma, W., & Jurs, S. G. (2003). Applied statistics for the behavioral sciences (5th ed.). Boston, MA: Houghton Mifflin. Hiss, W. C., & Franks, V. W. (2014). Defining promise: Optional standardized testing policies in American college and university admissions. Arlington, VA: National Association for College Admission Counseling. Hochbein, C., Mitchell, A., & Pollio, M. (2013). The influence of AYP as an indicator of persistently low-achieving schools. NASSP Bulletin, 97(3), 270–289. Jacob, B. A. (2005). Accountability, incentives, and behavior: The impact of high-stakes testing in the Chicago Public Schools. Journal of Public Economics, 89, 761–796. Jefferson County Public Schools. (2011a). Project Proficiency guide. Unpublished manuscript. Jefferson County Public Schools. (2011b). Student progression, promotion, and grading. Louisville, KY: Author. Kentucky Department of Education. (2010). No Child Left Behind (NCLB) interpretive guide 2010. Retrieved from http://www.education.ky.gov/nr/rdonlyres/0a2e4cd27b79476ca16a33415da5e2fe/0/2010_nclb_interpretive_guide.pdf Labaree, D. F. (2012). Someone has to fail: The zero-sum game of public schooling. Cambridge, MA: Harvard University Press. Lekholm, A. K., & Cliffordson, C. (2008).
Discrepancies between school grades and test scores at individual and school levels: Effects of gender and family background. Educational Research and Evaluation, 14(2), 181–199. Marzano, R. J. (2000). Transforming classroom grading. Alexandria, VA: Association for Supervision and Curriculum Development. Marzano, R. J. (2010). Formative assessment & standards-based grading. Bloomington, IN: Marzano Research Laboratory. McMillan, J. H. (2001). Secondary teachers’ classroom assessment and grading practices. Educational Measurement: Issues and Practice, 20(1), 20–32. McMillan, J. H., & Nash, S. (2000, April). Teacher classroom assessment and grading practice and decision making. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA. Moore, K. A. (2014). Making the grade: Assessing the evidence for integrated student supports. Bethesda, MD: Child Trends. National Collegiate Athletic Association. (n.d.). NCAA Eligibility Center quick reference guide. Indianapolis, IN: Author. No Child Left Behind Act, 20 U.S.C. § 6319 (2001). Ogbu, J. U. (2003). Black American students in an affluent suburb: A study of academic disengagement. Mahwah, NJ: Erlbaum. Persistently Low-Achieving School and School Intervention Defined, Kentucky Revised Statutes 160.346 (2010). Podgursky, M. J., & Springer, M. G. (2007). Teacher performance pay: A review. Journal of Policy Analysis and Management, 26(4), 909–949. Reback, R. (2008). Teaching to the rating: School accountability and the distribution of student achievement. Journal of Public Economics, 92, 1394–1425. Reese, W. J. (2005). America’s public schools: From the common school to “No Child Left Behind.” Baltimore, MD: Johns Hopkins University Press. Rosenbaum, J. E. (1997). College-for-all: Do students understand what college demands? Social Psychology of Education, 2(1), 55–80. Sanders, M. G. (2012).
Achieving scale at the district level: A longitudinal multiple case study of a partnership reform. Educational Administration Quarterly, 48(1), 154–186. Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin. Spillane, J. P., Diamond, J. B., Walker, L. J., Halverson, R., & Jita, L. (2001). Urban school leadership for elementary science instruction: Identifying and activating resources in an undervalued school subject. Journal of Research in Science Teaching, 38(8), 918–940. Stiggins, R. J., Frisbie, D. A., & Griswold, P. A. (1989). Inside high school grading practices: Building a research agenda. Educational Measurement: Issues and Practice, 8(2), 5–14. Tyack, D., & Hansot, E. (1990). Learning together: A history of coeducation in American public schools. New Haven, CT: Yale University Press. Welsh, M. E., D’Agostino, J. V., & Kaniskan, B. (2013). Grading as a reform effort: Do standards-based grades converge with test scores? Educational Measurement: Issues and Practice, 32(2), 26–36.


