Grade Distribution and Its Impact on CIS Faculty Evaluations: 1992-2002

David S. McDonald
Roy D. Johnson
Department of Computer Information Systems
Georgia State University
Atlanta, GA 30302-4015 USA
davemcdonald@gsu.edu
roy@gsu.edu

Abstract

This study examined the longitudinal relationship between grade distribution and faculty evaluations in a large Computer Information Systems department. A significant inflationary trend was identified in the letter grades issued by faculty over the 10-year period. A significant relationship was also found between students' expectations of high grades and their evaluations of faculty performance.

Keywords: faculty evaluations, grade distribution, grade inflation, student expectations

1. INTRODUCTION

In 2001, Patrick Healy, a Boston Globe reporter, stirred media attention with a report that Harvard University had awarded grades of A or A-minus to half of its undergraduate class. Moreover, Healy found that more than 90 percent of the Harvard class was graduating with honors. These findings prompted further investigation. Through research in the Harvard archives examining graduation and honors data, Healy documented that Harvard's honors rate was 91 percent, compared with 51 percent at Yale and 44 percent at Princeton (Healy 2001).

Although reports such as Healy's draw attention to a long-neglected problem, many educators fail to acknowledge that a problem exists. Rationalizations abound, including the oft-quoted "better students deserve better grades." The problem is that, regardless of the institution and the quality of the students it accepts into its degree programs, an "A" should signify excellence within a peer group, not within the general student population of the United States. Many institutions have accepted grading practices that consistently blur the distinction between good and outstanding performance, while awarding passing grades to students for merely showing up and turning in their work. These institutions do a great disservice to higher education as a whole. At fault are parents, faculty members, administrators and trustees, accrediting bodies, and higher-education associations, which have, for more than 25 years, shown a willingness to ignore, excuse, or compromise with grade inflation rather than fight it (Rojstaczer 2003).

In the relatively new field of computer information systems (CIS), there are virtually no longitudinal studies of grade inflation for the technology majors who play an increasingly important role in today's business environment. This paper identifies grading trends found by examining 10 years of data on the grades issued to an annual average of over 2,000 CIS majors at a large regional institution in the southeastern United States. Furthermore, this study examines the impact of students' expected grades on their perceptions of faculty capabilities, as measured by a standardized survey instrument.

2. LITERATURE REVIEW

Significant research has been conducted on the grading of students and the effect of grades on student evaluations of faculty. Grade inflation exists, as evidenced by a host of studies conducted at both the high school and university levels (Nagle 1998; Healy 2001). Student evaluations of faculty are known to be influenced by a number of factors, including the level of student maturity, the format of the evaluations themselves, class size, and whether the class was required or an elective (Coburn 1984).
A review of the literature shows empirical evidence of a positive relationship between grade inflation and student evaluations. Studies have found that student evaluations can be used to manipulate or control faculty behavior (Young 1993), especially when used as part of a performance review process tied to compensation (Stone 1995). This, in turn, leads to specific teaching behaviors by faculty, namely teaching practices that result in higher student evaluation scores (Damron 1996). Ironically, this is in direct conflict with the purpose of student evaluations, which is to improve the quality of instruction through feedback. In fact, student evaluations often have the opposite effect, increasing the frequency of poor teaching practices (Carey 1993), such as grade inflation (Greenwald 1996). Going one step further, evaluations create an environment that is consumerist and mercantilist in nature (Benson and Lewis 1994), in which the focus shifts from accurately measuring student performance to "pleasing the customer" and academic standards become greatly diminished (Renner 1981; Goldman 1993). Student evaluations also create a high-pressure environment for faculty, which results in a "self-policed lowered teaching standard" (Bonetti 1994). Finally, when examined over the long term, rampant grade inflation has been found to be a cause of the overall reduction in the quality of U.S. university education (Carey 1993; Young 1993; Crumbley and Fliedner 1995).

In summary, the literature indicates that grade inflation does exist and that student evaluations can be used to control faculty. Furthermore, student evaluation scores directly affect how faculty members teach and create a consumerist environment with lowered grading standards. Grade inflation, therefore, is a direct result of this consumerist philosophy, and it is detrimental to the educational process at the university level.

3. RESEARCH METHOD

For a number of years, the computer information systems (CIS) department involved in this study has enjoyed national recognition for its graduate and undergraduate degree programs. This reputation is based upon the quality of the curricula, faculty, and students. Quality can be maintained only with a willingness on the part of the college's administration and the department's faculty to critically examine their performance. The primary research questions in this study are 1) whether grade inflation existed within the degree programs and 2) whether students' perceptions of the grades they expected would affect faculty evaluations. A series of hypotheses was derived from these questions.

Based on the first research question, the hypotheses include:

H1: There has been a steady increase in higher grades given to students in the CIS Department by full-time faculty over the past decade.
H1a: The percentage of "A"s given has steadily increased.
H1b: The percentage of "B"s given has steadily increased.
H1c: The percentage of "C"s given has steadily decreased.
H1d: The percentage of "D"s given has steadily decreased.
H1e: The percentage of "F"s given has steadily decreased.

The second research question centers on whether students' perceptions of the grades they expected would affect their evaluations of faculty. The hypotheses used to test this premise include:

H2: Students would give higher evaluations to full-time faculty if they expected a high grade.
H2a: Students expecting an "A" would give faculty higher evaluations.
H2b: Students expecting a "B" would give faculty higher evaluations.
H2c: Students expecting a "C" would give faculty lower evaluations.
H2d: Students expecting a "D" would give faculty lower evaluations.
H2e: Students expecting an "F" would give faculty lower evaluations.

4. DESIGN

To investigate these hypotheses, a retrospective study was conducted. Students' grade data were used to examine trends in grade distribution over the past decade. Additionally, data collected from the Student Evaluation of Instructor Performance (SEIP) form were used to examine the relationship between the grades students expected to receive and their perceptions of faculty performance. The SEIP is a validated, 35-item anonymous survey instrument using a Likert scale from one to five, with one being the lowest score and five the highest (see appendix) (Brightman and Bhada 1988). The same instrument, used throughout the entire ten-year period, also collected self-reported factors, including "What is your expected grade in this course?" (see appendix). The key item for this study was Question 34, "How Effective was Your Instructor for This Course?"

The SEIP was administered during the last week of classes, before students received their grades; students' grade expectations were therefore collected at that point. Because the survey is anonymous, it was not possible to match a completed SEIP with the actual grade issued to a specific student.

It has been the CIS department's practice to maintain a relational database of key curricular data each semester. These data included the courses taught (titles); course level (i.e., graduate or undergraduate); the day(s), time, and computer number of each section offered; the ID of the instructor of record; the enrollments; the number of students attempting to enroll in full sections (a measure of demand); the number of withdrawals; the number of "A"s, "B"s, "C"s, "D"s, and "F"s issued by the instructor of record in each section; and the student evaluation scores for key aspects of both the course and the instructor. Through various database queries, additional data on each instructor were available, including teaching designation (i.e., doctoral-level graduate teaching assistant (GTA), part-time instructor, full-time instructor, visiting professor, assistant professor, associate professor, or full professor), experience level, teaching preferences, and teaching history.

From the fall of 1992 to the fall of 2002, 58,315 grades were assigned in 1,931 course offerings. For this research, only courses taught by full-time faculty of rank were included, to ensure a relatively consistent experience level and to provide more meaningful results to other information systems departments. Similarly, to ensure homogeneity of the dataset, courses that used criteria-based testing methodologies, i.e., all of the doctoral courses and the undergraduate capstone courses, were removed from the usable set of data (Crocker and James 1986). Unlike the rest of the graduate and undergraduate courses, which employ a norm-based testing methodology, both the doctoral courses and the undergraduate capstone course use a criteria-based methodology to ensure a base level of domain knowledge. Students in these courses may earn only an "A", a "B", an "S", or a "US"; their inclusion would therefore unnecessarily skew the results of the analysis and cast doubt on the validity of the results.
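The selection rules above amount to a simple filter over the departmental course database. The sketch below illustrates the idea, assuming a flat per-section export; Python with pandas is used purely for illustration, and the file name, field names, and designation labels are hypothetical, since the paper does not publish the database schema or name the software actually used.

    import pandas as pd

    # Hypothetical per-section export of the departmental course database.
    sections = pd.read_csv("cis_sections_1992_2002.csv")

    # Keep only sections taught by full-time faculty of rank.
    ranked = {"assistant professor", "associate professor", "full professor"}
    usable = sections[sections["instructor_designation"].isin(ranked)]

    # Drop the criteria-based courses (doctoral seminars and the undergraduate
    # capstone), which do not use the norm-based A-F grading scale.
    usable = usable[~usable["course_category"].isin({"doctoral", "capstone"})]

    # Save the filtered dataset for the analyses sketched in later sections.
    usable.to_csv("cis_sections_usable.csv", index=False)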
Thus, the data used in this study for hypothesis testing included 1,382 courses, with a total of 36,147 grades assigned by assistant, associate, or full professors.

Certain courses in the curriculum require mastery of material prior to entering the follow-on course. For example, students taking programming courses are allowed to work on lab assignments until the correct answer is reached, which might imply that these students should achieve higher overall grades. However, at this institution, lab assignments account for a small portion of the student's grade. Other assignments within the programming sequence require students to problem-solve and create programs during more heavily weighted exams; thus, the effect of lab work on grades is negligible. Similarly, in most of the graduate and undergraduate curricula, the predominant factor in students' grades is their performance on mid-term and final exams rather than on homework or assignments.

5. DATA COLLECTION AND ANALYSIS

Over the past decade, enrollments in the Information Systems major have grown at an unprecedented rate. As a result, the raw number of each grade assigned necessarily increased with enrollments and could not serve as an adequate measure of grading trends. Therefore, rather than using the cardinal numbers of assigned "A"s, "B"s, "C"s, "D"s, and "F"s, the percentage of each grade assigned was computed for each individual course. Using percentages normalizes the grade data across sections of different sizes. Single and multiple regression analyses were performed to determine the significance of changes in grade distribution over the 10-year period and to check for associations between grade distribution and faculty performance as perceived by students. The grade distribution regressions were first run on the entire dataset; undergraduate and graduate course data were then broken out for separate analyses.

6. RESULTS AND DISCUSSION

Tables 1, 2, and 3 summarize the regression analyses used to test the H1 series of hypotheses, "There has been a steady increase in higher grades issued to students in the CIS Department." For the combined Bachelor of Business Administration (BBA), MBA, and Master of Science (MS) CIS degree programs, the percentage of "A"s has significantly increased, while, for the same period, the percentages of "C"s and "F"s have significantly decreased (p-value < .001). Thus, hypotheses H1a, H1c, and H1e are supported, suggesting that grade inflation existed in the CIS department during the period examined by this study. This finding is consistent with the literature (Nagle 1998; Healy 2001).

A closer examination of the data, however, indicates that the actual source of the grade inflation problem lies within the undergraduate program. Table 2 shows the results for the undergraduate grade distribution. For this analysis, 20,708 grades were issued to students in 780 undergraduate courses over the decade. Similar to the overall results, the analysis showed a significant increase in the percentage of "A"s, while the percentages of "C"s and "F"s significantly decreased over the same period (p-value < .001). It is important to reiterate that these data include only those courses taught by faculty of rank. The results of this analysis indicate that the full-time faculty teaching undergraduate courses were those most responsible for the department's grade inflation problem.
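For readers wishing to apply the same trend analysis to their own departmental data, the following is a minimal sketch of the normalization and per-grade regression step described in Section 5. It assumes the filtered per-section dataset from the earlier sketch, with one row per course section containing the year taught and the count of each letter grade; Python with pandas and statsmodels is assumed, and all file and column names are hypothetical, as the paper does not identify the statistical package actually used.

    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical filtered dataset: one row per course section, with the
    # year the section was taught and the number of each letter grade issued.
    courses = pd.read_csv("cis_sections_usable.csv")

    grade_cols = ["n_A", "n_B", "n_C", "n_D", "n_F"]
    total = courses[grade_cols].sum(axis=1)

    # Convert raw counts to per-section percentages so that growing
    # enrollments do not masquerade as grading trends.
    for col in grade_cols:
        courses["pct_" + col[2:]] = 100.0 * courses[col] / total

    # Regress each grade percentage on the year taught, mirroring the
    # per-grade trend regressions summarized in Tables 1 through 3.
    X = sm.add_constant(courses[["year"]])
    for letter in ["A", "B", "C", "D", "F"]:
        fit = sm.OLS(courses["pct_" + letter], X).fit()
        print(letter, fit.params["year"], fit.rsquared_adj, fit.f_pvalue)

Splitting the frame on a course-level field before fitting would reproduce the separate undergraduate and graduate analyses.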
Table 3, the grade distribution for the CIS department's graduate program over the past decade, confirms that a large part of the grade inflation problem rests with the faculty teaching undergraduate courses. For the graduate analysis, 15,295 grades were issued to students in 593 graduate courses over the decade. Within this sub-group, only the percentage of "C"s given over the years showed a significant decrease (p-value < .05).

Two other phenomena were observed. First, the numbers of full-time faculty teaching undergraduate and graduate courses were relatively close. As at many universities, a number of the undergraduate courses are taught by part-time instructors and graduate teaching assistants; however, policies established by the college in which this department resides mandate that full-time faculty teach the majority of undergraduate courses. Second, the department's culture does not limit any faculty member to teaching solely in the graduate or undergraduate program. Therefore, one would not expect a bias in grade assignment resulting from faculty continually teaching graduate versus undergraduate courses.

Tables 4, 5, and 6 summarize the regression analyses used to test the H2 series of hypotheses, "Students will give higher evaluations to faculty if they expect a high grade." Over the past decade, all CIS majors received a total of 15,234 "A"s, 15,029 "B"s, 4,259 "C"s, 704 "D"s, and 921 "F"s. As percentages, this grade distribution was 42.2%, 41.3%, 11.9%, 2.0%, and 2.6%, respectively. As previously mentioned, the usable data consisted of 36,147 students evaluating faculty in 1,382 courses. For the three degree programs (BBA, MBA, and MS), the data shown in Table 4 indicate that students expecting an "A" give higher evaluations to faculty, supporting H2a. However, this relationship was not nearly as strong as that found for students expecting a grade of "C" or lower, which provides much stronger support for hypotheses H2c, H2d, and H2e.

As with the grade distribution analyses, separating the undergraduate from the graduate data provides more meaningful results. Table 5 consists of data from 20,708 students in the BBA CIS program earning 7,330 "A"s (35.4%), 8,147 "B"s (39.3%), 3,704 "C"s (17.9%), 675 "D"s (3.3%), and 852 "F"s (4.1%). These students evaluated faculty in 780 courses. In this analysis, the regression with the strongest significance supports H2c: undergraduate students expecting to receive a "C" are the most likely to give faculty lower evaluations. The data also support hypothesis H2d; that is, students expecting to receive a "D" evaluate the effectiveness of faculty lower than other cohorts. These findings are supported by the literature (Young 1993; Stone 1995).

For the graduate student analysis, 15,295 students were awarded 7,830 "A"s (51.2%), 6,836 "B"s (44.7%), 533 "C"s (3.5%), 28 "D"s (0.2%), and 68 "F"s (0.4%). There were no significant relationships between graduate students' expectations of their grades and their perceptions of faculty effectiveness (Table 6). This was not entirely unexpected: in this college's graduate programs, students are placed on academic probation if they earn a "C" in any course and may be removed from the program if a second "C" is earned.

Comparing the overall regressions with those of the graduate and undergraduate sub-groups produced markedly different results.
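The expected-grade regressions summarized in Tables 4 through 6 can be read as a similar per-course specification: the mean section response to SEIP Question 34 regressed, one predictor at a time, on the number of respondents expecting each letter grade. The sketch below shows one way such a model might be specified; it is only illustrative, the file and column names are hypothetical, and the tables report standardized beta coefficients, which could be obtained by z-scoring both variables before fitting.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical per-section SEIP summary: mean score on Question 34 plus
    # the number of respondents expecting each letter grade.
    seip = pd.read_csv("seip_by_section.csv")

    # One simple regression per expected-grade category, mirroring the
    # separate rows of Tables 4 through 6.
    for letter in ["A", "B", "C", "D", "F"]:
        fit = smf.ols(f"q34_mean ~ n_expect_{letter}", data=seip).fit()
        print(letter,
              round(fit.params[f"n_expect_{letter}"], 3),
              round(fit.rsquared_adj, 3),
              round(fit.f_pvalue, 3))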
Two factors might explain the difference between the overall and sub-group results. First, the nature, attitudes, and maturity level of undergraduates at this institution are markedly different from those of their graduate counterparts. The institution's graduate programs have always been marketed toward the working student. Graduate students are, on average, six years older than undergraduate students and have considerably more work experience (Huss 2002). There is a marked difference between students continuing their education at the behest of their parents and students continuing at the behest of their employers.

Second, faculty may treat graduate students differently from undergraduates. Over the ten-year period, with over 15,000 graduate students, only 28 "D"s were assigned (0.187%), an average of 2.8 per year. Over the same time frame, among approximately 20,000 undergraduate students, "D"s were issued to 675 students (3.375%), or 67.5 per year. The graduate population in this study was approximately 75% of the undergraduate population; if the two groups were graded in a similar manner, one would not expect such a great disparity. Moreover, graduate students received 68 "F"s (0.453%), while undergraduate students received 852 "F"s (4.26%). It is likely that faculty are predisposed to assigning passing grades to graduate students. Consequently, with the majority of graduate students expecting either an "A" or a "B", they may not feel the need, nor have the opportunity, to "punish" faculty with lower SEIP evaluations.

7. CONCLUSIONS

This ten-year study has shown that grade inflation exists in the CIS department under examination, and that the problem is particularly pronounced in the department's undergraduate program. Additionally, undergraduate students' expectations of lower grades negatively affected their evaluations of faculty performance, while no similar relationship was found for graduate students. This may be explained by the different levels of maturity, commitment, and motivation between graduate and undergraduate students. Faculty members may also treat graduate and undergraduate students differently: proportionally, nearly 20 times as many "D"s and approximately 10 times as many "F"s were assigned to undergraduate students as to their graduate counterparts, further complicating the grade inflation problem in the CIS department.

The department's administration reviewed the semester-by-semester reports generated from this dataset. In two years, 1997 and 1998, the department chair, in his annual evaluation of faculty, emphasized the alarming increase in the number of "A"s given and warned that continued inflation would count against faculty members' annual teaching evaluations. The results indicated a drop in students' evaluations of their instructors' performance. For example, full-time faculty overall averaged 4.06 on SEIP Question 34 in 1996; in 1997 and 1998, this number dropped to 4.0. In 1999, 2000, 2001, and 2002, the scores rose to 4.08, 4.18, 4.20, and 4.16, respectively. These reports lend further credence to the assertions made in this paper. Although the grade distribution did not change significantly in response to this warning, faculty members' comments to students in 1997 and 1998 appear to have changed student perceptions of their instructors' performance, as indicated by the drop in the faculty's average SEIP score to the 4.0 level.
8. LIMITATIONS AND RECOMMENDATIONS

As with all studies, this one is subject to certain limitations. First, the data were collected in one CIS department at a large regional university in the southeastern U.S. Generalization of the results to other academic departments within the same institution, or to CIS departments at other institutions, should be done cautiously. Future studies should examine data collected from other academic departments and other institutions across the United States.

Second, the relationship between students' grades and their perceptions of faculty performance was not based on the actual grades students received. It is true that, in the week before the end of the semester, students' expected grades generally reflect their actual grades. A direct comparison of students' expected grades and their actual grades would resolve this limitation, but the anonymous nature of the SEIP does not permit such a comparison.

Third, it is unclear whether SEIP scores have had an impact on faculty teaching behaviors. This study found that significantly more non-passing grades were given to undergraduate students than to graduate students. It would be interesting to compare the SEIP scores of faculty who teach graduate courses with those of faculty who teach undergraduate courses.

Fourth, as institutions raise their admission standards, students should be held to the higher standard. Do faculty test graduate students with the same material and expectations as on exams given to undergraduates? Following this logic, as institutions increase the quality of their students, do faculty teach and test with the same expectations of students whose average GPA is 2.0 with a low SAT admission score as of those students who, under higher admission guidelines, have a greater potential to comprehend course materials in much greater depth and breadth? Although this question is not answered in this study, it must be considered a limitation.

Lastly, the highest adjusted R2 value was .032, indicating that the regression models account for a relatively small proportion of the variance. Unfortunately, this is a limitation of any study that relies on secondary data for analysis. A future, smaller-scale study with variables of interest chosen specifically to improve the statistical model could better validate the results presented here.

With increasing demand for high-quality higher education, faculty should not have to be concerned with being "punished" by students' evaluations. Faculty performance evaluation should focus on gathering valuable feedback from students to improve teaching, rather than allowing students to manipulate or control faculty behavior. Strong administrative support is necessary to encourage faculty to maintain high academic standards without the negative consequence of poor student evaluations.

9. REFERENCES

Benson, D.E. and J.M. Lewis (1994). "Students' evaluation of teaching and accountability: implications from the Boyer and the ASA reports." Teaching Sociology 22: 195-199.
Bonetti, S. (1994). "On the use of student questionnaires." Higher Education Review 26: 57-64.
Brightman, H. and Y. Bhada (1988). Validation of the Student Evaluation of Instructor Performance (SEIP) form. Atlanta: Georgia State University.
Carey, G.W. (1993). "Thoughts on the lesser evil: student evaluations." Perspectives on Political Science 22: 17-20.
Coburn, L. (1984). Student evaluation of teacher performance.
Crocker, L. and A. James (1986). Introduction to Classic and Modern Test Theory. New York: Harcourt, Brace, and Jovanovich: 192-194.
Crumbley, L.D. and E. Fliedner (1995). Accounting administrators' perceptions of student evaluation of teaching (SET) information.
Damron, J.C. (1996). Instructor personality and the politics of the classroom.
Goldman, L. (1993). "On the erosion of education and the eroding foundations of teacher education." Teacher Education Quarterly 20: 57-64.
Greenwald, A.G. (1996). Applying social psychology to reveal a major (but correctable) flaw in student evaluations of teaching. University of Washington.
Healy, P. (2001). Low, high marks for grade inflation. Boston Globe. Boston: A20.
Huss, F. (2002). Academic Profile: AY2002-2003. Atlanta: Georgia State University: 1-16.
Nagle, B. (1998). "A proposal for dealing with grade inflation: the Relative Performance Index." Journal of Education for Business 74(1): 40.
Renner, R. (1981). "Comparing professors: how student ratings contribute to the decline in quality of higher education." Phi Delta Kappan 63(2): 128-130.
Rojstaczer, S. (2003). Where All Grades Are Above Average. Washington Post. Washington, D.C.: A21.
Stone, J.E. (1995). "Inflated grades, inflated enrollment, and inflated budgets: an analysis and call for review at the state level." Education Policy Analysis Archives 3.
Young, R.D. (1993). "Student evaluation of faculty: a faculty perspective." Perspectives on Political Science 22: 12-16.

TABLES

Table 1. Grade Distribution: Graduate and Undergraduate Programs (1992-2002)

Dependent Variable   Standardized Beta Coefficient   Adjusted R2   F        Significance
Percent "A"s         .172                            .030          41.946   .000*
Percent "B"s         -.077                           .000          .066     .797
Percent "C"s         -.158                           .025          35.501   .000*
Percent "D"s         -.175                           .004          6.169    .013
Percent "F"s         -.193                           .035          50.336   .000*
* indicates p-value < .001

Table 2. Grade Distribution: Undergraduate Program (1992-2002)

Dependent Variable   Standardized Beta Coefficient   Adjusted R2   F        Significance
Percent "A"s         .182                            .033          26.818   .000*
Percent "B"s         .000                            .000          .000     .992
Percent "C"s         -.153                           .023          18.587   .000*
Percent "D"s         -.052                           .003          2.077    .150
Percent "F"s         -.212                           .045          36.574   .000*
* indicates p-value < .001

Table 3. Grade Distribution: Graduate Program (1992-2002)

Dependent Variable   Standardized Beta Coefficient   Adjusted R2   F        Significance
Percent "A"s         .074                            .006          3.286    .070
Percent "B"s         -.045                           .002          1.204    .273
Percent "C"s         -.083                           .007          4.093    .044**
Percent "D"s         .038                            .001          .849     .357
Percent "F"s         -.039                           .002          .902     .343
** indicates p-value < .05

Table 4. The Effect of the Expected Grade Students Would Receive on Their Perception of Faculty Effectiveness: Graduate and Undergraduate Programs (1992-2002)

Independent Variable   Standardized Beta Coefficient   Adjusted R2   F        Significance
Number of "A"s         .059                            .003          4.872    .027**
Number of "B"s         .036                            .001          1.771    .183
Number of "C"s         -.179                           .032          33.667   .000*
Number of "D"s         -.128                           .016          8.970    .003**
Number of "F"s         -.120                           .014          8.691    .003**
* indicates p-value < .001
** indicates p-value < .05

Table 5. The Effect of the Expected Grade Students Would Receive on Their Perception of Faculty Effectiveness: Undergraduate Programs (1992-2002)

Independent Variable   Standardized Beta Coefficient   Adjusted R2   F        Significance
Number of "A"s         .051                            .003          2.042    .153
Number of "B"s         .059                            .004          2.708    .100
Number of "C"s         -.179                           .032          23.625   .000*
Number of "D"s         -.124                           .015          6.647    .010**
Number of "F"s         -.105                           .011          5.146    .024
* indicates p-value < .001
** indicates p-value < .05
Table 6. The Effect of the Expected Grade Students Would Receive on Their Perception of Faculty Effectiveness: Graduate Programs (1992-2002)

Independent Variable   Standardized Beta Coefficient   Adjusted R2   F       Significance
Number of "A"s         .000                            .000          .000    .993
Number of "B"s         -.015                           .000          .138    .710
Number of "C"s         .032                            .001          .289    .591
Number of "D"s         .002                            .000          .000    .985
Number of "F"s         -.097                           .009          1.190   .277