Preliminary Evaluation of a Self-Paced Course in Logic: Phil 102, Spring 2000
Austen Clark, Professor of Philosophy, University of Connecticut
20 October 2000
In the spring of 2000 the Institute for Teaching and Learning provided the resources for a rather bold and risky experiment: to convert a large lecture-format introductory course in logic and critical thinking, taught under the UConn general education requirements, into a “self-paced” or “Keller plan” course. The point of this initial report is to indicate some of the grounds for thinking that this gamble will pay off.
Self-Paced vs. Lecture-and-Exam
In a “self-paced” or “Keller plan” course the work for the semester is divided into some manageable number of units, resources are provided so that students can work through units more or less on their own, and a large number of tests are written for each unit, all parallel in form, and having items of the same kinds, but each containing different problems. Test sessions are offered weekly. Students can take as many tests on a given unit as they please, until they get a score with which they are satisfied. They then proceed to the next unit. The final grade for the semester is simply the average of the best scores over all the units. No limits are set on how quickly a student might proceed through the sequence. There are limits on how slowly one can go, necessitated by the fact that all things, including UConn semesters, eventually come to an end.
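The grading rule just described can be sketched in a few lines of Python. This is only an illustration, not the course's actual record keeping: the function name, the 0 to 100 score scale, the sample attempts, and the choice to count a unit never attempted as zero are all assumptions made for the example.

```python
from collections import defaultdict

def final_grade(attempts, num_units):
    """Average of the best score on each unit.

    attempts  -- list of (unit_number, score) pairs, one per test taken
    num_units -- number of units in the course
    Assumption: a unit that was never attempted contributes a score of 0.
    """
    best = defaultdict(float)          # best score seen so far for each unit
    for unit, score in attempts:
        best[unit] = max(best[unit], score)
    return sum(best[u] for u in range(1, num_units + 1)) / num_units

# Hypothetical student: nine tests over six units, with retakes on units 1, 3, and 5.
attempts = [(1, 62), (1, 85), (2, 90), (3, 70), (3, 78),
            (4, 88), (5, 55), (5, 81), (6, 92)]
grade = final_grade(attempts, 6)       # only the best score per unit counts
```

Because only the best score on each unit enters the average, a bad hour never lowers the grade; retaking a test can only help.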
The large lecture-format version of Philosophy 102 taught prior to the spring of 2000 relied on lectures, weekly quiz and review section meetings with teaching assistants, three one-hour exams, and a final exam. Tests in logic (and math generally) arouse considerable anxiety in some of our students. The schedule is set by someone else, the tasks to be performed are somewhat unpredictable, and a bad performance during one hour of the semester can affect one’s final grade for the entire semester. All of these bad features go away in a self-paced course. Students choose when to take tests. They know that they can take another test on the same unit if they bomb on the first one. They know that their final grade is more or less entirely under their control. The responsibility for learning is placed squarely on their backs, which is where it is located anyway. The self-paced format simply makes that placement overt and controllable.
Most students respond magnificently. As one might guess, students love the format; but more to the point, they also seem to learn more critical thinking. Even a student who is in the course solely to satisfy a curricular requirement, and aiming simply to pass, can often be tempted into taking more tests, and trying to do better. Or at least this has been the author’s experience in self-paced logic courses taught prior to coming to Storrs. Those were all relatively small courses, however. The question of the day, which had no known answer, was whether this format could be scaled up successfully, to handle a course with an initial enrollment of 240. The Institute for Teaching and Learning put some chips on the table, we rolled the dice, and decided to give it a try.
A Sea of Tests
Before launching such a project one must have an adequate textbook, organized by units, each with the objectives clearly detailed, and with many practice problem sets for all the different kinds of problems that show up on tests for that unit. One must also provide answers, and explanations of the answers, for all such problems. Fortunately I had developed my own textbook for Phil 102 in prior years, and without too much trouble the material in it was reorganized into six units. Candidate gamesters should also have at hand a large collection of potential test items, again with answers, the latter for the teaching assistants who do the grading. Those need to be whipped into separate tests and answer sheets for all the different tests.
By the end of the semester we had not just three one-hour tests, but thirty-six. Some units had five different tests, others seven. (Details, including a syllabus, can be found at the course web site at http://www.sp.uconn.edu/~py102vc.) At the end of the semester 221 students remained enrolled in the course. Those students took a total of 2,119 tests during the semester. Students could take a test every week, in section meetings with their TA, and use the final exam period as well, to take a maximum of 14 tests. The average number of tests taken per student was 9.6.
Our speed record was set by a student who took eight tests and finished with an “A” in nine weeks. Four students who received A’s in the course did it taking only six tests (the minimum required), but most students took advantage of the opportunity to retake tests on a given unit. Nine students showed up at every possible testing session, taking 14 tests during the semester. An additional twenty-five missed just one session. Counts for the final grades and numbers of tests taken by grade were as follows:
Final grade:                 A      B      C      D      F
Number of students:         13     70     74     40     24
Average number of tests:   9.2   10.7   10.1    9.7    5.1
Students who stopped at “B” took on average the greatest number of tests, and those with “F” the fewest. It was surprising to get only 13 A’s, but not too surprising to get two dozen F’s. (Thirteen of the students with a final grade of “F” took fewer than six tests, and many of those took one or two and then simply vanished for the rest of the semester.)
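The figures in the table can be cross-checked against the totals reported earlier. The short Python sketch below uses only numbers from the table; the variable names are illustrative, and the test total it produces is an estimate, since the per-grade averages are rounded to one decimal place.

```python
grades    = ["A", "B", "C", "D", "F"]
students  = [13, 70, 74, 40, 24]           # number of students per final grade
avg_tests = [9.2, 10.7, 10.1, 9.7, 5.1]    # average tests taken per grade (rounded)

total_students = sum(students)             # should match the 221 enrolled

# Share of the class receiving each grade, in percent.
shares = {g: round(100 * n / total_students, 1)
          for g, n in zip(grades, students)}

# Total tests implied by the rounded per-grade averages.
estimated_tests = sum(n * t for n, t in zip(students, avg_tests))

print(total_students)    # 221
print(shares)            # {'A': 5.9, 'B': 31.7, 'C': 33.5, 'D': 18.1, 'F': 10.9}
print(round(estimated_tests, 1))   # 2126.4
```

The estimate of about 2,126 differs from the reported 2,119 by less than half a percent, which is within what rounding each average to one decimal place can produce.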
Teaching assistants are sometimes shocked at the numbers of students in a self-paced course who stop when they get a B, or a C, or a D, and don’t proceed to try for an A, as any person likely to become a teaching assistant would have tried to do. The reader might find the distribution displayed above shocking for the same reason. The overall final average for the 221 students was 72.9. The average score over all 2,119 tests taken during the semester was 68.0.
The distribution of final grades in the spring 2000 version of Phil 102 almost exactly matches the distribution given in the fall of 1998, when the course was last taught in a traditional way. The overall average in 1998 was 73.2, and, as in the distribution above, there were about 5% A’s, 31% B’s, 33% C’s, and 11% F’s. A chi-square test found no significant difference in these distributions.
Student Ratings
So, one might ask, what’s the point of grading 2,119 tests? It seems we might have gotten the same final distribution giving just four tests per student—884 tests in all. Why grade the extra 5.6 tests per student?
The best answer is that students vastly prefer this format for learning logic, and at the margins, it seems to help some of them learn more of it. If the only point is to produce a final grade distribution, one does not even need four tests per student: one or two (a midterm and a final, say) would suffice. But that format can leave many students hating logic—a dispiriting result, dangerous in a democracy. It is vastly preferable for everyone involved to know that the final grades assigned are ones that the students have chosen to accept.
I did not expect to be able to demonstrate student preference for the self-paced format using the UConn teaching evaluation instrument. Mean ratings for Phil 102 in the fall of 1998 were already pretty high, the distributions were skewed, and standard deviations were quite large (on average, 57% larger than those for the university as a whole). All these factors make it difficult to demonstrate a statistically significant movement upwards in student ratings of the course. But, surprisingly enough, we can.
The table below summarizes results for the first eleven items in the teaching instrument for the fall of 1998 and spring of 2000.

                           Means        % in ranks 8, 9, 10
Item                     98f    00s       98f    00s
Presented material       7.7    8.0        64     66     ns
Organization             8.3    8.6        78     82     ns
Clear objectives         8.1    8.9        70     87
Fulfilled objectives     8.3    9.0        78     86
Clear assignments        8.2    8.9        74     86
Stimulated interest      7.0    7.9        48     62
Graded fairly            8.2    8.9        73     86
Appropriate exam         8.1    8.9        68     87
Accessibility            7.5    8.5        59     79
Interest, concern        7.6    8.4        59     83
Preparation              8.7    9.0        84     87     ns

Overall                  8.0    8.6        69     81
Means are reported, since they are familiar from the University reports. But ratings on these items are at best ordinal scales, and the difference between a 7 and an 8 cannot be assumed to be the same “size” as the difference between 8 and 9. So it is difficult to assign much meaning to these means. The other columns report a more tangible datum: what proportion of all the ratings given in response to that item were 8, 9, or 10. As can be seen, all those proportions went up in spring 2000. (As will be explained, “ns” marks items on which the change was not statistically significant.) A graphical display of all the ratings over all of the eleven items may make this clearer:
In the self-paced course (spring 2000), 35.5% of all responses to the first eleven items were 10’s. That is double the rate found in the traditional course (fall 1998), when the corresponding percentage was 17.5.
To assess whether these changes in the distribution of ratings were statistically significant, we need a test that does not require interval-level measurement or assume that the distributions are normal. (One can see that they are not!) The Kolmogorov-Smirnov test is nonparametric and assumes only ordinal-level measurement. It yields a chi-square of 6.72, which with N1 = 98 and N2 = 104 is significant at the .03 level. The even simpler chi-square test on the full set of ratings, assuming only nominal-level measurement, yields a chi-square value of 303.4, which is significant at the .001 level.
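The report does not say how the Kolmogorov-Smirnov result was converted to a chi-square value. One common textbook route, assumed here, is the large-sample approximation for the one-tailed two-sample test: chi-square = 4 D² n1 n2 / (n1 + n2) with 2 degrees of freedom, where D is the largest gap between the two cumulative distributions. The Python sketch below shows that calculation; the function names are illustrative, and the raw rating counts are not reproduced in this report, so only the published figures (6.72, 98, 104) are used.

```python
import math

def ks_statistic(counts_a, counts_b):
    """Largest gap between the cumulative proportions of two rating histograms.

    counts_a, counts_b -- counts per rating category, in the same order (e.g. 1..10)
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    cum_a = cum_b = d = 0.0
    for a, b in zip(counts_a, counts_b):
        cum_a += a / total_a
        cum_b += b / total_b
        d = max(d, abs(cum_a - cum_b))
    return d

def ks_chi_square(d, n1, n2):
    """Large-sample chi-square approximation (2 df) for the one-tailed two-sample test."""
    return 4 * d * d * n1 * n2 / (n1 + n2)

def chi_square_p_2df(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2)

# Using only the figures given in the text:
p = chi_square_p_2df(6.72)                                   # about 0.035
implied_d = math.sqrt(6.72 * (98 + 104) / (4 * 98 * 104))    # about 0.18
```

Under this assumption the reported chi-square of 6.72 corresponds to a probability of about .035, in line with the .03 level quoted above, and to a maximum gap of about 0.18 between the two cumulative rating distributions.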
The above is the distribution over all eleven items. Specific items yielded even more pronounced results. For example, an item of particular interest for a course which gives so many exams is whether the exams are appropriate. Here are the distributions of rankings on this one item:
In the spring of 2000, 72% of the answers respondents gave to this question were 9’s or 10’s, while in the fall of 1998 the corresponding percentage was 41%. Although the picture is probably proof enough of how much the students like the testing regimen in a self-paced course, the Kolmogorov-Smirnov test confirms that this difference in the distributions is significant at the .01 level.
Most but not all of the other items show similarly significant results, with the first two (“presented material” and “organization”) and last (“preparation”) being the only exceptions. Indeed there is clearly something of a “halo effect” operating in these ratings, with good feelings on one engendering good feelings on another. I must confess to the readership that, as far as I know, my level of “interest and concern” for UConn students did not change at all between the fall of 1998 and the spring of 2000. Yet it was certainly perceived by students in the latter course to have been significantly higher. The irony is that proponents of the self-paced format are sometimes accused of heartless indifference to the fate of their charges, since they leave those fates entirely in the hands of the charges themselves.
It is worth emphasizing that these comparisons are between two versions of the same course, taught within 18 months of one another by the same instructor, covering more or less the same material, using more or less the same textbook. The only changes in the latter were to reorganize the material into six units, add some study guides, and add some more exercises. So it is not entirely unreasonable to suppose that these changes in distributions mostly show the effect of changing to a self-paced format.
A Second Evaluation
Enough questions arose during that changeover that I developed a homegrown evaluation instrument to ask about specific features of teaching the course self-paced for the first time. This was administered at the same time as the standard UConn form, and it too garnered 103 respondents. Some of the results are of interest here. (The full results are posted at http://www.sp.uconn.edu/~py102vc/ev.htm. An unedited transcript of all the student comments written on the forms is found at http://www.sp.uconn.edu/~py102vc/comments.htm.)
The first two questions were whether the switch to a self-paced format made it any harder or easier to learn the material, and whether it made learning that material any more or less enjoyable. I did not expect the self-paced format to make it any easier for a student to learn the material (studying and working the exercises are constants), but the students disagreed, with only 40% saying it had no effect, and 44% saying it made learning easier. Perhaps the feedback from repeated tests had educational value. In any case, students can be taken as better authorities on the second question: whether the self-paced format made learning the material more or less enjoyable than it would have been otherwise. The responses draw a compelling picture:
Similarly lopsided margins were found in answers to the very last question, “All in all, did you like the self-paced format, or would you have preferred the traditional lecture and exam format?” No need to worry about the margin of error of these results:
Not everyone does well with the self-paced format (and, as noted above, 11% of this class got F’s) but seven out of eight of the respondents still prefer it over lecture-and-exam.
Two other items on the questionnaire provided somewhat surprising results and will be mentioned here. When the course started there was some concern about how well the average UConn student could learn logic in a self-paced format. Logic is not the most compelling of curricular materials for independent study. So one question simply asked directly: how difficult was it to learn the material on your own, using just the study guides, textbook, exercises, and answers? The responses:
The surprise, given the final grading distribution, was that so many students described the work as “easy” or “very easy”. Like my teaching assistants, I wonder why more of them didn’t bother to get A’s.
Answers to another question might interest budget administrators. Students in the course sometimes complained that they should be allowed to attend more than one test session every week. My standard rejoinder: it is because the University won’t pay us for more TAs that we can’t allow students to take more than one test in a given week, and the University won’t pay for more TAs because we don’t want to raise tuition. This suggested the obvious hypothetical question: How much more tuition, if any, would you be willing to pay for this course, in order to have the option of taking more than one test per week? Response categories were labeled with the number of TAs that such a response would make possible (at spring 2000 graduate assistant rates). The results:
These 102 respondents indicate a willingness in the aggregate to contribute $2,880 in additional tuition to hire more teaching assistants for Philosophy 102. If these results are representative of the full enrollment, they suggest we could support an additional ½ TA in this course simply by passing around a hat. We could do it once a week, at the end of each testing session. Perhaps we’ll try that next year. Perhaps the budget authorities should allocate an additional TA to Philosophy, before we resort to such methods.
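The extrapolation from respondents to the full enrollment is simple proportional scaling, sketched below in Python. The variable names are illustrative, and taking the 221 students enrolled at the end of the semester as "the full enrollment" is an assumption; the report does not say which enrollment figure was used, and the cost of a TA is not stated here, so the sketch stops at the projected dollar amount.

```python
pledged     = 2880   # dollars pledged in aggregate on the questionnaire
respondents = 102    # students who answered the tuition question
enrolled    = 221    # assumed full enrollment (end-of-semester count)

per_student = pledged / respondents      # average pledge per respondent
projected   = per_student * enrolled     # scaled up to the whole class

print(f"${per_student:.2f} per student, about ${projected:,.0f} for the class")
```

On these assumptions the class as a whole would contribute roughly $6,240, or a little over $28 per student.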
But Do They Learn More?
There is some tantalizing but inconclusive evidence that converting Philosophy 102 to a self-paced format did encourage some students to work a little harder, and learn a little more, particularly at the end of the semester. As noted, the fall 1998 and spring 2000 versions of this course were matched in many respects: same instructor, similar content, almost the same textbook, etc. Although the final grade distributions also almost exactly matched one another, perhaps a more sensitive test of the hypothesis is possible. Material in each of the three one-hour exams in fall 1998 was split into two units in spring 2000, and so, with some assumptions, it is possible to make some comparisons between results on specific tests. For example, Units 1 and 2 in spring 2000 covered essentially the same ground as Test 1 in the fall 1998 course. Units 5 and 6 in spring 2000 covered essentially the same material as Test 3 in the fall of 1998.
The tantalizing finding is that there are no significant differences discernible in mean test scores in the first two thirds of the semester: Tests 1 and 2 in fall 1998, compared to Units 1-4 in the self-paced course. But students did significantly better in the last third of the self-paced course. In the fall of 1998 the mean score on Test 3 was 64.5, while in the self-paced course the mean of Units 5 and 6 combined was 69.8. A one-tailed Student’s t test shows this difference in means to be significant at the .01 level. If the self-paced format encourages some students to keep on working just a little bit longer, one would expect to find the differences at the end of the semester.
But this evidence is quite inconclusive. Some of the assumptions needed to compare test results are rather heroic. The test contents did change, at least somewhat. New material was added in some units, and with two hours to test what used to be done in one, the tests in the self-paced course were proportionately longer and more comprehensive. The two distributions have different variance. And even though the courses were matched in some respects, for others there is no control. The students in the two classes were drawn from different years, and Admissions tells us that recent cohorts are more academically prepared than prior ones. Perhaps all the difference in performance is due to differences in the students themselves. Perhaps it all derives from my putatively increased level of interest and concern. On these questions the data are mute.
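For readers who want to see how a comparison of the two means would be set up when the variances differ, here is a minimal Python sketch of Welch's unequal-variance t statistic. This is a swapped-in variant, not the Student's t test reported above. Only the two means (64.5 and 69.8) come from this report; the standard deviations and sample sizes in the example are HYPOTHETICAL placeholders, since the report gives neither, so the resulting numbers illustrate the method and are not results.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2        # squared standard errors
    t = (mean2 - mean1) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Means from the report; sd = 15.0 and n = 200 are HYPOTHETICAL placeholders.
t, df = welch_t(64.5, 15.0, 200, 69.8, 15.0, 200)
```

With real standard deviations and sample sizes in place of the placeholders, the t value would be compared with the one-tailed critical value for the computed degrees of freedom.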
Acknowledgements
I’d like to thank the Head of the Philosophy Department, the Dean of the College of Liberal Arts and Sciences, and the Institute for Teaching and Learning for their support of the self-paced project. I also thank Justin Fisher, Sam Hughes, Karl Stocker, Jim Phelps, and Virgil Whitmyer, who served as graduate teaching assistants, and did all the grading.