Suggested Characteristics of User Interfaces in Support of IS 2002 Curriculum Model Implementation and Program Accreditation

Steven S. Presley
s_presley@yahoo.com
JJMA Maritime Sector, Alion Science and Technology
3400 Jerry St. Pe' Hwy, Pascagoula, Mississippi 39567 USA

Dr. Herbert E. Longenecker Jr.
hlongenecker@usouthal.edu

Dr. J. Harold Pardue
hpardue@usouthal.edu

Dr. Jeffrey P. Landry
jlandry@usouthal.edu

School of Computer and Information Sciences
University of South Alabama
Mobile, Alabama 36688 USA

ABSTRACT

The process of classifying information can be a complex task, especially when there are multiple taxonomies. Creating effective user interfaces for searching large, multi-taxonomic hierarchies for information classification purposes is a relevant problem facing human-computer interaction (HCI) researchers and practitioners. This study evaluated the effectiveness of overview and zoom capabilities in facilitating the task of classifying information in multi-taxonomic hierarchies. Usability tests of alternative interface designs were conducted within an experimental context. The experimental task involved classifying objectives for an information systems course into the multi-taxonomic hierarchies of the IS '97 curriculum model. Overview and zoom capability was operationalized by a multiwindow interface design, and dynamic query features were added to further increase the level of overview and zoom. Partial support was found for the assertions that increased levels of overview and zoom lead to increased subjective satisfaction, lower error rates, and less time required to complete the experimental task.

Keywords: HCI, IS 2002, multiwindow, multipane, taxonomy, overview, zoom, filtering, subjective satisfaction, error rate, time to complete

1. MOTIVATION

One of the primary goals of our study was to determine which interface characteristics would be useful in creating an interface that would aid professors and instructors in classifying the courses they teach using the then IS '97 (now IS 2002) curriculum model. The researchers believed that the more effective and efficient the interface was from the standpoint of the end-users, the higher the likelihood that they would complete the task and achieve the desired result – a thorough description of their courses based on the IS '97/IS 2002 curriculum model.

A successful system could support several activities important to the IS 2002 effort. First, the IS 2002 Committee would be able to validate that most of the concepts in the curriculum model are currently being taught by Information Systems (IS) programs. Second, it would enable IS programs to determine which concepts (learning units) in the model are not being addressed by their current IS program curriculum. Finally, it would be a useful tool when assessing an IS program for accreditation purposes.

A major challenge in building a successful system is finding the interface characteristics that best support the classification task using this model. This is a complex task because the learning units in the IS 2002 model can be presented to the end user in one of several possible hierarchical structures, or taxonomies. Any one of these might be useful depending on the user's cognitive style and other factors. A review of the literature revealed several possibilities that were incorporated in the experimental interface designs.
2. BACKGROUND

The first step was to decompose the overall task of classifying courses using the IS '97 Learning Units into a set of general component tasks that could be compared against the existing research. For their model, the researchers used Shneiderman's "Task by Data Type Taxonomy," or TTT for short (Shneiderman 1998). Based on our analysis of the curriculum mapping task, the browsing, searching, selection, and filtering component tasks were further refined to fit the seven general tasks of the TTT model.

The Overview task of the TTT model related to the classification tasks of browsing and exploring the descriptive elements of the IS '97 model, that is, the learning units. The Zoom task related to the classification task of allowing the user to see increasingly detailed information about each selected level of the taxonomy, down to the level of the learning units. The Filtering task related to the need to allow users to remove unrelated descriptive elements of the taxonomy based on some dimension. The Details-on-Demand task related to the need to navigate a taxonomy by clicking on nodes at each level and to immediately see the related nodes on the level below, down to the specific learning units. The Relate task related to the user's need to perceive the relationships of groups and subgroups in each taxonomy and to relate the learning units to their course. The History task related to the need to allow users to keep a list of descriptive elements that they have marked as relevant to the target datum, as well as the path they have taken in navigating the taxonomy. Finally, the Extract task related to the need to drill down through the levels of the model taxonomies and extract the information necessary to determine the relevance of individual descriptive elements.

The TTT model also describes "data types." Shneiderman (1998) states that the data types "characterize the task domain information objects and are organized by the problems that users are trying to solve." The tree data type is a good descriptor of the IS '97 model when it is viewed as a single taxonomy. Because multiple taxonomies exist in the model, the multidimensional data type may also apply.

Several interface paradigms identified in previous research have been shown to support these component tasks and one or more of these TTT data types. They include static hierarchical views, expand-contract views, dynamic queries, and multiwindow views. Static hierarchical views resemble the table of contents in a paper-based publication (Chimera and Shneiderman 1994). While common, this approach was found to be problematic when the hierarchy is large enough to overwhelm the available screen space when used online: it is easy for the user to get lost and fail to perceive the on-screen information in the context of the global hierarchy (Chimera and Shneiderman 1994). Additionally, the time required to scroll through the list increases the time it takes the user to perform the search and browse tasks. The IS 2002 model certainly falls into the category of a large hierarchy, due to both the number of learning units and the detail required to evaluate them for relevancy. To alleviate these problems with the static view approach, previous research identified two interface paradigms that provide "overview and zoom" capabilities.
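Before turning to those paradigms, it may help to make the tree and multidimensional data types concrete. In the following Python sketch, learning units are leaves shared by several alternative hierarchies; the sketch is purely illustrative, and the class, function, and label names are our own rather than anything drawn from the IS '97 documents or the experimental system.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        """One level of a taxonomy; leaves stand for IS '97 learning units."""
        label: str
        children: list["Node"] = field(default_factory=list)

    def overview(node: Node, depth: int = 2, level: int = 0) -> None:
        """Overview task: display only the top levels of a hierarchy."""
        print("  " * level + node.label)
        if level + 1 < depth:
            for child in node.children:
                overview(child, depth, level + 1)

    def zoom(node: Node, label: str):
        """Zoom task: return the subtree rooted at a selected node."""
        if node.label == label:
            return node
        for child in node.children:
            found = zoom(child, label)
            if found is not None:
                return found
        return None

    # The same learning unit can appear under more than one hierarchy,
    # which is what makes the model multi-taxonomic (hypothetical labels).
    lu = Node("LU: data modeling concepts")
    by_area = Node("By knowledge area", [Node("Data Management", [lu])])
    by_course = Node("By course", [Node("Databases course", [lu])])

    overview(by_area)                            # browse the top of one taxonomy
    overview(by_course)                          # or switch to an alternate taxonomy
    subtree = zoom(by_area, "Data Management")   # drill into a single branch

The tree data type corresponds to any one of these hierarchies taken alone; the multidimensional data type arises because the same leaves participate in all of them.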
The two paradigms in question are the familiar "expand and collapse" tree view interface and the "multipaned" or "multiwindow" approach. Previous research suggested that the multiwindow approach might be superior to the expand-collapse approach because the higher-level global information cannot be pushed off-screen (Chimera and Shneiderman 1994). For this reason, it was determined that the multiwindow interface showed the most promise. Another interface that showed promise was the dynamic query interface. One study showed significantly lower performance times for complex searches using dynamic queries as opposed to natural language queries and paper-based approaches (Williamson and Shneiderman 1992). Further research by Kumar et al. (1997) demonstrated high levels of subjective satisfaction and reduced performance times when browsing hierarchical data with dynamic queries and pruning, which can be considered a form of filtering.

3. RESEARCH MODEL AND HYPOTHESES

Based on Shneiderman's (1998) synthesis of prior literature, three sets of factors affect the successful completion of a task using a user interface: individual user characteristics, the interface characteristics, and the characteristics of the task itself. Because the interface characteristics were the primary focus of this study, the individual user characteristics and the nature of the task were controlled for in the research design. The research model is presented in Figure 1.

Two interface characteristics were deemed necessary for classification task success: "overview and zoom" capability and "filtering" capability. Since previous research demonstrated that interfaces with overview and zoom capabilities produce better results than static views (Chimera and Shneiderman 1994), it was expected that these capabilities would contribute positively to the success of the general classification task.

Figure 1 – Research Model and Primary Experiment Significance Levels

Filtering capability allows the user to remove from consideration those parts of the hierarchy that have been identified as not relevant to the target datum. One study (Kumar et al. 1997) devised an interface that combined two "tightly-coupled" views of hierarchical data with dynamic queries used for pruning the hierarchical tree. The study demonstrated that pruning "significantly improved performance speed and subjective user satisfaction" (Kumar et al. 1997). For the purposes of this study, filtering capability was considered equivalent to overview and zoom capability, since both allow the user to identify items of interest and exclude items that are not of interest. For this reason, the addition of filtering capabilities was considered an enhancement to the overview and zoom capabilities of the interface. The independent variable in this study was therefore the degree of overview and zoom present in the interface.

Three dependent variables were identified as critical measures of interface success, drawn from the five human factors identified by Shneiderman (1998): the subjective satisfaction of the users, error rates, and the time to complete the experimental task. Based on the research model, three hypotheses were offered:

Hypothesis 1: The degree of overview and zoom capability provided by a user interface will positively affect subjective user satisfaction in completing the multi-taxonomic classification task.
Hypothesis 2: The degree of overview and zoom capability provided by a user interface will inversely affect error rates in completing the multi-taxonomic classification task.

Hypothesis 3: The degree of overview and zoom capability provided by a user interface will inversely affect the time required to complete the multi-taxonomic classification task.

While the study by Kumar et al. (1997) is similar to this one, there are some important differences that allow this study to make a significant contribution to the HCI body of knowledge. First, there were considerable differences in the interface design with regard to the representation of the hierarchical data. Second, there was a subtle yet important difference regarding the nature of the task in terms of the user's understanding of the global data set. Finally, the PDQ tree browser was fine-tuned for five levels of hierarchy, whereas no such limitation could be imposed on a global data set that may be represented with multiple taxonomic hierarchical views. It was unclear how this would affect the cognitive load on the user and thereby task success.

4. RESEARCH DESIGN

This study consisted of a controlled experiment in which test subjects were assigned to one of three groups. Each group was asked to complete the task of classifying the test course using one of three interfaces with varying treatments of overview and zoom capability. The goal of the experiment was to determine the effect that increasing levels of overview and zoom, operationalized by a multiwindow interface with and without filtering, would have on the dependent variables, as compared with a static view representing a very low level of overview and zoom capability. The experimental interfaces appear in Appendix A.

The control interface was a static, fully expanded tree view with a minimal level of overview and zoom capability; it is referred to in this study as the "Treeview" interface. The second interface increased the level of overview and zoom by providing a tightly-coupled multiwindow view of each taxonomy and is referred to as the "Multipane" interface. The third interface further increased the level of overview and zoom by adding a dynamic-query style interface to the multiwindow design, which allowed filtering on learning units and pruning of entire branches. Kumar et al. (1997) suggested that rather than removing pruned branches entirely, a better approach is to mark them graphically as pruned; therefore, pruned items were shown with a gray background (a sketch of this marking behavior appears at the end of this section). This interface is referred to as the "Multipane with Filtering" interface.

Certain elements were kept consistent across all three interfaces. Each interface allowed the user to view details of a learning unit and to select learning units that were appropriate. A running list of selected learning units was present on the screen at all times. Interface characteristics not being studied, such as colors, fonts, and the size of the display, were kept as consistent as possible. Finally, each interface allowed the user to view the data through alternate taxonomies by means of a drop-down selector.

The specific task performed in the experiment involved selecting a subset of learning units from the IS 2002 curriculum model to describe a hypothetical course.
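The pruning behavior referred to above can be illustrated with a short sketch that reuses the Node structure from the Background section. This is our own illustration of the marking approach, not code from the experimental system; the keyword match stands in for whatever criteria the dynamic-query controls actually exposed.

    def apply_filter(node, keyword):
        """Mark non-matching branches as pruned instead of deleting them,
        so the user keeps the overall shape of the hierarchy in view."""
        if not node.children:  # a leaf represents a learning unit
            node.pruned = keyword.lower() not in node.label.lower()
        else:
            for child in node.children:
                apply_filter(child, keyword)
            # an interior node is pruned only when its entire branch is
            node.pruned = all(child.pruned for child in node.children)
        return node

    # The rendering layer would then draw any node with pruned == True
    # using the gray background described above, rather than hiding it.

Marking rather than removing preserves the overview: the global structure stays on screen, and the gray background tells the user which branches the current filter has excluded.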
The test course was designed by a panel of experts in the IS 2002 curriculum model to represent a broad spectrum of learning units, and it required the test subjects to browse through a large portion of the model hierarchies to find the appropriate learning units. Error rates were measured by comparing the list of learning units selected by a test subject to a list of "correct" learning units that the panel of experts believed were indicated by the course description. Two types of errors were considered: the failure of a test subject to identify a learning unit that was indicated by the test course description, and the selection of learning units that were not indicated by it.

To measure subjective satisfaction, adapted portions of the Questionnaire for User Interaction Satisfaction (QUIS)™, a standardized instrument licensed from the University of Maryland's Office of Technology Commercialization, were administered to the test subjects immediately following the experiment. The survey consisted of an eleven-item, five-point Likert scale assessing overall reactions to the system; it appears in Appendix B. Time to complete the task was measured as the time elapsed between a subject's completion of the interface tutorial and the subject's indication of being finished with the task; this measurement was built into the test system design.

To control for potential confounding factors, the subjects needed to have a fairly homogeneous background. Every effort was made to select subjects with similar levels of familiarity with the IS '97 model, with computer and internet applications in general, and with courses similar to the test course. Randomly assigning subjects to one of the three test groups further helped mitigate differences in the subjects' levels of task experience. Thirty graduate students were recruited as test subjects from courses of roughly similar levels in terms of their exposure to courses similar to the test course and to elements of the IS '97 model. Members of each class were assigned randomly to one of the three interface treatments, and subsequent analysis showed that each class was represented equally in all three treatment groups. To gauge the subjects' understanding of IS '97 terminology and their domain knowledge of the task, a domain knowledge quiz was given prior to the experiment, and the results were used as a control variable.

All subjects were given an introduction to the classification task. They were asked to consider the test course description and select learning units that appropriately described the course. The subjects were given unique usernames and passwords that automatically assigned them to one of the three interface treatments. Once they logged in, the system gave them a brief tutorial on their assigned interface. While necessarily different, the interface tutorials were kept as similar as possible in terms of phraseology, colors, font sizes, number of slides, and content. The subjects were given approximately 30 minutes to complete the task, and the times they actually spent were recorded by the system. Once complete, they were taken to the subjective satisfaction survey and asked to wait for a brief demonstration. Following the experiment, the subjective satisfaction survey responses, the time spent on the task by each subject, and the error rates were collected from the database and analyzed.
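The error measure described above reduces to a comparison between two sets of learning units. The sketch below is our own illustration of that scoring rule, not the test system's actual code; it assumes learning units are identified by short ID strings, and the example values are invented.

    def score_selection(selected, correct):
        """Compare a subject's selections against the expert answer key."""
        misses = correct - selected        # indicated but not selected
        false_alarms = selected - correct  # selected but not indicated
        return {
            "misses": len(misses),
            "false_alarms": len(false_alarms),
            "total_errors": len(misses) + len(false_alarms),
        }

    # Example: the panel indicated three units; the subject found two of
    # them and also selected one unit that was not indicated.
    print(score_selection({"LU18", "LU22", "LU90"}, {"LU18", "LU22", "LU47"}))
    # -> {'misses': 1, 'false_alarms': 1, 'total_errors': 2}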
Data integrity was checked, and the resulting data were transferred to the Statistical Package for the Social Sciences (SPSS) for statistical analysis. After all subjects had completed the online survey, all three interface treatments were demonstrated to all study participants, along with an explanation of how each interface could be used to perform the experimental task. Within three days of the experiment, the test subjects were given a second survey, which appears in Appendix C. This survey asked about their perceptions of the three interfaces following the demonstration. The subjects were also asked to share anecdotes about their experience with each interface, along with their overall opinions and preferences.

5. DATA ANALYSIS AND RESULTS

The thirty test subjects were divided into three treatment groups of ten subjects each, and each treatment group was assigned to one of the three experimental interfaces. Table 1 summarizes the results for each dependent variable for each interface, with outliers removed. Reliability analysis of the online subjective satisfaction survey instrument showed an alpha of .9409 for the questions designed to measure subjective satisfaction (questions 2, 4, 14, 15, 16, 17, 18, 19, 20, 22, and 23 in Appendix B). All participants completed the post-task survey for subjective satisfaction.

The hypotheses were tested using MANOVA with two control variables included. Observations more than two standard deviations from the mean of their category were treated as outliers and removed, which resulted in four subjects' data being excluded from the sample. A significance level of .05 or lower was used to indicate support for a hypothesis. Factor and reliability analysis was used to assess the survey questions used to measure subjective satisfaction. The resulting general model provided the means to evaluate the hypotheses based on the observations collected during the experiment.

Table 1 – Primary Experiment Results Summary

Measure                          Scale Statistics       Treeview (N=10)  Multipane (N=8)  Multipane with Filter (N=8)
Subjective Satisfaction (α=.94)  Mean = 3.8, N = 26     3.55             4.10             3.80
Total Error Rate                 Mean = 34.8, N = 26    37.5             33.75            33.25
Time on Task                     Mean = 0:20, N = 26    0:22             0:18             0:21
Domain Knowledge (control)       Mean = 57.0, N = 26    60.0             54.2             56.25
Perceived Difficulty (control)   Mean = 3.0, N = 26     3.2              3.0              2.9

The two control variables were the subjects' domain knowledge of the task and their perceived difficulty of the task. Domain knowledge was measured by a quiz given immediately before the experimental task; perceived difficulty was measured by question 1 on the post-task survey. Twenty-two of the original subjects completed and returned the follow-up survey, a response rate of 73%. The results of this survey were tested using two-tailed t-tests for equality of means, with a .05 significance level required to reject the null hypothesis. This was done for each possible interface comparison, and the results of this analysis are summarized in Appendix C.
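For readers interested in replicating the screening, the two-standard-deviation outlier rule and the reliability coefficient reported above can be expressed compactly. The sketch below is an illustration in Python rather than the SPSS procedures actually used, and the example data are invented.

    import statistics

    def remove_outliers(values, k=2.0):
        """Drop observations more than k standard deviations from the
        mean of their group (k = 2 in this study)."""
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        return [v for v in values if abs(v - mean) <= k * sd]

    def cronbach_alpha(items):
        """Cronbach's alpha for a list of item-score columns, one column
        per survey item, one entry per respondent."""
        k = len(items)
        item_var = sum(statistics.variance(col) for col in items)
        totals = [sum(scores) for scores in zip(*items)]
        return (k / (k - 1)) * (1 - item_var / statistics.variance(totals))

    # Hypothetical data: three items scored by four respondents.
    print(cronbach_alpha([[4, 5, 3, 4], [4, 4, 3, 5], [5, 4, 3, 4]]))  # ≈ 0.75

In the study itself, the columns would correspond to the eleven satisfaction items listed above, and the outlier screening would be applied within each treatment group.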
6. DISCUSSION OF RESULTS

While the hypotheses were based on previous research, the level of complexity of the task and its multi-taxonomic nature made it unique. It was not clear at the outset of this research whether any of the hypotheses tested would be supported.

Hypothesis 1 – The Effect of Increased Overview and Zoom on Subjective Satisfaction

The measure of subjective satisfaction was found to be very reliable (alpha = .94). The significance level observed in the general model for hypothesis 1 was .216. Although not statistically significant, the scatter plot of responses does suggest that a positive relationship may exist and that the data are consistent with hypothesis 1. Subjective satisfaction scores were average or better across all three interfaces, which may indicate that users felt all of the interfaces were easy to use and satisfying. Alternatively, some aspect of the experimental design may have affected the users' satisfaction scores, such as dissatisfaction with the time available to complete the task; this complaint was common in the anecdotal evidence. The follow-up survey did show statistically significant support for hypothesis 1 in terms of the subjects' perceived satisfaction. The subjects showed a strong preference for interfaces with increasing amounts of overview and zoom capability: nearly all subjects preferred the Multipane and Multipane with Filtering interfaces to the Treeview.

Hypothesis 2 – The Effect of Increased Levels of Overview and Zoom Capabilities on Error Rates

Based on the general model, the manipulations of the interface did explain a significant amount of the variance observed in the subjects' total error measurement; the significance level observed was .031. The scatter diagram further illustrated this effect, with the trend line clearly negative going from the Treeview, to the Multipane interface, and on to the Multipane with Filtering. This indicates that subjects using the interface with the lowest level of overview and zoom capability made, on average, more mapping errors than subjects using interfaces with higher levels of overview and zoom. The follow-up survey likewise showed that users believed they would make fewer errors with the Multipane interface than with the Treeview interface, and fewer still with the Multipane with Filtering interface. The subjects believed that interfaces with higher levels of overview and zoom would enable them to correctly identify the learning units. Anecdotally, subjects reported that with the Treeview interface they would often get lost and be unsure of why they were looking at a particular learning unit, or even what they were looking for. The interfaces with higher levels of overview and zoom appeared to lessen the cognitive load on the user and, unlike the Treeview, did not seem to overwhelm users with information.

Hypothesis 3 – The Effect of Increased Levels of Overview and Zoom Capabilities on Time to Complete Task

Based on the general model, the manipulations of the interface did not explain a statistically significant amount of the variance observed in the time spent on the task; the significance level observed for hypothesis 3 was .541. The scatter diagram did not indicate any clear relationship. This was not surprising to the researchers, since most of the subjects were unable to complete the task in the time available, regardless of the interface used. The measurements therefore indicate only the time spent during the experiment, not the actual time needed to complete the experimental task.
The follow-up survey, however, showed a strong belief among the test subjects that the interfaces with higher levels of overview and zoom would require less time to complete the task. Although certainly not conclusive of actual performance, this may be an indicator of the subjects' potential performance had they been able to remain on task until completion. Anecdotally, subjects reported that they felt rushed and would have needed much more time to do a thorough job. Other subjects reported that it took a relatively long time to understand how the Multipane with Filtering interface worked, and that the experimentation this required counted toward their measured time on task. Once they understood the interface, however, they felt it would be quicker than the Treeview and the Multipane.

7. CONCLUSIONS

This study found statistical support in the primary experiment for the theory that increasing levels of overview and zoom decrease error rates. While consistent with previous research, this study extends the existing research in both the task domain and the nature of the multi-taxonomic data hierarchies. While no conclusive evidence was found in the primary experiment that increasing the level of overview and zoom increases subjective satisfaction or reduces the time required to complete the task, there was anecdotal evidence to support the findings of previous research. Significant statistical support was found for the test subjects' belief that interfaces with higher levels of overview and zoom capability would lead to higher subjective satisfaction, reduced time to complete the task, and reduced error rates.

It is also clear from our study that the task of classifying courses using the IS 2002 curriculum model is a daunting one. Likewise, developing a user interface to support this task is not a simple exercise. This study found a great deal of preference for interfaces with increased levels of overview and zoom for this task and, in the case of error rates and subjective satisfaction, objective evidence in their favor.

8. SUGGESTIONS FOR FUTURE RESEARCH

Future studies should ensure that the task is well suited to the subjects' level of experience and domain knowledge. A repeat of this study using more commonly encountered hierarchies may yield more significant results. A further refinement would be to test an additional interface combining the Treeview and the filtering capability, in order to study whether filtering has a moderating effect on both the Treeview and the Multipane approaches. The Multipane with Filtering approach could also be combined with non-hierarchical search paradigms, such as keyword searching, to determine whether that combination yields even greater reductions in errors and time and higher levels of satisfaction. Finally, research is needed on how perceptions of task difficulty affect user interface preferences.

9. REFERENCES

Chimera, Richard, and Ben Shneiderman (1994). "An Exploratory Evaluation of Three Interfaces for Browsing Large Hierarchical Tables of Contents." ACM Transactions on Information Systems (12:4), October 1994, pp. 383-406.

Davis, G. B., J. T. Gorgone, J. D. Couger, D. L. Feinstein, and H. E. Longenecker Jr. (1997). IS '97 Model Curriculum and Guidelines for Undergraduate Degree Programs in Information Systems. ACM, New York, NY, and AITP (formerly DPMA), Park Ridge, IL.

Furnas, G. W. (1986). "Generalized Fisheye Views." Proceedings of the ACM CHI '86 Conference, ACM Press, New York, NY, pp. 16-21.
Kumar, Harsha P., Catherine Plaisant, and Ben Shneiderman (1997). "Browsing Hierarchical Data with Multi-Level Dynamic Queries and Pruning." International Journal of Human-Computer Studies (46:1), January, pp. 103-124.

Shneiderman, Ben (1998). Designing the User Interface: Strategies for Effective Human-Computer Interaction, 3rd edition. Addison Wesley Longman, Inc., Reading, MA.

Shneiderman, Ben, Christopher Williamson, and Christopher Ahlberg (1992). "Dynamic Queries: Database Searching by Direct Manipulation." Proceedings of the ACM CHI '92 Conference: Human Factors in Computing Systems, May 3-7, ACM, New York, NY, pp. 669-670.

Williamson, Christopher, and Ben Shneiderman (1992). "The Dynamic HomeFinder: Evaluating Dynamic Queries in a Real-Estate Information Exploration System." Proceedings of the 15th Annual International SIGIR '92 Conference, ACM, New York, NY, pp. 338-346.

APPENDIX A – EXPERIMENTAL INTERFACE DESIGNS

The following screenshots illustrate the experimental interfaces used in the study. In each interface, the user selected an item by clicking on its text. The selected text was highlighted with a light yellow background, and the results appeared either in the pane below it (in the case of the multipane interfaces) or, when a learning unit was selected, on the right-hand side (all interfaces). Learning units selected by the user appeared in a pane in the bottom right corner of the interface. For the Multipane with Filtering interface, pruned items were shown with a light gray background.

Figure 2 – Treeview Interface
Figure 3 – Multipane Interface
Figure 4 – Multipane with Filtering Interface

APPENDIX B – SUBJECTIVE SATISFACTION SURVEY INSTRUMENT

The following are screenshots taken from the actual survey given to all participants following completion of the experimental task. The questions were adapted from the Questionnaire for User Interaction Satisfaction (QUIS)™, a product of the University of Maryland Office of Technology Commercialization.

APPENDIX C – FOLLOW-UP SURVEY INSTRUMENT AND RESULTS

The following instrument was administered to test subjects no more than three days after the experiment.

Figure 5 – Follow-Up Survey Instrument

The survey results were analyzed by performing t-test comparisons between the interface rating scores for each possible interface comparison. The results of this analysis are summarized below in Figure 6.

Figure 6 – Follow-Up Survey Results