

Context Learning Can Improve User Interaction

Sushil J Louis, Anil Shankar

Evolutionary Computing Systems Laboratory (ECSL)

Department of Computer Science and Engineering

Abstract:

Current computer applications lack user context and do not learn to use this context to improve user interaction. In this paper we present Sycophant, a context learning calendar application program which learns a mapping from user-related contextual features to application actions. In this preliminary work, Sycophant achieves good accuracy in learning this mapping. In addition, we find that including external context such as the presence or absence of motion and speech provides better performance in learning accurate mappings.

Introduction

Computers today use an internal clock, keyboard and mouse to provide input or context for their applications' information processing. Operating systems support these devices and applications access this provided context through simple Application Programming Interfaces (APIs). Current application software uses this meager context to build user models and try to enhance personal productivity. There is no personalization through learning, no long term memory, and advances in vision, speech, text analysis, and the availability of cheap computing power have not been fully utilized. That is, many current computer applications lack context-awareness.

Dey [5] gives the following definition for context.

Any information that can be used to characterize the situation of entities (i.e., whether a person, place or object) that are considered relevant to the interaction between a user and an application.

In our work, we view a computer as a stationary robot with simple sensors, such as sensors for motion and speech [11]. Even without knowing who is there or what is being said, such simple sensors can be used to improve user interaction. For example, if you were Jane's user-interface you could learn answers to the following questions.

We propose to use simple sensors to continuously gather data on the computer system's internal and external environment, store this data in a data warehouse, and mine this data for useful user-behavior patterns, in order to better predict user preferences (behavior) and improve user interaction. Applications can then use this learned model of user preferences to better interact with the user. In this paper, we use a simple calendaring application program, Sycophant, which stores appointments and reminds the user using different types of reminders, as a test-bed to investigate these issues. Specifically, we investigate whether Sycophant can learn a mapping from context-features to reminder type. More generally, we are interested in whether applying machine learning techniques to data gathered from simple context sensors will lead to improved human-computer interfaces.

Our system continuously gathers binary activity data from the keyboard, mouse, a motion detector, and a speech sensor. We also monitor the activity of five processes on the computer. Whenever Sycophant generates a reminder, it expects the user to indicate whether Sycophant used the correct reminder type. A reminder can be visual (a pop-up window), speech using a text-to-speech system, both, or neither. Periodically, we run a machine learning algorithm on the gathered data merged with this user feedback to learn to predict which of the above four types of reminders to generate for an appointment. Preliminary results using Sycophant with external (motion, speech) and internal (keyboard, mouse activity) sensors and a decision tree machine learning algorithm show about $80\%$ accuracy in predicting whether or not to generate a reminder. Predicting which of the four different types of reminders to generate is less accurate, at about $64\%$.

Related Work

Much work has been done in the area of context-aware applications and environments. Reba is a reactive system which creates context-aware room reactions by using information from cameras, microphones and other sensors [11]. This work showed that systems need to be context-aware in order to anticipate user actions and simplify user interaction.

Bailey and Adamczyk have shown that computer-generated interruptions which require user input or feedback have a disruptive effect on the user's emotional state as well as the user's task performance [2]. Their study showed that at the point of interruption, the degree of disruption depends upon the user's mental load. Their work implied that the user's attention must be carefully managed among competing applications and that this management is necessary to mitigate the disruptive effects of necessarily interrupting a user.

The work of Hudson, Fogarty, Atkeson et al. comes closest to our own in exploring how to construct robust sensor-based predictions of interruptibility by conducting a Wizard of Oz study [10]. They also considered which sensors might be useful and how they could be constructed. In their study they used experience sampling to collect self-reports of interruptibility. Next, they built statistical models predicting human interruptibility and achieved an overall accuracy of $78\%$ using several models. The self-reports from their initial Wizard of Oz study, where a subject was asked to distinguish between different levels of interruptibility, showed that it is possible for humans to be accurate to an extent of $76.9\%$ and that statistical models could achieve as much as $82.4\%$ accuracy. In their next study, they used real sensors to construct models of human interruptibility for three different groups of people: interns, managers and researchers [6]. This study also tried to determine how much data should be collected to provide statistically reliable estimates of interruptibility.

Horvitz and Apacible built models for predicting the cost of interrupting users [9]. For this purpose, they used machine learning techniques for generating statistical models to infer the state of interruptibility of users.

Our work is complementary to the above approaches. Sycophant learns whether or not to interrupt the user as well as how to interrupt the user. Like Fogarty et al. [6], we use real sensors, but in addition to learning whether to interrupt the user, Sycophant uses machine learning algorithms to learn which one of four different types of reminders to use in interrupting the user. In this work, we compare the performance of different algorithms as well as the effect of different sensors on learning the mapping from sensors to reminder type.

SYCOPHANT

Sycophant can generate four different types of reminders: a simple pop-up window containing the appointment text, a voice reminder in which the appointment text is spoken using the Festival Speech Synthesis System [3], both of the previous types, and neither. In the last case, no reminder is generated; it is instead buffered for later output. This can be desirable behavior, for example, when there is no one in the room at the time the reminder is generated. On the other hand, it can also be quite annoying if your calendar ``learns'' not to remind you under certain conditions.

The appointments for Sycophant were set up to mimic the user's regular work-day. These included reminders for drinking coffee, attending talks, conferences, classes, some personal appointments, and reminders outside regular office hours. For example, a reminder for watching a soccer match on cable TV at two a.m. would fall outside of regular office hours. Sycophant in this case learned a rule which said that if the appointment time was before nine a.m., then no reminder was to be generated. This context learning was performed with respect to the user under study, whose regular office hours start at nine a.m. Figure 1 shows a screen-shot of the application.

Figure 1: Screen Shots of Sycophant's Main Interface

Figure 2 depicts Sycophant's architecture. The calendaring application runs as a separate process and five sensors collect data on the computer and its immediate vicinity. These sensors are binary; for example, when the motion sensor detects motion it reports a value of $1$, and $0$ otherwise. Our sensors are motion, talk (speech), process, keyboard, and mouse.

Figure 2: Sycophant Architecture

For this preliminary feasibility study, we collected data from a single user over a period of six weeks. Every fifteen seconds, we checked all sensors for activity and stored these values to a file. Next, we extracted the following six features from the raw data for each sensor [10]:

Any5: whether the sensor was active during any of the fifteen-second intervals in the last five minutes.
All5: whether the sensor was active during all of the fifteen-second intervals in the last five minutes.
Any1: whether the sensor was active during any of the fifteen-second intervals in the last minute.
All1: whether the sensor was active during all of the fifteen-second intervals in the last minute.
Immed: whether the sensor was active during the last fifteen-second interval.
Count: the number of fifteen-second intervals in the last five minutes during which the sensor was active.

Every sensor therefore provided six features. We considered each of the five user processes as a separate ``sensor,'' so the number of sensors grew to nine and we ended up with a total of $54$ features. Finally, we also included a user identifier and the next appointment time.
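
The six per-sensor features can be derived from the stream of fifteen-second binary samples roughly as follows. This is a minimal sketch rather than the code used in the study; the function name `window_features` and the list-based input format are assumptions made for illustration.

```python
from typing import Dict, List

SAMPLES_PER_MINUTE = 4    # one sample every fifteen seconds
SAMPLES_PER_5_MIN = 20    # five minutes of fifteen-second samples


def window_features(samples: List[int]) -> Dict[str, int]:
    """Derive the six features for one sensor.

    `samples` holds binary readings (1 = active), oldest first, and must
    cover at least the last five minutes (20 samples).
    """
    if len(samples) < SAMPLES_PER_5_MIN:
        raise ValueError("need at least five minutes of samples")
    last5 = samples[-SAMPLES_PER_5_MIN:]   # last five minutes
    last1 = samples[-SAMPLES_PER_MINUTE:]  # last minute
    return {
        "Count": sum(last5),        # active intervals in the last five minutes
        "All5": int(all(last5)),
        "Any5": int(any(last5)),
        "All1": int(all(last1)),
        "Any1": int(any(last1)),
        "Immed": last5[-1],         # most recent fifteen-second interval
    }
```

For instance, a sensor that was active during seven early intervals and silent afterwards, `window_features([1] * 7 + [0] * 13)`, yields Count 7, All5 0, Any5 1, All1 0, Any1 0 and Immed 0.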

Sycophant can be instructed to remind a user $t$ minutes before a scheduled appointment. When it is time to remind a user (say $5$ minutes before the appointment time), Sycophant first checks for the existence of a learned user model (a mapping of context features to reminder type). If one exists, it uses the reminder type dictated by the model to remind the user. Initially, when no model has been learned, we use a hand-coded rule set. This static hand-coded rule set is used until we get the minimum of ten exemplars needed by Weka ($10$-fold cross-validation requirement). Once the reminder is generated, the user can give feedback to Sycophant, either agreeing with the reminder type generated or providing their preference. It is this user feedback which is used for creating the training data set for our machine learning algorithms.
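
The fallback from a learned model to the hand-coded rules can be sketched as below. This is an illustration under stated assumptions, not Sycophant's implementation: the function names, the string reminder labels, and in particular the body of `hand_coded_rules` are hypothetical, since the actual rule set is not given in the text.

```python
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, float]
Model = Callable[[Features], str]
Exemplars = List[Tuple[Features, str]]

MIN_EXEMPLARS = 10  # minimum stated in the text (10-fold cross-validation)


def hand_coded_rules(features: Features) -> str:
    """Hypothetical placeholder; the real static rules are not given."""
    return "visual" if features.get("Keybd-Any5", 0) else "none"


def maybe_train(exemplars: Exemplars,
                train: Callable[[Exemplars], Model]) -> Optional[Model]:
    """Learn a model only once enough user feedback has been collected."""
    if len(exemplars) < MIN_EXEMPLARS:
        return None
    return train(exemplars)


def choose_reminder(features: Features, model: Optional[Model]) -> str:
    """Use the learned model if one exists, otherwise the static rules."""
    if model is not None:
        return model(features)
    return hand_coded_rules(features)
```

Keeping the training threshold separate from the selection step means the reminder path only ever asks whether a model exists, which mirrors the check described above.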

Here is an exemplar from our data set:

User1, 05.00, 0, 0, 0, 0, 0, 0, 7, 0, 1, 0, 0, 0, 20, 1, 1, 1, 1, 1, 20, 1, 1, 1, 1, 1, 20, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 20, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
where the first two features correspond to User-Id and Time of Appointment. The remaining features, in groups of six, represent Sensor-Count, Sensor-All5, Sensor-Any5, Sensor-All1, Sensor-Any1 and Sensor-Immed. The sensors are ordered as follows: Motion, Talk, Process (one group for each of the five monitored processes), Keyboard and Mouse. Each of these six features is derived for each sensor mode. The last value in the above data row corresponds to the type of reminder actually preferred by the user, which is obtained through user feedback.
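
A row in this layout can be split into named features as in the following sketch. The names `P1` to `P5` for the five process groups, the `Sensor-Feature` key format, and keeping the appointment time as a string are assumptions made for illustration.

```python
from typing import Dict, Tuple, Union

# Assumed group order: Motion, Talk, five process groups, Keyboard, Mouse.
SENSORS = ["Motion", "Talk", "P1", "P2", "P3", "P4", "P5", "Keybd", "Mouse"]
FEATURES = ["Count", "All5", "Any5", "All1", "Any1", "Immed"]

ROW = ("User1, 05.00, 0, 0, 0, 0, 0, 0, 7, 0, 1, 0, 0, 0, "
       "20, 1, 1, 1, 1, 1, 20, 1, 1, 1, 1, 1, 20, 1, 1, 1, 1, 1, "
       "0, 0, 0, 0, 0, 0, 20, 1, 1, 1, 1, 1, "
       "0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0")


def parse_exemplar(row: str) -> Tuple[Dict[str, Union[str, int]], int]:
    """Return (named features, preferred reminder type) for one data row."""
    fields = [f.strip() for f in row.split(",")]
    expected = 2 + len(SENSORS) * len(FEATURES) + 1  # id, time, 54 values, label
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    record: Dict[str, Union[str, int]] = {
        "User-Id": fields[0],
        "ApptTime": fields[1],  # kept as text; its encoding is not specified
    }
    values = fields[2:-1]
    for i, sensor in enumerate(SENSORS):
        for j, feature in enumerate(FEATURES):
            record[f"{sensor}-{feature}"] = int(values[i * len(FEATURES) + j])
    return record, int(fields[-1])


record, label = parse_exemplar(ROW)
```

Under these assumptions the exemplar above has a talk count of 7 in the last five minutes, no motion, keyboard or mouse activity, and a preferred reminder of type 0.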

Once the context-sensitive data set is available, it is used to build a user model that learns the type of reminder to generate. We use and compare multiple machine learning algorithms from the Weka machine learning tool-kit for this purpose.

As an aside, in the current version of the program, the reminders which are voted by the model to be of type-$0$ (do not interrupt/remind) get buffered. When it is time to use a reminder of a type other than $0$, the buffered reminders are also output along with the reminder for that particular instant of time. Thus no appointments are ever ``forgotten.''
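
The buffering behavior can be illustrated with a small sketch. The class name `ReminderBuffer` and its interface are hypothetical; only the rule that type-0 reminders are held back and released with the next non-zero reminder comes from the text.

```python
from typing import List


class ReminderBuffer:
    """Hold type-0 reminders until a reminder of another type is output."""

    def __init__(self) -> None:
        self._pending: List[str] = []

    def handle(self, text: str, reminder_type: int) -> List[str]:
        """Return the reminder texts to output now (possibly none)."""
        if reminder_type == 0:
            # Do not interrupt: keep the reminder for later output.
            self._pending.append(text)
            return []
        # Output everything buffered so far along with the current reminder.
        released = self._pending + [text]
        self._pending = []
        return released


buffer = ReminderBuffer()
first = buffer.handle("Coffee", 0)   # buffered, nothing is shown
second = buffer.handle("Talk", 1)    # releases "Coffee" together with "Talk"
```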

Results

For our study, we chose the following machine learning algorithms from the Weka tool-kit: Zero-R, One-R, J48, Bagging, LogitBoost and NaiveBayes. Zero-R simply predicts the majority class in categorical data, or the average value if the class is numeric. One-R generates a one level decision tree which tests only one particular attribute and forms a set of rules based only on that attribute. J48 builds a C4.5 decision tree [12]. Bagging creates $n$ artificial data sets from the original data set and applies a decision tree inducer on each of them. The $n$ generated classifiers then vote for the class to be predicted. LogitBoost uses a learning algorithm for numeric prediction and a combined model is formed which is then used for classification [7]. NaiveBayes selects the most likely classification based on a set of attribute values using prior probabilities and conditional densities of the individual features.
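
To make the two simplest baselines concrete, here is a minimal pure-Python sketch of the ideas behind Zero-R and One-R. It is not Weka's implementation: it handles only discrete attribute values (Weka's One-R also discretizes numeric attributes), and the function names and data layout are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

Row = Dict[str, Hashable]


def zero_r(labels: List[Hashable]) -> Hashable:
    """Predict the majority class, ignoring all attributes."""
    return Counter(labels).most_common(1)[0][0]


def one_r(rows: List[Row], labels: List[Hashable]) -> Tuple[str, Dict]:
    """Pick the single attribute whose value-to-class rule makes fewest errors."""
    best = None
    for attr in rows[0]:
        # Class counts for each value of this attribute.
        by_value: Dict[Hashable, Counter] = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[attr]][label] += 1
        # Each value predicts its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in by_value.items())
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    return best[1], best[2]
```

Given rows of binary features such as `{"Keybd-Any5": 1, "Motion-Any5": 0}`, `one_r` returns the chosen attribute together with a mapping from each of its values to a predicted class.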

We also constructed three other data sets with reduced numbers of features after ranking the individual features based on the information gain ratio [12] from decision tree induction. The top $29$ features are considered in one set (Set 1), the top $25$ in the next set (Set 2), and the top $20$ in the last set (Set 3). Next, we compared the performance of J48 on these data sets against the complete data set (Set 0) with all the $55$ features. We provide the top $25$ features below in order of information gain:

Keybd-Count5, Keybd-Any5, Mouse-Count5, Mouse-Any5, Keybd-Any1, Mouse-Any1, Keybd-Immed, Mouse-Immed, Motion-Count5, Motion-Any5, Motion-Any1, ApptTime, Mouse-All1, Keybd-All1, Motion-Immed, P3-Count5, P1-Count5, P2-Count5, P1-All5, P2-All5, P4-All5, P4-Count5, Talk-Count5, Motion-All1.
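
Ranking features by information gain ratio can be sketched as follows for discrete-valued features. This is an illustrative simplification, not the procedure used to produce the ranking above: C4.5 additionally chooses thresholds for numeric features such as the Count values, and the function names here are assumptions.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(column: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of the split on `column`, divided by its split info."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(column)  # entropy of the feature's own values
    return info_gain / split_info if split_info > 0 else 0.0


def rank_features(rows: List[Dict[str, Hashable]],
                  labels: Sequence[Hashable]) -> List[str]:
    """Return feature names ordered from highest to lowest gain ratio."""
    scores = {name: gain_ratio([r[name] for r in rows], labels)
              for name in rows[0]}
    return sorted(scores, key=scores.get, reverse=True)
```

Reduced data sets then amount to keeping only the first $k$ names returned by `rank_features`.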

Figure 3 shows the performance of J48 on different data sets. J48 correctly classified $64.4\%$ of the instances and generated $35$ rules for the complete data set (with $55$ features). The algorithm also correctly classified $62.5\%$ of the instances on the reduced data set with $25$ features with $36$ rules being generated. The figure shows the relative performance of the decision tree inducer on our four data sets and there seems to be little performance degradation even with only $20$ features. We chose to use Set 2 with $25$ features for further study.

Figure 3: Performance of J48 on ranked data sets with different numbers of features

Once we had a preliminary identification of the top $25$ features, we compared the classification accuracy of different machine learning algorithms on the complete data set (Set 0) and Set 2. Figure 4 shows this comparison for the four class reminder problem. We also investigated the two class reminder problem, where the two classes are whether or not to generate a reminder; that is, reminder types one through three were lumped into one category. Figure 5 shows performance for this problem. All the learning algorithms have almost the same performance on both the four class reminder problem and the two class reminder problem. Note that they are able to do significantly better on the two class problem, achieving a classification accuracy of above $80\%$. This implies that predicting whether to generate a reminder or not is a significantly simpler problem than predicting which of four reminder types to generate. We also noted that One-R chose keyboard usage as the most useful feature. This ties in well with our observation that the user under study expertly uses a number of keyboard shortcuts.
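
Deriving the two class problem from the four class labels is a simple relabeling, sketched below. The function name is an assumption; the class names follow Tables 3 and 4, and the example counts are the per-class row totals of Table 1.

```python
from collections import Counter


def to_two_class(reminder_type: int) -> str:
    """Lump reminder types 1 through 3 into a single class."""
    return "No-Reminder" if reminder_type == 0 else "Generate-Reminder"


# Actual-class totals taken from the rows of Table 1: 188, 62, 31 and 42.
four_class_labels = [0] * 188 + [1] * 62 + [2] * 31 + [3] * 42
two_class_counts = Counter(to_two_class(t) for t in four_class_labels)
```

The resulting counts, 188 instances of No-Reminder and 135 of Generate-Reminder, agree with the row totals of Tables 3 and 4.
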

Figure 4: Comparison of different learning algorithms on the complete data set and reduced features data set for the four class problem

Figure 5: Comparison of different learning algorithms on the complete data set and reduced features data set for the two class problem

Table 1: Confusion Matrix for the four classes of reminders for the full featured data set with 55 features

Actual $\Downarrow$ / Predicted $\Rightarrow$      0      1      2      3
0                                                170      7      7      4
1                                                 19     20      8     15
2                                                 11      6      4     10
3                                                  6     17      5     14



Table 2: Confusion Matrix for the four classes of reminders for the reduced data set with 25 features

Actual $\Downarrow$ / Predicted $\Rightarrow$      0      1      2      3
0                                                165      8     10      5
1                                                 18     24      5     15
2                                                 13      8      2      8
3                                                 10     14      7     11



Table 3: Confusion Matrix for the two-class reminder for the full featured data set with 55 features

Actual $\Downarrow$ / Predicted $\Rightarrow$    No-Reminder    Generate-Reminder
No-Reminder                                              162                   26
Generate-Reminder                                         36                   99



Table 4: Confusion Matrix for the two-class reminder for the reduced data set with 25 features

Actual $\Downarrow$ / Predicted $\Rightarrow$    No-Reminder    Generate-Reminder
No-Reminder                                              164                   24
Generate-Reminder                                         32                  103


The confusion matrix obtained for the complete data set is shown in Table 1. J48 correctly classifies $64.40\%$ of the instances in the complete data set. The elements along the principal diagonal are the correctly classified instances of each class. From the confusion matrix we can infer that 170/188 instances of type-$0$ (no-reminder) are correctly classified. The learning algorithm is not able to discern very accurately when to use reminder types $1, 2$, and $3$. Table 2 shows the confusion matrix obtained for the top $25$ features data set. Here the decision tree provides an accuracy of $62.5\%$ with 165/188 instances of type-$0$ being correctly classified.

The confusion matrices for the two class problems are given in Table 3 and Table 4. J48 improved its performance to $80.81\%$ on the data set having all the features and to $82.66\%$ on the reduced feature data set. Clearly Sycophant can more accurately predict whether or not to generate a reminder. Predicting which reminder type to generate seems harder and this remains an area of active research in our group. Finally, removing motion and speech features from Set 0 resulted in a statistically significant decrease in prediction accuracy on the four class problem; this clearly points to the importance of paying attention to the computer system's environment (external context) in improving user interaction.
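
The reported accuracies can be checked against the confusion matrices directly: accuracy is the sum of the principal diagonal divided by the total number of instances. The sketch below uses the values printed in Tables 1 through 4; the function and variable names are assumptions.

```python
from typing import List

Matrix = List[List[int]]  # rows: actual class, columns: predicted class


def accuracy(matrix: Matrix) -> float:
    """Fraction of instances on the principal diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall(matrix: Matrix, k: int) -> float:
    """Fraction of actual class-k instances that were predicted as class k."""
    return matrix[k][k] / sum(matrix[k])


TABLE_1 = [[170, 7, 7, 4], [19, 20, 8, 15], [11, 6, 4, 10], [6, 17, 5, 14]]
TABLE_2 = [[165, 8, 10, 5], [18, 24, 5, 15], [13, 8, 2, 8], [10, 14, 7, 11]]
TABLE_3 = [[162, 26], [36, 99]]
TABLE_4 = [[164, 24], [32, 103]]

for name, table in [("Table 1", TABLE_1), ("Table 2", TABLE_2),
                    ("Table 3", TABLE_3), ("Table 4", TABLE_4)]:
    print(f"{name}: accuracy {100 * accuracy(table):.1f}%, "
          f"type-0 recall {100 * recall(table, 0):.1f}%")
```

Each matrix covers 323 instances, and to one decimal place the accuracies come out at 64.4%, 62.5%, 80.8% and 82.7%, in line with the figures quoted in the text.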

Although the decision trees generated for the user model could not be included in the limited space available, we would like to note the following. Keyboard, Mouse, Motion and Talk give the most useful information, as evidenced by the tree generated for data Set 0 as well as by the ranking of individual features based on the information gain ratio criterion. On the complete data set with four classes of reminders, the decision tree constructed an interesting rule with a keyboard feature (Keyboard-Any5) chosen as the root node. The start of the user's working hours, at approximately 9 a.m., is the next most useful feature, followed by talk count. For example, if the talk count was greater than 2, there was no motion in the last minute, and the appointment time was later than 12.20 (lunch time), then both types of reminders were generated, because the user is usually in a comatose state after lunch, did not care which type of reminder she wanted, and often chose both. On the reduced feature data set, for the four classes of reminders, J48 also constructed an interesting rule with Keyboard-Any5 at the root node: if the talk count in the last five minutes was greater than 2 and there was keyboard activity in the last minute, then generate a voice reminder. This seemed to make sense to the user in that she had just started to make heavy use of the keyboard and therefore preferred a soothing voice reminder over a more distracting pop-up window.
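
The two rule fragments described above can be written out as predicates. This is only a sketch of the conditions as stated in the text, not the learned trees, which test further features along the path from the root. The feature keys, the string labels `"both"` and `"voice"`, and the encoding of the appointment time as a number such as 12.20 are assumptions.

```python
from typing import Dict, Optional

Features = Dict[str, float]


def rule_full_set(f: Features) -> Optional[str]:
    """Fragment described for the complete data set (Set 0)."""
    if f["Talk-Count5"] > 2 and f["Motion-Any1"] == 0 and f["ApptTime"] > 12.20:
        return "both"  # pop-up window and voice reminder
    return None        # this fragment does not apply


def rule_reduced_set(f: Features) -> Optional[str]:
    """Fragment described for the reduced data set with 25 features."""
    if f["Talk-Count5"] > 2 and f["Keybd-Any1"] == 1:
        return "voice"
    return None


after_lunch = {"Talk-Count5": 5, "Motion-Any1": 0, "Keybd-Any1": 0,
               "ApptTime": 13.00}
choice = rule_full_set(after_lunch)
```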

Conclusions and Future Work

In this paper, we investigated an approach to building a context-learning user interface application using information from simple sensors that detected internal (keyboard, mouse, and process activity) and external (motion, speech) context. Our calendaring application, Sycophant, used machine learning techniques to learn, based on this context information, a mapping from sensor values to reminder types. We obtained $64\%$ accuracy in learning to choose between four reminder types (four class problem); more impressively, we were able to obtain $80\%$ accuracy for the task of learning whether or not to generate a reminder (two class problem). We found that simple sensor information, like the existence of motion and speech in the user's vicinity along with keyboard and mouse activity, is useful for learning the mapping from sensor values to reminder types.

We are now gathering more data from different groups of users and are considering the suitability of other applications to our context-learning approach. We are also investigating additional sensors and variations of the current sensors to increase the performance of the system. Finally, we would like to investigate adaptive user interfaces that combine expert generated rules with machine learned rules in genetics-based machine learning systems [8].

Acknowledgments

This work was supported in part by contract number N00014-03-1-0104 from the Office of Naval Research.

Bibliography

1
Motion.
http://motion.sourceforge.net/.

2
P. D. Adamczyk and B. P. Bailey.
If not now, when?: the effects of interruption at different moments within task execution.
Proceedings of the 2004 conference on Human factors in computing systems, pages 271-278, 2004.

3
A. Black, P. Taylor, and R. Caley.
The festival speech synthesis system.
1998.

4
C.M.U.
Sphinx: Open source speech recognition.
http://www.speech.cs.cmu.edu/sphinx/.

5
A. K. Dey, G. D. Abowd, and D. Salber.
A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications.
Human Computer Interaction, 16, 2001.

6
J. Fogarty, S. E. Hudson, and J. Lai.
Examining the robustness of sensor-based statistical models of human interruptibility.
Proceedings of the 2004 conference on Human factors in computing systems, pages 207-214, 2004.

7
J. Friedman, T. Hastie, and R. Tibshirani.
Additive logistic regression: A statistical view of boosting.
Technical report, Department of Statistics, Stanford University, 1998.

8
D. E. Goldberg.
Genetic Algorithms in Search, Optimization, and Machine Learning.
Addison-Wesley, Reading, MA, 1989.

9
E. Horvitz and J. Apacible.
Learning and reasoning about interruption.
Proceedings of the 5th international conference on Multimodal interfaces, pages 20-27, 2003.

10
S. Hudson, J. Fogarty, C. Atkeson, J. Forlizzi, S. Kiesler, J. Lee, and J. Yang.
Predicting human interruptibility with sensors: A wizard of oz feasibility study.
Proceedings of CHI 2003, ACM Press, 2003.

11
A. Kulkarni.
A reactive behavioral system for the intelligent room.
Master's thesis, Massachusetts Institute of Technology, Cambridge, MA, 2002.

12
J. R. Quinlan.
C4.5: Programs for Machine Learning.
Morgan Kaufmann, 1992.

The translation was initiated by Sushil Louis on 2005-01-06


Sushil Louis 2005-01-06