Bayesian Networks for Data Mining

DAVID HECKERMAN heckerma@microsoft.com
Microsoft Research, 9S, Redmond, WA 98052-6399
Editor: Usama Fayyad
Received June 27, 1996; Revised November 5, 1996; Accepted November 5, 1996

Abstract. A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data modeling. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both a causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.

On Bias, Variance, 0/1 Loss, and the Curse-of-Dimensionality

JEROME H. FRIEDMAN
Department of Statistics and Stanford Linear Accelerator Center, Stanford University
Editor: Usama Fayyad
Received June 3, 1996; Revised October 23, 1996; Accepted October 23, 1996

Abstract.
The classification problem is considered in which an output variable y assumes discrete values with respective probabilities that depend upon the simultaneous values of a set of input variables x = {x1, ..., xn}. At issue is how error in the estimates of these probabilities affects classification error when the estimates are used in a classification rule. These effects are seen to be somewhat counterintuitive in both their strength and nature. In particular, the bias and variance components of the estimation error combine to influence classification in a very different way than with squared error on the probabilities themselves. Certain types of (very high) bias can be canceled by low variance to produce accurate classification. This can dramatically mitigate the effect of the bias associated with some simple estimators like "naive" Bayes, and the bias induced by the curse-of-dimensionality on nearest-neighbor procedures. This helps explain why such simple methods are often competitive with and sometimes superior to more sophisticated ones for classification, and why "bagging/aggregating" classifiers can often improve accuracy. These results also suggest simple modifications to these procedures that can (sometimes dramatically) further improve their classification performance.
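The abstract's central point, that high bias in probability estimates can be harmless for 0/1 loss when variance is low, can be illustrated with a toy simulation (an assumed example, not from the paper; the specific numbers are arbitrary):

```python
# Toy illustration: a badly biased but low-variance estimate of P(y=1|x)
# can still classify perfectly, because 0/1 loss only depends on which
# side of the 0.5 boundary the estimate falls.
import random

random.seed(0)
true_p = 0.9        # true P(y=1 | x); the Bayes-optimal label is y=1
bias = -0.3         # estimator systematically underestimates by 0.3
noise_sd = 0.02     # but has low variance

n = 10_000
squared_err = 0.0
misclassified = 0
for _ in range(n):
    estimate = true_p + bias + random.gauss(0.0, noise_sd)
    squared_err += (estimate - true_p) ** 2
    # 0/1 loss: the predicted label differs from the Bayes-optimal label
    if (estimate > 0.5) != (true_p > 0.5):
        misclassified += 1

print(f"mean squared error of estimate: {squared_err / n:.3f}")   # ~0.09: large
print(f"fraction of wrong classifications: {misclassified / n:.3f}")  # 0.000
```

The estimates hover around 0.6 rather than 0.9, so their squared error is large, yet every estimate stays on the correct side of 0.5 and the classification error is zero; shrinking the bias while raising the variance would worsen 0/1 loss even as squared error improved.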
Keywords: classification, bias, variance, curse-of-dimensionality, bagging, naive Bayes, nearest-neighbors

Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

JIM GRAY Gray@Microsoft.com
SURAJIT CHAUDHURI SurajitC@Microsoft.com
ADAM BOSWORTH AdamB@Microsoft.com
ANDREW LAYMAN AndrewL@Microsoft.com
DON REICHART DonRei@Microsoft.com
MURALI VENKATRAO MuraliV@Microsoft.com
Microsoft Research, Advanced Technology Division, Microsoft Corporation, One Microsoft Way, Redmond, WA 98052
FRANK PELLOW Pellow@vnet.IBM.com
HAMID PIRAHESH Pirahesh@Almaden.IBM.com
IBM Research, 500 Harry Road, San Jose, CA 95120
Editor: Usama Fayyad
Received July 2, 1996; Revised November 5, 1996; Accepted November 6, 1996

Abstract. Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The novelty is that cubes are relations. Consequently, the cube operator can be imbedded in more complex non-procedural data analysis programs. The cube operator treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. This paper (1) explains the cube and roll-up operators, (2) shows how they fit in SQL, (3) explains how users can define new aggregate functions for cubes, and (4) discusses efficient techniques to compute the cube.  Many of these features are being added to the SQL Standard.
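The cube operator the abstract describes, aggregating over every subset of the N dimension attributes and marking rolled-up dimensions with a placeholder, can be sketched as follows (a toy illustration under assumed data; the fact table and the "ALL" placeholder are illustrative, not taken from the paper's examples):

```python
# Minimal sketch of the CUBE idea: compute the aggregate (here, a sum)
# for every subset of the N dimension attributes. A dimension that has
# been rolled up is represented by the placeholder value "ALL", so every
# super-aggregate is just another row keyed on the same N attributes.
from itertools import combinations
from collections import defaultdict

# toy fact table: (model, year, color, units_sold)
rows = [
    ("Chevy", 1994, "black", 50),
    ("Chevy", 1994, "white", 40),
    ("Chevy", 1995, "black", 85),
    ("Ford",  1994, "black", 50),
]
dims = ("model", "year", "color")

cube = defaultdict(int)
for subset_size in range(len(dims) + 1):          # all 2^N dimension subsets
    for kept in combinations(range(len(dims)), subset_size):
        for *key, units in rows:
            # roll up every dimension not in `kept` to the placeholder ALL
            cell = tuple(key[i] if i in kept else "ALL" for i in range(len(dims)))
            cube[cell] += units

print(cube[("ALL", "ALL", "ALL")])    # grand total: 225
print(cube[("Chevy", "ALL", "ALL")])  # sub-total for Chevy: 175
print(cube[("Chevy", 1994, "ALL")])   # Chevy sold in 1994: 90
```

Because every cell, from the base GROUP BY rows up to the grand total, is keyed on the same N attributes, the whole cube is itself a relation, which is the paper's point about embedding it in further relational processing.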
Keywords: data cube, data mining, aggregation, summarization, database, analysis, query

Brief Application Description

Advanced Scout: Data Mining and Knowledge Discovery in NBA Data

INDERPAL BHANDARI isb@watson.ibm.com
EDWARD COLET ecolet@watson.ibm.com
JENNIFER PARKER jparker@watson.ibm.com
ZACHARY PINES s5pines@watson.ibm.com
RAJIV PRATAP c1rajiv@watson.ibm.com
KRISHNAKUMAR RAMANUJAM c1kk@watson.ibm.com
IBM T.J. Watson Research Center
Editor: Gregory Piatetsky-Shapiro
Received April 30, 1996; Revised August 15, 1996; Accepted August 15, 1996

Abstract. Advanced Scout is a PC-based data mining application used by National Basketball Association (NBA) coaching staffs to discover interesting patterns in basketball game data. We describe the Advanced Scout software from the perspective of data mining and knowledge discovery. This paper highlights the pre-processing of raw data that the program performs, describes the data mining aspects of the software, and shows how the interpretation of patterns supports the process of knowledge discovery. The underlying technique of attribute focusing as the basis of the algorithm is also described. The process of pattern interpretation is facilitated by allowing the user to relate patterns to video tape.

Keywords: data mining, knowledge discovery, attribute focusing, basketball, NBA

Statistical Themes and Lessons for Data Mining

CLARK GLYMOUR cg09@andrew.cmu.edu
Department of Cognitive Psychology, Carnegie Mellon University, Pittsburgh, PA 15213
DAVID MADIGAN madigan@stat.washington.edu
Department of Statistics, Box 354322, University of Washington, Seattle, WA 98195
DARYL PREGIBON daryl@research.att.com
Statistics Research, AT&T Laboratories, Murray Hill, NJ 07974
PADHRAIC SMYTH smyth@ics.uci.edu
Information and Computer Science, University of California, Irvine, CA 92717
Editor: Usama Fayyad
Received June 27, 1996; Revised October 28, 1996; Accepted October 28, 1996

Abstract.
Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data mining and attempts to identify opportunities where close cooperation between the statistical and computational communities might reasonably provide synergy for further progress in data analysis.

Keywords: statistics, uncertainty, modeling, bias, variance

Personal WebWatcher: Implementation and Design
WebWatcher: A Learning Apprentice for the World Wide Web
WebWatcher: Knowledge Navigation in the World Wide Web
All from CMU (Tom Mitchell's page)

A Modular Architecture for Office Delivery Robots. Reid Simmons, Richard Goodwin, Karen Zita Haigh, Sven Koenig, Joseph O'Sullivan. In Autonomous Agents 1997, February 1997, ACM, pages 245-252.

Learning Decision Trees for Mapping the Local Environment in Mobile Robot Navigation. Ian Sillitoe (Loughborough University of Technology, UK), Tapio Elomaa (University of Helsinki, Finland).

Active Exploration Based ID-3 Learning for Robot Grasping. M. Salconigoff, Len G. Kunin, Lyle H. Ungar. University of Pennsylvania.

Learning rules that classify email. William Cohen, AT&T.

A neural network pole balancer that learns and operates on a real robot in real time. University of Minnesota.

An Evolutionary Approach to Learning in Robots. John Grefenstette and Alan Schultz, Navy Center for AI (NCAI).