Bayesian Networks for Data Mining

DAVID HECKERMAN heckerma@microsoft.com
Microsoft Research, 9S, Redmond, WA 98052-6399
Editor: Usama Fayyad
Received June 27, 1996; Revised November 5, 1996; Accepted November 5, 1996

Abstract. A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data modeling. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both a causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.

On Bias, Variance, 0/1 Loss, and the Curse-of-Dimensionality

JEROME H. FRIEDMAN
Department of Statistics and Stanford Linear Accelerator Center, Stanford University
Editor: Usama Fayyad
Received June 3, 1996; Revised October 23, 1996; Accepted October 23, 1996

Abstract.
The classification problem is considered in which an output variable y assumes discrete values with respective probabilities that depend upon the simultaneous values of a set of input variables x = {x1, ..., xn}. At issue is how error in the estimates of these probabilities affects classification error when the estimates are used in a classification rule. These effects are seen to be somewhat counterintuitive in both their strength and nature. In particular, the bias and variance components of the estimation error combine to influence classification in a very different way than with squared error on the probabilities themselves. Certain types of (very high) bias can be canceled by low variance to produce accurate classification. This can dramatically mitigate the effect of the bias associated with some simple estimators like "naive" Bayes, and the bias induced by the curse-of-dimensionality on nearest-neighbor procedures. This helps explain why such simple methods are often competitive with and sometimes superior to more sophisticated ones for classification, and why "bagging/aggregating" classifiers can often improve accuracy. These results also suggest simple modifications to these procedures that can (sometimes dramatically) further improve their classification performance.
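The abstract's central point, that high bias in probability estimates can be harmless for 0/1 loss when variance is low, can be illustrated with a toy simulation (an assumed example, not from the paper; the specific numbers are arbitrary):

```python
# Toy illustration: a badly biased but low-variance estimate of P(y=1|x)
# can still classify perfectly, because 0/1 loss only depends on which
# side of the 0.5 boundary the estimate falls.
import random

random.seed(0)
true_p = 0.9        # true P(y=1 | x); the Bayes-optimal label is y=1
bias = -0.3         # estimator systematically underestimates by 0.3
noise_sd = 0.02     # but has low variance

n = 10_000
squared_err = 0.0
misclassified = 0
for _ in range(n):
    estimate = true_p + bias + random.gauss(0.0, noise_sd)
    squared_err += (estimate - true_p) ** 2
    # 0/1 loss: the predicted label differs from the Bayes-optimal label
    if (estimate > 0.5) != (true_p > 0.5):
        misclassified += 1

print(f"mean squared error of estimate: {squared_err / n:.3f}")   # ~0.09: large
print(f"fraction of wrong classifications: {misclassified / n:.3f}")  # 0.000
```

The estimates hover around 0.6 rather than 0.9, so their squared error is large, yet every estimate stays on the correct side of 0.5 and the classification error is zero; shrinking the bias while raising the variance would worsen 0/1 loss even as squared error improved.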
Keywords: classification, bias, variance, curse-of-dimensionality, bagging, naive Bayes, nearest-neighbors

Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

JIM GRAY Gray@Microsoft.com
SURAJIT CHAUDHURI SurajitC@Microsoft.com
ADAM BOSWORTH AdamB@Microsoft.com
ANDREW LAYMAN AndrewL@Microsoft.com
DON REICHART DonRei@Microsoft.com
MURALI VENKATRAO MuraliV@Microsoft.com
Microsoft Research, Advanced Technology Division, Microsoft Corporation, One Microsoft Way, Redmond, WA 98052
FRANK PELLOW Pellow@vnet.IBM.com
HAMID PIRAHESH Pirahesh@Almaden.IBM.com
IBM Research, 500 Harry Road, San Jose, CA 95120
Editor: Usama Fayyad
Received July 2, 1996; Revised November 5, 1996; Accepted November 6, 1996

Abstract. Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The novelty is that cubes are relations. Consequently, the cube operator can be imbedded in more complex non-procedural data analysis programs. The cube operator treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. This paper (1) explains the cube and roll-up operators, (2) shows how they fit in SQL, (3) explains how users can define new aggregate functions for cubes, and (4) discusses efficient techniques to compute the cube.  Many of these features are being added to the SQL Standard.
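The cube operator the abstract describes, aggregating over every subset of the N dimension attributes and marking rolled-up dimensions with a placeholder, can be sketched as follows (a toy illustration under assumed data; the fact table and the "ALL" placeholder are illustrative, not taken from the paper's examples):

```python
# Minimal sketch of the CUBE idea: compute the aggregate (here, a sum)
# for every subset of the N dimension attributes. A dimension that has
# been rolled up is represented by the placeholder value "ALL", so every
# super-aggregate is just another row keyed on the same N attributes.
from itertools import combinations
from collections import defaultdict

# toy fact table: (model, year, color, units_sold)
rows = [
    ("Chevy", 1994, "black", 50),
    ("Chevy", 1994, "white", 40),
    ("Chevy", 1995, "black", 85),
    ("Ford",  1994, "black", 50),
]
dims = ("model", "year", "color")

cube = defaultdict(int)
for subset_size in range(len(dims) + 1):          # all 2^N dimension subsets
    for kept in combinations(range(len(dims)), subset_size):
        for *key, units in rows:
            # roll up every dimension not in `kept` to the placeholder ALL
            cell = tuple(key[i] if i in kept else "ALL" for i in range(len(dims)))
            cube[cell] += units

print(cube[("ALL", "ALL", "ALL")])    # grand total: 225
print(cube[("Chevy", "ALL", "ALL")])  # sub-total for Chevy: 175
print(cube[("Chevy", 1994, "ALL")])   # Chevy sold in 1994: 90
```

Because every cell, from the base GROUP BY rows up to the grand total, is keyed on the same N attributes, the whole cube is itself a relation, which is the paper's point about embedding it in further relational processing.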
Keywords: data cube, data mining, aggregation, summarization, database, analysis, query

Brief Application Description

Advanced Scout: Data Mining and Knowledge Discovery in NBA Data

INDERPAL BHANDARI isb@watson.ibm.com
EDWARD COLET ecolet@watson.ibm.com
JENNIFER PARKER jparker@watson.ibm.com
ZACHARY PINES s5pines@watson.ibm.com
RAJIV PRATAP c1rajiv@watson.ibm.com
KRISHNAKUMAR RAMANUJAM c1kk@watson.ibm.com
IBM T.J. Watson Research Center
Editor: Gregory Piatetsky-Shapiro
Received April 30, 1996; Revised August 15, 1996; Accepted August 15, 1996

Abstract. Advanced Scout is a PC-based data mining application used by National Basketball Association (NBA) coaching staffs to discover interesting patterns in basketball game data. We describe the Advanced Scout software from the perspective of data mining and knowledge discovery. This paper highlights the pre-processing of raw data that the program performs, describes the data mining aspects of the software, and shows how the interpretation of patterns supports the process of knowledge discovery. The underlying technique of attribute focusing as the basis of the algorithm is also described. The process of pattern interpretation is facilitated by allowing the user to relate patterns to video tape.

Keywords: data mining, knowledge discovery, attribute focusing, basketball, NBA

Statistical Themes and Lessons for Data Mining

CLARK GLYMOUR cg09@andrew.cmu.edu
Department of Cognitive Psychology, Carnegie Mellon University, Pittsburgh, PA 15213
DAVID MADIGAN madigan@stat.washington.edu
Department of Statistics, Box 354322, University of Washington, Seattle, WA 98195
DARYL PREGIBON daryl@research.att.com
Statistics Research, AT&T Laboratories, Murray Hill, NJ 07974
PADHRAIC SMYTH smyth@ics.uci.edu
Information and Computer Science, University of California, Irvine, CA 92717
Editor: Usama Fayyad
Received June 27, 1996; Revised October 28, 1996; Accepted October 28, 1996

Abstract.
Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data mining and attempts to identify opportunities where close cooperation between the statistical and computational communities might reasonably provide synergy for further progress in data analysis.

Keywords: statistics, uncertainty, modeling, bias, variance

Personal WebWatcher: Implementation and Design
WebWatcher: A Learning Apprentice for the World Wide Web
WebWatcher: Knowledge Navigation in the World Wide Web
All from CMU (Tom Mitchell's page)

A Modular Architecture for Office Delivery Robots. Reid Simmons, Richard Goodwin, Karen Zita Haigh, Sven Koenig, Joseph O'Sullivan. In Autonomous Agents 1997, February 1997, ACM, pages 245-252.

Learning Decision Trees for Mapping the Local Environment in Mobile Robot Navigation. Ian Sillitoe (Loughborough University of Technology, UK), Tapio Elomaa (University of Helsinki, Finland).

Active Exploration Based ID-3 Learning for Robot Grasping. M. Salconigoff, Len G. Kunin, Lyle H. Ungar. University of Pennsylvania.

Learning rules that classify email. William Cohen, AT&T.

A neural network pole balancer that learns and operates on a real robot in real time. University of Minnesota.

An Evolutionary Approach to Learning in Robots. John Grefenstette and Alan Schultz, Navy Center for AI (NCAI).