Scheduled for Measurement: Digging the Gold (Knowledge) from the Data Mine, Friday, April 12, 2002, 10:15 AM - 12:15 PM, San Diego Convention Center: Room 7B


Data Mining and Knowledge Discovery in Databases: An Overview

Weimo Zhu, Mahomet, IL

With a better understanding of the patterns among variables or attributes in a database, we can make better predictions and classifications, explain existing data, summarize the contents to support decision making, and visualize the data so that people can grasp the patterns easily. However, because of the huge volume of data in modern databases, finding meaningful patterns among variables has become very challenging. Knowledge discovery in databases (KDD) is a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. Data mining is a very important step in KDD, consisting of particular algorithms (methods) which, under some acceptable objectives, produce a particular enumeration of patterns (models) over the data. While many data mining algorithms have been used in statistics (e.g., classification and clustering) and machine learning (e.g., fuzzy logic) for some time, their combined strengths have only recently been recognized, due largely to the revolution in computer, Internet, and information technology. Large volumes of data, which can easily make the results of traditional statistical tests "significant," and new types of data (e.g., images) have forced us to reexamine how the data in modern databases should be analyzed. This presentation will provide an overview of advances in KDD and data mining. After briefly describing the history and some key concepts of KDD and data mining, major steps of KDD will be reviewed in detail, including: (a) Develop an understanding of the application domain; (b) Select the data mining task; (c) Select appropriate data mining approaches; (d) Mine the data to extract patterns or models; (e) Interpret and evaluate patterns/models; and (f) Consolidate discovered knowledge.
Also, major methods of data mining will be reviewed, including: (a) predictive modeling (classification and regression); (b) clustering; (c) summarization (relationships, associations, information visualization); (d) change and deviation detection; and (e) dependency modeling (graphical models). In addition, major data mining tools will be reviewed, and key references and online documentation will be introduced. Finally, major challenges in KDD and data mining, the issues of information privacy, and "hot" research areas in KDD and data mining, such as web data mining, multimedia data mining, and text data mining, will be described.
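
The remark above that large-volume data can easily make traditional statistical tests "significant" can be illustrated with a small calculation. The sketch below is not taken from the presentation; it assumes a two-sample z-test with known unit variance in both groups, and the function name, the effect size of 0.01 standard deviations, and the sample sizes are all hypothetical choices made for illustration.

```python
import math


def two_sample_z_p_value(effect_in_sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means between two groups.

    Assumes (for illustration only) equal group sizes and a known
    standard deviation of 1 in both groups, so the standard error of
    the difference is sqrt(2 / n_per_group).
    """
    z = effect_in_sd * math.sqrt(n_per_group / 2.0)
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical, practically negligible effect: 0.01 standard deviations.
effect = 0.01

for n in (100, 10_000, 1_000_000):
    p = two_sample_z_p_value(effect, n)
    print(f"n per group = {n:>9,d}   p-value = {p:.3g}")
```

Under these assumptions the same tiny effect is far from significant at 100 cases per group but falls well below conventional thresholds at one million cases per group, which is one reason effect sizes and pattern usefulness, rather than p-values alone, matter when mining large databases.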

Back to the 2002 AAHPERD National Convention and Exposition