This updated second edition introduces data mining methods and models, including association rules, clustering, neural networks, logistic regression, and multivariate analysis. The authors apply a unified "white box" approach: each method is worked through on small data sets so that readers can follow its operations and nuances and gain insight into its inner workings. Chapters also provide hands-on analysis problems that give readers an opportunity to apply their newly acquired data mining expertise to real problems using large, real-world data sets.

Data Mining and Predictive Analytics:

Offers comprehensive coverage of association rules, clustering, neural networks, logistic regression, multivariate analysis, and the R statistical programming language
Features over 750 chapter exercises, allowing readers to assess their understanding of the new material
Provides a detailed case study that brings together the lessons learned in the book
Includes access to the companion website, www.dataminingconsultant, with exclusive password-protected instructor content

Data Mining and Predictive Analytics will appeal to computer science and statistics students, as well as to students in MBA programs and chief executives.

About the Authors
Daniel T. Larose is Professor of Mathematical Sciences and Director of the Data Mining programs at Central Connecticut State University. He has published several books, including Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage (Wiley, 2007) and Discovering Knowledge in Data: An Introduction to Data Mining (Wiley, 2005). In addition to his scholarly work, Dr. Larose is a consultant in data mining and statistical analysis, working with many high-profile clients, including Microsoft, Forbes Magazine, the CIT Group, KPMG International, Computer Associates, and Deloitte, Inc.

Chantal D. Larose is an Assistant Professor of Statistics & Data Science at Eastern Connecticut State University (ECSU). She has co-authored three books on data science and predictive analytics. She helped develop data science programs at ECSU and at SUNY New Paltz. She received her PhD in Statistics from the University of Connecticut, Storrs in 2015 (dissertation title: Model-based Clustering of Incomplete Data).

Contents

PREFACE

ACKNOWLEDGMENTS

PART I DATA PREPARATION

CHAPTER 1 AN INTRODUCTION TO DATA MINING AND PREDICTIVE ANALYTICS

1.1 What is Data Mining? What is Predictive Analytics?

1.2 Wanted: Data Miners

1.3 The Need for Human Direction of Data Mining

1.4 The Cross-Industry Standard Process for Data Mining: CRISP-DM

1.4.1 CRISP-DM: The Six Phases

1.5 Fallacies of Data Mining

1.6 What Tasks Can Data Mining Accomplish

CHAPTER 2 DATA PREPROCESSING

2.1 Why do We Need to Preprocess the Data?

2.2 Data Cleaning

2.3 Handling Missing Data

2.4 Identifying Misclassifications

2.5 Graphical Methods for Identifying Outliers

2.6 Measures of Center and Spread

2.7 Data Transformation

2.8 Min-Max Normalization

2.9 Z-Score Standardization

2.10 Decimal Scaling

2.11 Transformations to Achieve Normality

2.12 Numerical Methods for Identifying Outliers

2.13 Flag Variables

2.14 Transforming Categorical Variables into Numerical Variables

2.15 Binning Numerical Variables

2.16 Reclassifying Categorical Variables

2.17 Adding an Index Field

2.18 Removing Variables that are not Useful

2.19 Variables that Should Probably not be Removed

2.20 Removal of Duplicate Records

2.21 A Word About ID Fields

CHAPTER 3 EXPLORATORY DATA ANALYSIS

3.1 Hypothesis Testing Versus Exploratory Data Analysis

3.2 Getting to Know the Data Set

3.3 Exploring Categorical Variables

3.4 Exploring Numeric Variables

3.5 Exploring Multivariate Relationships

3.6 Selecting Interesting Subsets of the Data for Further Investigation

3.7 Using EDA to Uncover Anomalous Fields

3.8 Binning Based on Predictive Value

3.9 Deriving New Variables: Flag Variables

3.10 Deriving New Variables: Numerical Variables

3.11 Using EDA to Investigate Correlated Predictor Variables

3.12 Summary of Our EDA

CHAPTER 4 DIMENSION-REDUCTION METHODS

4.1 Need for Dimension-Reduction in Data Mining

4.2 Principal Components Analysis

4.3 Applying PCA to the Houses Data Set

4.4 How Many Components Should We Extract?

4.5 Profiling the Principal Components

4.6 Communalities

4.7 Validation of the Principal Components

4.8 Factor Analysis

4.9 Applying Factor Analysis to the Adult Data Set

4.10 Factor Rotation

4.11 User-Defined Composites

4.12 An Example of a User-Defined Composite

PART II STATISTICAL ANALYSIS

CHAPTER 5 UNIVARIATE STATISTICAL ANALYSIS

5.1 Data Mining Tasks in Discovering Knowledge in Data

5.2 Statistical Approaches to Estimation and Prediction

5.3 Statistical Inference

5.4 How Confident are We in Our Estimates?

5.5 Confidence Interval Estimation of the Mean

5.6 How to Reduce the Margin of Error

5.7 Confidence Interval Estimation of the Proportion

5.8 Hypothesis Testing for the Mean

5.9 Assessing the Strength of Evidence Against the Null Hypothesis

5.10 Using Confidence Intervals to Perform Hypothesis Tests

5.11 Hypothesis Testing for the Proportion

CHAPTER 6 MULTIVARIATE STATISTICS

6.1 Two-Sample t-Test for Difference in Means

6.2 Two-Sample Z-Test for Difference in Proportions

6.3 Test for the Homogeneity of Proportions

6.4 Chi-Square Test for Goodness of Fit of Multinomial Data ...

Product Information

Title: Data Mining and Predictive Analytics

Author: Daniel T. Larose

EAN: 9781118868676

ISBN: 978-1-118-86867-6

Format: E-book (PDF)

Manufacturer: Wiley

Publisher: Wiley

Genre: Computer Science

Publication date: 19.02.2015

Digital copy protection: Adobe DRM

File size: 38.84 MB

Number of pages: 824

Year: 2015

Subtitle: English

Edition: 2nd ed.
