Data Mining and Predictive Analytics

  • E-Book (pdf)
  • 824 pages
CHF 75.00
Available for immediate download
About e-books
E-books can also be read on mobile devices.
E-books from Ex Libris are copy-protected with Adobe DRM.

Description

Learn methods of data analysis and their application to real-world data sets

This updated second edition serves as an introduction to data mining methods and models, including association rules, clustering, neural networks, logistic regression, and multivariate analysis. The authors apply a unified “white box” approach to data mining methods and models. This approach walks readers through the operations and nuances of the various methods using small data sets, so that readers gain insight into the inner workings of the method under review. Chapters provide hands-on analysis problems that give readers an opportunity to apply their newly acquired data mining expertise to real problems using large, real-world data sets.

Data Mining and Predictive Analytics, Second Edition:

  • Offers comprehensive coverage of association rules, clustering, neural networks, logistic regression, multivariate analysis, and the R statistical programming language
  • Features over 750 chapter exercises, allowing readers to assess their understanding of the new material
  • Provides a detailed case study that brings together the lessons learned in the book
  • Includes access to the companion website, www.dataminingconsultant, with exclusive password-protected instructor content

Data Mining and Predictive Analytics, Second Edition will appeal to computer science and statistics students, as well as students in MBA programs and chief executives.



Chantal D. Larose is a Ph.D. candidate in Statistics at the University of Connecticut. Her research focuses on the imputation of missing data and model-based clustering. She has taught undergraduate statistics since 2011, and is a statistical consultant for DataMiningConsultant.com, LLC.

About the authors
Daniel T. Larose is Professor of Mathematical Sciences and Director of the Data Mining programs at Central Connecticut State University. He has published several books, including Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage (Wiley, 2007) and Discovering Knowledge in Data: An Introduction to Data Mining (Wiley, 2005). In addition to his scholarly work, Dr. Larose is a consultant in data mining and statistical analysis working with many high-profile clients, including Microsoft, Forbes Magazine, the CIT Group, KPMG International, Computer Associates, and Deloitte, Inc.

Chantal D. Larose is an Assistant Professor of Statistics & Data Science at Eastern Connecticut State University (ECSU). She has co-authored three books on data science and predictive analytics. She helped develop data science programs at ECSU and at SUNY New Paltz. She received her PhD in Statistics from the University of Connecticut, Storrs in 2015 (dissertation title: Model-based Clustering of Incomplete Data).

Contents

PREFACE

ACKNOWLEDGMENTS

PART I DATA PREPARATION

CHAPTER 1 AN INTRODUCTION TO DATA MINING AND PREDICTIVE ANALYTICS

1.1 What is Data Mining? What is Predictive Analytics?

1.2 Wanted: Data Miners

1.3 The Need for Human Direction of Data Mining

1.4 The Cross-Industry Standard Process for Data Mining: CRISP-DM

1.4.1 CRISP-DM: The Six Phases

1.5 Fallacies of Data Mining

1.6 What Tasks Can Data Mining Accomplish

CHAPTER 2 DATA PREPROCESSING

2.1 Why do We Need to Preprocess the Data?

2.2 Data Cleaning

2.3 Handling Missing Data

2.4 Identifying Misclassifications

2.5 Graphical Methods for Identifying Outliers

2.6 Measures of Center and Spread

2.7 Data Transformation

2.8 Min–Max Normalization

2.9 Z-Score Standardization

2.10 Decimal Scaling

2.11 Transformations to Achieve Normality

2.12 Numerical Methods for Identifying Outliers

2.13 Flag Variables

2.14 Transforming Categorical Variables into Numerical Variables

2.15 Binning Numerical Variables

2.16 Reclassifying Categorical Variables

2.17 Adding an Index Field

2.18 Removing Variables that are not Useful

2.19 Variables that Should Probably not be Removed

2.20 Removal of Duplicate Records

2.21 A Word About ID Fields

CHAPTER 3 EXPLORATORY DATA ANALYSIS

3.1 Hypothesis Testing Versus Exploratory Data Analysis

3.2 Getting to Know the Data Set

3.3 Exploring Categorical Variables

3.4 Exploring Numeric Variables

3.5 Exploring Multivariate Relationships

3.6 Selecting Interesting Subsets of the Data for Further Investigation

3.7 Using EDA to Uncover Anomalous Fields

3.8 Binning Based on Predictive Value

3.9 Deriving New Variables: Flag Variables

3.10 Deriving New Variables: Numerical Variables

3.11 Using EDA to Investigate Correlated Predictor Variables

3.12 Summary of Our EDA

CHAPTER 4 DIMENSION-REDUCTION METHODS

4.1 Need for Dimension-Reduction in Data Mining

4.2 Principal Components Analysis

4.3 Applying PCA to the Houses Data Set

4.4 How Many Components Should We Extract?

4.5 Profiling the Principal Components

4.6 Communalities

4.7 Validation of the Principal Components

4.8 Factor Analysis

4.9 Applying Factor Analysis to the Adult Data Set

4.10 Factor Rotation

4.11 User-Defined Composites

4.12 An Example of a User-Defined Composite

PART II STATISTICAL ANALYSIS

CHAPTER 5 UNIVARIATE STATISTICAL ANALYSIS

5.1 Data Mining Tasks in Discovering Knowledge in Data

5.2 Statistical Approaches to Estimation and Prediction

5.3 Statistical Inference

5.4 How Confident are We in Our Estimates?

5.5 Confidence Interval Estimation of the Mean

5.6 How to Reduce the Margin of Error

5.7 Confidence Interval Estimation of the Proportion

5.8 Hypothesis Testing for the Mean

5.9 Assessing the Strength of Evidence Against the Null Hypothesis

5.10 Using Confidence Intervals to Perform Hypothesis Tests

5.11 Hypothesis Testing for the Proportion

CHAPTER 6 MULTIVARIATE STATISTICS

6.1 Two-Sample t-Test for Difference in Means

6.2 Two-Sample Z-Test for Difference in Proportions

6.3 Test for the Homogeneity of Proportions

6.4 Chi-Square Test for Goodness of Fit of Multinomial Data

6.5 Analysis of Variance

CHAPTER 7 PREPARING TO MODEL THE DATA

7.1 Supervised Versus Unsupervised Methods

7.2 Statistical Methodology and Data Mining Methodology

7.3 Cross-Validation

7.4 Overfitting

7.5 Bias–Variance Trade-Off

7.6 Balancing the Training Data Set

7.7 Establishing Baseline Performance

CHAPTER 8 SIMPLE LINEAR REGRESSION

8.1 An Example of Simple Linear Regression

8.2 Dangers of Extrapolation

8.3 How Useful is the Regression? The Coefficient of Determination, r²

8.4 Standard Error of the Estimate, s

8.5 Correlation Coefficient r

8.6 ANOVA Table for Simple Linear Regression

8.7 Outliers, High Leverage Points, and Influential Observations

8.8 Population Regression Equation

8.9 Verifying the Regression Assumptions

8.10 Inference in Regression

8.11 t-Test for the Relationship Between x an...

Product information

Title: Data Mining and Predictive Analytics
Authors: Daniel T. Larose, Chantal D. Larose
EAN: 9781118868676
ISBN: 978-1-118-86867-6
Digital copy protection: Adobe DRM
Format: E-Book (pdf)
Publisher: Wiley
Genre: Computer science
Number of pages: 824
Publication date: 19.02.2015
Year: 2015
Edition: 2nd edition
Language: English
File size: 38.8 MB