Data Science and Big Data Analytics is about harnessing the power of data for new insights. The book covers the breadth of activities, methods, and tools that Data Scientists use. The content focuses on concepts, principles, and practical applications that are applicable to any industry and technology environment, and the learning is supported and explained with examples that you can replicate using open-source software.
This book will help you:
Become a contributor on a data science team
Deploy a structured lifecycle approach to data analytics problems
Apply appropriate analytic techniques and tools to analyze big data
Learn how to tell a compelling story with data to drive business action
Prepare for the EMC Proven Professional Data Science Certification
Get started discovering, analyzing, visualizing, and presenting data in a meaningful way today!
About the Author
EMC is a global leader in enabling businesses and service providers to transform their operations and deliver IT as a service. Fundamental to this transformation is cloud computing. Through innovative products and services, EMC accelerates the journey to cloud computing, helping IT departments to store, manage, protect, and analyze their most valuable asset, information, in a more agile, trusted, and cost-efficient way. Additional information about EMC can be found at www.EMC.com.
Contents
Introduction xvii
Chapter 1 Introduction to Big Data Analytics 1
1.1 Big Data Overview 2
1.1.1 Data Structures 5
1.1.2 Analyst Perspective on Data Repositories 9
1.2 State of the Practice in Analytics 11
1.2.1 BI Versus Data Science 12
1.2.2 Current Analytical Architecture 13
1.2.3 Drivers of Big Data 15
1.2.4 Emerging Big Data Ecosystem and a New Approach to Analytics 16
1.3 Key Roles for the New Big Data Ecosystem 19
1.4 Examples of Big Data Analytics 22
Summary 23
Exercises 23
Bibliography 24
Chapter 2 Data Analytics Lifecycle 25
2.1 Data Analytics Lifecycle Overview 26
2.1.1 Key Roles for a Successful Analytics Project 26
2.1.2 Background and Overview of Data Analytics Lifecycle 28
2.2 Phase 1: Discovery 30
2.2.1 Learning the Business Domain 30
2.2.2 Resources 31
2.2.3 Framing the Problem 32
2.2.4 Identifying Key Stakeholders 33
2.2.5 Interviewing the Analytics Sponsor 33
2.2.6 Developing Initial Hypotheses 35
2.2.7 Identifying Potential Data Sources 35
2.3 Phase 2: Data Preparation 36
2.3.1 Preparing the Analytic Sandbox 37
2.3.2 Performing ETLT 38
2.3.3 Learning About the Data 39
2.3.4 Data Conditioning 40
2.3.5 Survey and Visualize 41
2.3.6 Common Tools for the Data Preparation Phase 42
2.4 Phase 3: Model Planning 42
2.4.1 Data Exploration and Variable Selection 44
2.4.2 Model Selection 45
2.4.3 Common Tools for the Model Planning Phase 45
2.5 Phase 4: Model Building 46
2.5.1 Common Tools for the Model Building Phase 48
2.6 Phase 5: Communicate Results 49
2.7 Phase 6: Operationalize 50
2.8 Case Study: Global Innovation Network and Analysis (GINA) 53
2.8.1 Phase 1: Discovery 54
2.8.2 Phase 2: Data Preparation 55
2.8.3 Phase 3: Model Planning 56
2.8.4 Phase 4: Model Building 56
2.8.5 Phase 5: Communicate Results 58
2.8.6 Phase 6: Operationalize 59
Summary 60
Exercises 61
Bibliography 61
Chapter 3 Review of Basic Data Analytic Methods Using R 63
3.1 Introduction to R 64
3.1.1 R Graphical User Interfaces 67
3.1.2 Data Import and Export 69
3.1.3 Attribute and Data Types 71
3.1.4 Descriptive Statistics 79
3.2 Exploratory Data Analysis 80
3.2.1 Visualization Before Analysis 82
3.2.2 Dirty Data 85
3.2.3 Visualizing a Single Variable 88
3.2.4 Examining Multiple Variables 91
3.2.5 Data Exploration Versus Presentation 99
3.3 Statistical Methods for Evaluation 101
3.3.1 Hypothesis Testing 102
3.3.2 Difference of Means 104
3.3.3 Wilcoxon Rank-Sum Test 108
3.3.4 Type I and Type II Errors 109
3.3.5 Power and Sample Size 110
3.3.6 ANOVA 110
Summary 114
Exercises 114
Bibliography 115
Chapter 4 Advanced Analytical Theory and Methods: Clustering 117
4.1 Overview of Clustering 118
4.2 K-means 118
4.2.1 Use Cases 119
4.2.2 Overview of the Method 120
4.2.3 Determining the Number of Clusters 123
4.2.4 Diagnostics 128
4.2.5 Reasons to Choose and Cautions 130
4.3 Additional Algorithms 134
Summary 135
Exercises 135
Bibliography 136
Chapter 5 Advanced Analytical Theory and Methods: Association Rules 137
5.1 Overview 138
5.2 Apriori Algorithm 140
5.3 Evaluation of Candidate Rules 141
5.4 Applications of Association Rules 143
5.5 An Example: Transactions in a Grocery Store 143
5.5.1 The Groceries Dataset 144
...