CHF34.00
Download available immediately
Construct a robust end-to-end solution for analyzing and
visualizing streaming data
Real-time analytics is the hottest topic in data analytics
today. In Real-Time Analytics: Techniques to Analyze and
Visualize Streaming Data, expert Byron Ellis teaches data
analysts the technologies needed to build an effective real-time analytics
platform. This platform can then be used to make sense of the
constantly changing data that is beginning to outpace traditional
batch-based analysis platforms.
The author is one of very few leading experts in the field. He
has a prestigious background in research, development, analytics,
real-time visualization, and Big Data streaming and is uniquely
qualified to help you explore this revolutionary field. Moving from
a description of the overall analytic architecture of real-time
analytics to using specific tools to obtain targeted results,
Real-Time Analytics leverages open source and modern
commercial tools to construct robust, efficient systems that can
provide real-time analysis in a cost-effective manner. The book
includes:
A deep discussion of streaming data systems and architectures
Instructions for analyzing, storing, and delivering streaming data
Tips on aggregating data and working with sets
Information on data warehousing options and techniques
Real-Time Analytics includes in-depth case studies for
website analytics, Big Data, visualizing streaming and mobile data,
and mining and visualizing operational data flows. The book's
"recipe" layout lets readers quickly learn and implement different
techniques. All of the code examples presented in the book, along
with their related data sets, are available on the companion
website.
Author
BYRON ELLIS is CTO of Spongecell, where he heads research and development. Previously the Chief Data Scientist for LivePerson and CTO at AdBrite, Ellis holds a Ph.D. in Statistics from Harvard University, and a B.S. in Cybernetics from UCLA. He presents sessions on real-time analytics at Strata and other major conferences.
Contents
Introduction
Chapter 1 Introduction to Streaming Data
Sources of Streaming Data
Operational Monitoring
Web Analytics
Online Advertising
Social Media
Mobile Data and the Internet of Things
Why Streaming Data Is Different
Always On, Always Flowing
Loosely Structured
High-Cardinality Storage
Infrastructures and Algorithms
Conclusion
Part I Streaming Analytics Architecture
Chapter 2 Designing Real-Time Streaming Architectures
Real-Time Architecture Components
Collection
Data Flow
Processing
Storage
Delivery
Features of a Real-Time Architecture
High Availability
Low Latency
Horizontal Scalability
Languages for Real-Time Programming
Java
Scala and Clojure
JavaScript
The Go Language
A Real-Time Architecture Checklist
Collection
Data Flow
Processing
Storage
Delivery
Conclusion
Chapter 3 Service Configuration and Coordination
Motivation for Configuration and Coordination Systems
Maintaining Distributed State
Unreliable Network Connections
Clock Synchronization
Consensus in an Unreliable World
Apache ZooKeeper
The znode
Watches and Notifications
Maintaining Consistency
Creating a ZooKeeper Cluster
ZooKeeper's Native Java Client
The Curator Client
Curator Recipes
Conclusion
Chapter 4 Data-Flow Management in Streaming Analysis
Distributed Data Flows
At Least Once Delivery
The n+1 Problem
Apache Kafka: High-Throughput Distributed Messaging
Design and Implementation
Configuring a Kafka Environment
Interacting with Kafka Brokers
Apache Flume: Distributed Log Collection
The Flume Agent
Configuring the Agent
The Flume Data Model
Channel Selectors
Flume Sources
Flume Sinks
Sink Processors
Flume Channels
Flume Interceptors
Integrating Custom Flume Components
Running Flume Agents
Conclusion
Chapter 5 Processing Streaming Data
Distributed Streaming Data Processing
Coordination
Partitions and Merges
Transactions
Processing Data with Storm
Components of a Storm Cluster
Configuring a Storm Cluster
Distributed Clusters
Local Clusters
Storm Topologies
Implementing Bolts
Implementing and Using Spouts
Distributed Remote Procedure Calls
Trident: The Storm DSL
Processing Data with Samza
Apache YARN
Getting Started with YARN and Samza
Integrating Samza into the Data Flow
Samza Jobs
Conclusion
Chapter 6 Storing Streaming Data
Consistent Hashing
NoSQL Storage Systems
Redis
MongoDB
Cassandra
Other Storage Technologies
Relational Databases
Distributed In-Memory Data Grids
Choosing a Technology
Key-Value Stores
Document Stores
Distributed ...