
Data warehousing in the age of big data / Krish Krishnan.

By: Krishnan, Krish
Material type: Text
Series: Morgan Kaufmann Series on Business Intelligence
Publisher: Amsterdam : Morgan Kaufmann, an imprint of Elsevier, 2013
Description: 1 online resource
Content type:
  • text
Media type:
  • computer
Carrier type:
  • online resource
ISBN:
  • 0124059201
  • 9780124059207
  • 1299591914
  • 9781299591912
Subject(s):
Additional physical formats: Print version: No title
DDC classification:
  • 005.74/5 23
LOC classification:
  • QA76.9.D37 K75 2013eb
Online resources:
Contents:
Front Cover -- Data Warehousing in the Age of Big Data -- Copyright Page -- Contents -- Acknowledgments -- About the Author -- Introduction -- Part 1: Big Data -- Part 2: The Data Warehousing -- Part 3: Building the Big Data -- Data Warehouse -- Appendixes -- Companion website -- 1 BIG DATA -- 1 Introduction to Big Data -- Introduction -- Big Data -- Defining Big Data -- Why Big Data and why now? -- Big Data example -- Social Media posts -- Survey data analysis -- Survey data -- Weather data -- Twitter data -- Integration and analysis -- Additional data types -- Summary -- Further reading.
2 Working with Big Data -- Introduction -- Data explosion -- Data volume -- Machine data -- Application log -- Clickstream logs -- External or third-party data -- Emails -- Contracts -- Geographic information systems and geo-spatial data -- Example: Funshots, Inc. -- Data velocity -- Amazon, Facebook, Yahoo, and Google -- Sensor data -- Mobile networks -- Social media -- Data variety -- Summary -- 3 Big Data Processing Architectures -- Introduction -- Data processing revisited -- Data processing techniques -- Data processing infrastructure challenges -- Storage -- Transportation -- Processing.
Speed or throughput -- Shared-everything and shared-nothing architectures -- Shared-everything architecture -- Shared-nothing architecture -- OLTP versus data warehousing -- Big Data processing -- Infrastructure explained -- Data processing explained -- Telco Big Data study -- Infrastructure -- Data processing -- 4 Introducing Big Data Technologies -- Introduction -- Distributed data processing -- Big Data processing requirements -- Technologies for Big Data processing -- Google file system -- Hadoop -- Hadoop core components -- HDFS -- HDFS architecture -- NameNode -- DataNodes -- Image.
Journal -- Checkpoint -- HDFS startup -- Block allocation and storage in HDFS -- HDFS client -- Replication and recovery -- Communication and management -- Heartbeats -- CheckpointNode and BackupNode -- CheckpointNode -- BackupNode -- File system snapshots -- JobTracker and TaskTracker -- MapReduce -- MapReduce programming model -- MapReduce program design -- MapReduce implementation architecture -- MapReduce job processing and management -- MapReduce limitations (Version 1, Hadoop MapReduce) -- MapReduce v2 (YARN) -- YARN scalability -- Comparison between MapReduce v1 and v2 -- SQL/MapReduce.
Zookeeper -- Zookeeper features -- Locks and processing -- Failure and recovery -- Pig -- Programming with pig latin -- Pig data types -- Running pig programs -- Pig program flow -- Common pig command -- HBase -- HBase architecture -- HBase components -- Write-ahead log -- Hive -- Hive architecture -- Infrastructure -- Execution: how does hive process queries? -- Hive data types -- Hive query language (HiveQL) -- Chukwa -- Flume -- Oozie -- HCatalog -- Sqoop -- Sqoop1 -- Sqoop2 -- Hadoop summary -- NoSQL -- CAP theorem -- Key-value pair: Voldemort -- Column family store: Cassandra -- Data model.
Summary: "In conclusion, as you come to the end of this book, the concept of a Data Warehouse and its primary goal of serving as the enterprise version of the truth, and being the single platform for all sources of information, will continue to remain intact and valid for many years to come. As we have discussed across many chapters and in many case studies, the limitations that existed in the infrastructures to create, manage, and deploy Data Warehouses have been largely eliminated with the availability of Big Data technologies and infrastructure platforms, making the goal of the single version of truth a feasible reality. Integrating and extending Big Data into the Data Warehouse, and creating a larger decision support platform, will benefit businesses for years to come. This book has touched upon governance and information lifecycle management aspects of Big Data in the larger program; however, you can reuse all the current program management techniques that you follow for the Data Warehouse, and even implement agile approaches to integrating and managing data in the Data Warehouse. Technologies will continue to evolve in this spectrum, and there will be more solutions, which can be integrated if you follow modular integration approaches to building and managing the Data Warehouse. The Appendix sections contain many more case studies and a special section on the Healthcare Information Factory based on Big Data approaches. These are guiding posts to help you align your thoughts and goals in building and integrating Big Data into your Data Warehouse" -- Provided by publisher.
Holdings
Item type: eBook
Current library: eBook e-Library
Collection: EBSCO
Call number: Computers
Status: Available
Total holds: 0


Includes bibliographical references and index.

Print version record.


Copyright: Elsevier Science & Technology 2013

English.

