
Data warehousing in the age of big data / Krish Krishnan.

By: Krishnan, Krish
Material type: Text
Series: Morgan Kaufmann Series on Business Intelligence
Publisher: Amsterdam : Morgan Kaufmann, an imprint of Elsevier, 2013
Description: 1 online resource
Content type:
  • text
Media type:
  • computer
Carrier type:
  • online resource
ISBN:
  • 0124059201
  • 9780124059207
  • 1299591914
  • 9781299591912
Subject(s):
Additional physical formats: Print version: No title
DDC classification:
  • 005.74/5 23
LOC classification:
  • QA76.9.D37 K75 2013eb
Online resources:
Contents:
Front Cover -- Data Warehousing in the Age of Big Data -- Copyright Page -- Contents -- Acknowledgments -- About the Author -- Introduction -- Part 1: Big Data -- Part 2: The Data Warehousing -- Part 3: Building the Big Data -- Data Warehouse -- Appendixes -- Companion website -- 1 BIG DATA -- 1 Introduction to Big Data -- Introduction -- Big Data -- Defining Big Data -- Why Big Data and why now? -- Big Data example -- Social Media posts -- Survey data analysis -- Survey data -- Weather data -- Twitter data -- Integration and analysis -- Additional data types -- Summary -- Further reading.
2 Working with Big Data -- Introduction -- Data explosion -- Data volume -- Machine data -- Application log -- Clickstream logs -- External or third-party data -- Emails -- Contracts -- Geographic information systems and geo-spatial data -- Example: Funshots, Inc. -- Data velocity -- Amazon, Facebook, Yahoo, and Google -- Sensor data -- Mobile networks -- Social media -- Data variety -- Summary -- 3 Big Data Processing Architectures -- Introduction -- Data processing revisited -- Data processing techniques -- Data processing infrastructure challenges -- Storage -- Transportation -- Processing.
Speed or throughput -- Shared-everything and shared-nothing architectures -- Shared-everything architecture -- Shared-nothing architecture -- OLTP versus data warehousing -- Big Data processing -- Infrastructure explained -- Data processing explained -- Telco Big Data study -- Infrastructure -- Data processing -- 4 Introducing Big Data Technologies -- Introduction -- Distributed data processing -- Big Data processing requirements -- Technologies for Big Data processing -- Google file system -- Hadoop -- Hadoop core components -- HDFS -- HDFS architecture -- NameNode -- DataNodes -- Image.
Journal -- Checkpoint -- HDFS startup -- Block allocation and storage in HDFS -- HDFS client -- Replication and recovery -- Communication and management -- Heartbeats -- CheckpointNode and BackupNode -- CheckpointNode -- BackupNode -- File system snapshots -- JobTracker and TaskTracker -- MapReduce -- MapReduce programming model -- MapReduce program design -- MapReduce implementation architecture -- MapReduce job processing and management -- MapReduce limitations (Version 1, Hadoop MapReduce) -- MapReduce v2 (YARN) -- YARN scalability -- Comparison between MapReduce v1 and v2 -- SQL/MapReduce.
Zookeeper -- Zookeeper features -- Locks and processing -- Failure and recovery -- Pig -- Programming with pig latin -- Pig data types -- Running pig programs -- Pig program flow -- Common pig command -- HBase -- HBase architecture -- HBase components -- Write-ahead log -- Hive -- Hive architecture -- Infrastructure -- Execution: how does hive process queries? -- Hive data types -- Hive query language (HiveQL) -- Chukwa -- Flume -- Oozie -- HCatalog -- Sqoop -- Sqoop1 -- Sqoop2 -- Hadoop summary -- NoSQL -- CAP theorem -- Key-value pair: Voldemort -- Column family store: Cassandra -- Data model.
Summary: "In conclusion, as you come to the end of this book, the concept of a Data Warehouse and its primary goal of serving as the enterprise version of the truth, and being the single platform for all sources of information, will continue to remain intact and valid for many years to come. As we have discussed across many chapters and in many case studies, the limitations that existed in the infrastructures to create, manage, and deploy Data Warehouses have been largely eliminated with the availability of Big Data technologies and infrastructure platforms, making the goal of the single version of truth a feasible reality. Integrating and extending Big Data into the Data Warehouse, and creating a larger decision support platform, will benefit businesses for years to come. This book has touched upon governance and information lifecycle management aspects of Big Data in the larger program; however, you can reuse all the current program management techniques that you follow for the Data Warehouse, and even implement agile approaches to integrating and managing data in the Data Warehouse. Technologies will continue to evolve in this spectrum, and there will be more solutions, which can be integrated if you follow modular integration approaches to building and managing the Data Warehouse. The Appendix sections contain many more case studies and a special section on the Healthcare Information Factory based on Big Data approaches. These are guiding posts to help you align your thoughts and goals in building and integrating Big Data into your Data Warehouse" -- Provided by publisher.
Holdings
Item type: eBook
Current library: eBook e-Library
Collection: EBSCO
Call number: Computers
Status: Available
Total holds: 0


Includes bibliographical references and index.

Print version record.


Copyright: Elsevier Science & Technology 2013

English.

