I am new to Big Data concepts. As I understand it, Big Data is critical for handling unstructured data and high volumes. In a Big Data architecture for a data warehouse (DW), data from the source is extracted through Hadoop (HDFS and MapReduce), the relevant unstructured information is converted into valid business information, and finally the data is loaded into the DW or data mart through ETL processing (alongside the existing structured data processing).
However, I would like to know what new techniques, dimensional models, or storage requirements are needed at the DW for an organization because of Big Data, as most of the tutorials and resources I have found only talk about Hadoop at the source, not at the target. How does the introduction of Big Data impact an organization's predefined reports and ad hoc analysis, given this high volume of data?
Appreciate your response
Big data refers to the volume, variety, and velocity of data. How big the data is, how fast it arrives, and how varied it is together determine whether it counts as so-called "Big Data". The 3 V's of big data were articulated by industry analyst Doug Laney in the early 2000s.
Big data and a data warehouse look similar, but there is a clear difference. Big data is a repository that holds lots of data without necessarily knowing in advance what we want to do with it, whereas a data warehouse is designed with the clear intention of supporting informed decisions. Further, big data technologies can be used for data warehousing purposes.
A data warehouse is an architecture used to organize data.
Big Data: Big Data refers to large-volume, complex data sets. This data can be structured, semi-structured, or unstructured, and it cannot be processed efficiently by traditional data processing software and databases. Operations such as analysis and manipulation are performed on the data, and companies then use the results for intelligent decision making. This makes big data a very powerful asset for tackling business problems.
Data Warehouse: A data warehouse is a collection of data from various heterogeneous sources. It is the main component of a business intelligence system, where data is analyzed and managed in order to improve decision making. It involves the process of extraction, transformation, and loading (ETL) to make the data available for analysis. Data warehouses are also used to run queries over large amounts of data. They draw data from various relational databases and application log files.
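To make the extract, transform, and load flow described above more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the log format, the `fact_orders` table, and all function names are hypothetical, and an in-memory SQLite database stands in for the warehouse. In a real deployment the extraction and transformation would typically run on something like Hadoop rather than in a single script.

```python
import re
import sqlite3

# Hypothetical raw log lines standing in for unstructured source data.
RAW_LOGS = [
    "2024-01-05 10:02:11 INFO order_placed customer=42 amount=19.99",
    "2024-01-05 10:03:40 ERROR payment_failed customer=42",
    "2024-01-05 11:15:02 INFO order_placed customer=7 amount=5.00",
    "garbage line that does not match",
    "2024-01-06 09:00:00 INFO order_placed customer=42 amount=30.01",
]

# Assumed log layout: date, time, level, event name, key=value pairs.
ORDER_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) \S+ INFO order_placed "
    r"customer=(?P<customer>\d+) amount=(?P<amount>\d+\.\d+)$"
)


def extract_and_transform(lines):
    """Keep only order events and turn each into a (date, customer, cents) row."""
    rows = []
    for line in lines:
        match = ORDER_PATTERN.match(line)
        if match is None:
            continue  # skip lines with no business value for this report
        cents = round(float(match["amount"]) * 100)  # store money as integer cents
        rows.append((match["date"], int(match["customer"]), cents))
    return rows


def load(conn, rows):
    """Load the structured rows into a simple fact table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_date TEXT, customer_id INTEGER, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def daily_revenue(conn):
    """A predefined report: total revenue in cents per day."""
    cursor = conn.execute(
        "SELECT order_date, SUM(amount_cents) FROM fact_orders "
        "GROUP BY order_date ORDER BY order_date"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database standing in for the DW
rows = extract_and_transform(RAW_LOGS)
load(conn, rows)
report = daily_revenue(conn)
print(report)
```

The point of the sketch is that the warehouse only ever sees the small, structured result of the transformation, so predefined reports such as `daily_revenue` keep querying ordinary tables no matter how large or messy the raw source is.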