QBoard » Big Data » Big Data - Data Processing and ETL » Confusion between Operational and Analytical Big Data and on which category Hadoop operates?

Confusion between Operational and Analytical Big Data and on which category Hadoop operates?

  • I can't wrap my head around the basic theoretical concept of 'Operational and Analytical Big Data'.

    According to me:

    1. Operational Big Data: Branch where we can perform read/write operations on big data using specially designed databases (NoSQL). Somewhat similar to ETL in an RDBMS.

    2. Analytical Big Data: Branch where we analyse data in retrospect and draw predictions using techniques like MPP and MapReduce. Somewhat similar to reporting in an RDBMS.

    (Please feel free to correct wherever I'm wrong, it's just my understanding.)

    So according to me, Hadoop is used for Analytical Big Data, where we just process data for analysis but don't tamper with the original data, and hence it is not an ideal choice for ETL. But recently I came across this article which advocates using Hadoop for ETL: https://www.datanami.com/2014/09/01/five-steps-to-running-etl-on-hadoop-for-web-companies/

      July 23, 2021 1:31 PM IST
    0
  • Hadoop (MapReduce) is not an efficient processing layer, IMO, without adequate tweaking, so out of the box the answer is neither. Sure, MapReduce could be used, and under the hood that API is what most higher-level tools depend on, but since those other tools exist, you wouldn't want to go write ETL jobs in plain MapReduce.
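To make the MapReduce programming model mentioned above concrete, here is a minimal word-count sketch in plain Python. It imitates only the map, shuffle/sort, and reduce phases; it is not the actual Hadoop API, and the function names are made up for illustration:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(record):
    # Mapper: emit a (word, 1) pair for every word in an input line.
    for word in record.split():
        yield (word.lower(), 1)

def reduce_phase(key, values):
    # Reducer: sum the counts for one key.
    return (key, sum(values))

def run_job(records):
    # Shuffle/sort: group the intermediate pairs by key,
    # as the framework would do between the two phases.
    intermediate = sorted(
        (pair for record in records for pair in map_phase(record)),
        key=itemgetter(0),
    )
    return dict(
        reduce_phase(key, (v for _, v in group))
        for key, group in groupby(intermediate, key=itemgetter(0))
    )

counts = run_job(["big data big analytics", "big data"])
print(counts)  # {'analytics': 1, 'big': 3, 'data': 2}
```

Higher-level tools like Hive or Spark effectively generate this kind of plumbing for you, which is why writing it by hand is rarely worthwhile.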

    You can combine Hadoop with Spark, Presto, HBase, Hive, etc. to unlock these other Operational or Analytical layers; some are useful for reporting use cases, and others are useful for ETL. Again, there are plenty of knobs to turn to get useful results in a reasonable time compared to an RDBMS (or other NoSQL tools). Plus, it takes several attempts to learn how best to store data in Hadoop to begin with (hint: not plaintext, and not lots of small files).
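The "not lots of small files" hint can be illustrated with a minimal local sketch: a common preparatory ETL step is compacting many tiny files into fewer large ones before loading into HDFS, since each file consumes NameNode memory and adds task overhead. This is plain Python standing in for what a Spark or Hive compaction job would do, with made-up file names:

```python
import os
import tempfile

def compact_small_files(input_dir, output_path):
    # Concatenate many small text files into one large file --
    # mimicking the compaction step that keeps file counts
    # (and NameNode metadata overhead) under control.
    names = sorted(os.listdir(input_dir))
    with open(output_path, "w") as out:
        for name in names:
            with open(os.path.join(input_dir, name)) as f:
                out.write(f.read())
    return len(names)

# Demo: create 100 tiny files, then compact them into one.
workdir = tempfile.mkdtemp()
small_dir = os.path.join(workdir, "small")
os.mkdir(small_dir)
for i in range(100):
    with open(os.path.join(small_dir, f"part-{i:05d}.txt"), "w") as f:
        f.write(f"record {i}\n")

merged = os.path.join(workdir, "compacted.txt")
n = compact_small_files(small_dir, merged)
print(n, "files merged into", merged)
```

In a real cluster you would also convert to a splittable columnar format (e.g. Parquet or ORC) rather than plain text, but the file-count problem is the same.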

    That link is over 5 years old now, and references Flume and Sqoop. Other "web scale" technologies have shown their worth in that time; meanwhile, Flume and Sqoop have shown their age and can be difficult to configure and manage compared to tools like Apache NiFi.

    This post was edited by Samar Patil at August 13, 2021 12:52 PM IST
      August 13, 2021 12:52 PM IST
    0
  • These new technologies that have arisen in response to Big Data handle data creation, storage, retrieval, and analysis. When you’re evaluating the different technologies to use, you typically encounter operational vs. analytical Big Data solutions. Operational Big Data systems provide operational features to run real-time, interactive workloads that ingest and store data. MongoDB is a top technology for operational Big Data applications, with over 10 million downloads of its open source software.

    Analytical Big Data technologies, on the other hand, are useful for retrospective, sophisticated analytics of your data. Hadoop is the most popular example of an Analytical Big Data technology.

    But picking an operational vs analytical Big Data solution isn’t the right way to think about the challenge. They are complementary technologies and you likely need both to develop a complete Big Data solution.

    MongoDB works well with Hadoop thanks to an API integration that makes it easy to combine the two solutions. Many of our customers, such as the City of Chicago, have built amazing applications that were never before possible by combining operational and analytical technologies.

      November 20, 2021 12:31 PM IST
    0