Simple Explanation of Apache Flume

  • Can anybody explain Apache Flume to me in plain language? I'd appreciate an explanation with a practical example instead of abstract theoretical definitions, so I can understand it better.

    What is it used for? At which stage of a BigData analysis is it used?

    And what are prerequisites for learning it?

    Please explain it as you would to a non-technical person.

      June 12, 2019 12:11 PM IST
  • What is Apache Flume?

    • Apache Flume is a tool designed for streaming data ingestion into HDFS. Its main objective is to capture streaming data from various web servers and deliver it to HDFS.
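
    As a rough illustration, a Flume agent is wired together as a source, a channel, and a sink in a properties file. The sketch below is a minimal, hypothetical configuration (the agent name agent1, the log path, and the HDFS path are invented for the example) that tails a web server log and writes the events to HDFS:

      # Hypothetical agent "agent1"; all names and paths here are illustrative.
      agent1.sources  = weblog
      agent1.channels = memch
      agent1.sinks    = tohdfs

      # Source: tail the web server log as new lines are appended.
      agent1.sources.weblog.type = exec
      agent1.sources.weblog.command = tail -F /var/logs/webserver.log
      agent1.sources.weblog.channels = memch

      # Channel: buffer events in memory between source and sink.
      agent1.channels.memch.type = memory
      agent1.channels.memch.capacity = 10000

      # Sink: write the buffered events into HDFS as plain text.
      agent1.sinks.tohdfs.type = hdfs
      agent1.sinks.tohdfs.hdfs.path = hdfs://namenode:8020/flume/weblogs
      agent1.sinks.tohdfs.hdfs.fileType = DataStream
      agent1.sinks.tohdfs.channel = memch

    The memory channel is the simplest choice for a sketch like this; a file channel would trade some speed for durability if the agent crashes.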

    Applications of Flume

    • The applications of Flume include:

      • Flume is used by e-commerce companies to analyze customer behavior across different regions.

      • It is used to feed the huge volume of log data generated by application servers into HDFS at high speed.

    What are prerequisites for learning it?

    • Basics of Hadoop and big data are a must.
    • Basics of Linux and scripting.
    • Above all, an interest in technology.

    For more information, refer to the Apache Flume documentation.

      June 12, 2019 12:13 PM IST
  • What is it used for?

    Data ingestion into a distributed datastore (e.g. HDFS). There are other tools that can help you with the ingestion of data as well; Storm and Sqoop are often mentioned.

    At which stage of a BigData analysis is it used?

    It is used for data ingestion into your distributed datastore (e.g. HDFS). For example, suppose a web server is writing log information to /var/logs/webserver.log. Apache Flume can watch that file, grab what it needs out of it, and send it to HDFS. Once the data is in your datastore, you can use other tools (e.g. Hive, Pig, MapReduce) to analyze the imported data.
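
    To make that concrete, here is a minimal sketch of how such an agent could be launched and checked, assuming a config file named webserver.conf that defines an agent called agent1 (both names are hypothetical):

      # Start the Flume agent defined in webserver.conf.
      flume-ng agent --conf ./conf --conf-file webserver.conf --name agent1

      # Once events are flowing, list the ingested files in HDFS:
      hdfs dfs -ls /flume/weblogs

    The /flume/weblogs path is an assumed HDFS sink destination; once files land there, tools like Hive or Pig can query them in place.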

      June 14, 2019 12:08 PM IST