Can anybody explain Apache Flume to me in plain language? I'd appreciate an explanation with a practical example instead of abstract theoretical definitions, so I can understand it better.
What is it used for? At which stage of a BigData analysis is it used?
And what are the prerequisites for learning it?
Please explain as you would to a non-technical person.
What is Apache Flume?
Apache Flume is a distributed, reliable service for collecting, aggregating, and moving large amounts of streaming data (typically log data) from many sources into a centralized store such as HDFS.
Applications of Flume
Some common applications of Flume are:
Flume is used by e-commerce companies to analyze customer behavior across different regions.
It is used to feed huge volumes of log data generated by application servers into HDFS at high speed.
What are prerequisites for learning it?
For more information, refer to the Apache Flume documentation.
What is it used for?
Data ingestion into a distributed datastore (e.g. HDFS). See the image (I did not make the image; I'm including it only as a visual aid). There are other tools that help with data ingestion as well, such as Storm and Sqoop.
At which stage of a BigData analysis is it used?
It is used for data ingestion into your distributed datastore (e.g. HDFS). For example, suppose a web server is writing log entries to /var/logs/webserver.log. Apache Flume can watch that file, grab what it needs out of it, and send it to HDFS. Once the data is in your datastore, you can use other tools (e.g. Hive, Pig, MapReduce) to analyze it.
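To make the web-server example concrete, here is a minimal sketch of a Flume agent configuration for that scenario. The agent name (`agent1`), component names (`weblog`, `mem`, `hdfs-out`), and the NameNode address are all made up for illustration; it uses the `exec` source with `tail -F`, which is the simplest option (the `TAILDIR` source is more robust in practice).

```properties
# Hypothetical agent "agent1": tail a web-server log and ship it to HDFS.
# Component names (weblog, mem, hdfs-out) are illustrative.
agent1.sources = weblog
agent1.channels = mem
agent1.sinks = hdfs-out

# Source: tail the log file the web server is writing to
agent1.sources.weblog.type = exec
agent1.sources.weblog.command = tail -F /var/logs/webserver.log
agent1.sources.weblog.channels = mem

# Channel: buffer events in memory between source and sink
agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000

# Sink: write the events into HDFS as plain text
agent1.sinks.hdfs-out.type = hdfs
agent1.sinks.hdfs-out.hdfs.path = hdfs://namenode:8020/logs/webserver/
agent1.sinks.hdfs-out.hdfs.fileType = DataStream
agent1.sinks.hdfs-out.channel = mem
```

You would save this as a file (e.g. `flume.conf`) and start the agent with `flume-ng agent --conf-file flume.conf --name agent1`. Once the files land in HDFS, tools like Hive or Pig can query them.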