What is Flume?
Apache Flume is a distributed data ingestion tool for collecting, aggregating, and transporting large amounts of streaming data, such as log files and events, from many sources to a centralized data store. It is highly reliable, distributed, and configurable, and it was principally designed to copy streaming log data from web servers into HDFS.
Pros:
- Reliable, fault tolerant, scalable, manageable, and customizable.
- Supports multi-hop flows, fan-in and fan-out flows, and contextual routing (sending each event to different destinations based on its header values).
- Flume can be scaled horizontally.
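Contextual routing and fan-out can be pictured with a small toy model. This is a conceptual Python sketch, not Flume's API: in a real agent, routing is declared in the agent's configuration file (for example with a multiplexing channel selector), not written as code. The `datacenter` header, the channel names, and the `Event`/`MultiplexingSelector`/`route` names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Toy Flume-style event: string headers plus a byte body."""
    headers: dict[str, str]
    body: bytes


@dataclass
class MultiplexingSelector:
    """Toy contextual router: chooses channels from one header's value."""
    header: str                    # hypothetical header to route on
    mapping: dict[str, list[str]]  # header value -> destination channels
    default: list[str]             # used when the header is missing or unmapped

    def select(self, event: Event) -> list[str]:
        value = event.headers.get(self.header)
        return self.mapping.get(value, self.default)


def route(events, selector, channels):
    """Append each event to every selected channel (fan-out when >1)."""
    for event in events:
        for name in selector.select(event):
            channels[name].append(event)
    return channels


# Usage: all channel names and header values below are illustrative.
selector = MultiplexingSelector(
    header="datacenter",
    mapping={"us": ["hdfs_us"], "eu": ["hdfs_eu", "audit"]},
    default=["hdfs_default"],
)
channels = {"hdfs_us": [], "hdfs_eu": [], "audit": [], "hdfs_default": []}
events = [
    Event({"datacenter": "us"}, b"GET /index"),
    Event({"datacenter": "eu"}, b"GET /login"),  # fans out to two channels
    Event({}, b"heartbeat"),                     # no header -> default
]
route(events, selector, channels)
print({name: len(evs) for name, evs in channels.items()})
# {'hdfs_us': 1, 'hdfs_eu': 1, 'audit': 1, 'hdfs_default': 1}
```

Routing on a header value rather than on the body keeps the routing decision cheap, since the payload never has to be parsed.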
Cons:
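- Weak ordering guarantees: events are not guaranteed to reach the destination in the order they were produced.
- No exactly-once guarantee: delivery is at-least-once, so duplicate events can occasionally reach the destination in some failure scenarios (for example, when a failed delivery is retried).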
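The following is a simplified simulation of how at-least-once delivery can produce duplicates, written in plain Python rather than against Flume itself. It assumes a sink that writes a batch to the store and only then commits, so a crash between those two steps rolls the batch back and it is sent again. It also shows one common downstream remedy, deduplicating on an event id. Flume does not stamp events with a unique id by default; the `id` field and every function name here are hypothetical.

```python
from collections import deque


def drain(channel, store, batch_size, fail_before_commit_on):
    """Move events from channel to store in transactional batches.

    Events leave the channel only after the write *and* the commit succeed.
    On the attempts listed in fail_before_commit_on, a simulated crash hits
    after the write but before the commit, so the batch is rolled back and
    redelivered on the next attempt (at-least-once delivery).
    """
    attempt = 0
    while channel:
        attempt += 1
        batch = [channel[i] for i in range(min(batch_size, len(channel)))]
        store.extend(batch)                    # write to the destination
        if attempt in fail_before_commit_on:   # crash before commit
            continue                           # rollback: events stay queued
        for _ in batch:                        # commit: remove from channel
            channel.popleft()
    return store


def deduplicate(events):
    """Keep the first copy of each event, keyed on a hypothetical 'id'."""
    seen = set()
    unique = []
    for event in events:
        if event["id"] not in seen:
            seen.add(event["id"])
            unique.append(event)
    return unique


channel = deque({"id": n, "body": f"line {n}"} for n in range(5))
store = drain(channel, [], batch_size=2, fail_before_commit_on={2})
print([e["id"] for e in store])               # [0, 1, 2, 3, 2, 3, 4]
print([e["id"] for e in deduplicate(store)])  # [0, 1, 2, 3, 4]
```

Events 2 and 3 appear twice because their first write succeeded but the commit did not. Deduplication restores uniqueness, but it only works if each event carries a stable identifier assigned before it enters the pipeline.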