Hadoop and Apache Spark are both big-data frameworks, but they don't really serve the same purposes.
Scooped by Luca Naso
In my opinion, Spark should be compared not with Hadoop but with MapReduce, Hadoop's own processing engine. People usually compare Hadoop and Spark anyway, probably because both are buzzwords.
5 things to keep in mind:
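1. They do different things:
Hadoop is a distributed data infrastructure: it spreads very large datasets across the nodes of a cluster using HDFS (Hadoop Distributed File System);
Spark is a data-processing tool that operates on distributed data.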
2. Hadoop is more complete:
besides HDFS, Hadoop includes its own data-processing tool (MapReduce);
Spark has no file system of its own and must be integrated with one, such as HDFS or another storage system.
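3. Spark is (much) faster:
MapReduce operates in steps, reading data from the cluster, processing it, and writing the results back to disk at every stage;
Spark keeps intermediate data in memory and chains the stages without those disk round trips.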
4. Speed is not always what you need:
if your workloads are batch jobs whose results are not needed immediately, MapReduce's disk-based processing may be good enough;
Spark's speed matters most for applications that require (near) real-time analysis.
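5. Failure recovery:
both Hadoop and Spark are resilient to failures, in different ways. Hadoop writes data to disk after every operation, and HDFS replicates it across nodes;
Spark keeps data in resilient distributed datasets (RDDs), which can be rebuilt from their lineage (the recorded sequence of transformations) when a node fails.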
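To make the speed argument in point 3 concrete, here is a toy, pure-Python sketch of a word count. It assumes a few lines of text held in memory. The "MapReduce-style" version writes the map output to a temporary file and reads it back before reducing. The "Spark-style" version chains the same two stages in memory. This is not Hadoop or Spark code: the function names are hypothetical, and the local file only stands in for the disk writes between MapReduce stages.

```python
import json
import os
import tempfile
from collections import Counter

LINES = ["spark is fast", "hadoop is complete", "spark is in memory"]

def map_stage(lines):
    # Emit (word, 1) pairs, as a mapper would (hypothetical helper).
    return [(word, 1) for line in lines for word in line.split()]

def reduce_stage(pairs):
    # Sum the counts for each word, as a reducer would (hypothetical helper).
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

def mapreduce_style(lines, workdir):
    # The map output is materialized on disk before the reduce stage starts.
    path = os.path.join(workdir, "map_output.json")
    with open(path, "w") as f:
        json.dump(map_stage(lines), f)
    with open(path) as f:
        pairs = json.load(f)  # JSON turns the tuples into 2-element lists
    return reduce_stage(pairs)

def spark_style(lines):
    # The stages are chained directly; intermediate pairs never leave memory.
    return reduce_stage(map_stage(lines))

with tempfile.TemporaryDirectory() as tmp:
    on_disk = mapreduce_style(LINES, tmp)
in_memory = spark_style(LINES)
print(on_disk == in_memory, in_memory["spark"], in_memory["is"])
```

Both versions return the same counts. The only difference is where the intermediate pairs live. Each extra chained stage adds another disk round trip to the first version but not to the second, which is roughly why Spark's advantage grows with multi-stage jobs.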
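The lineage idea behind Spark's failure recovery in point 5 can be illustrated with a minimal sketch. It uses a hypothetical `ToyRDD` class invented for illustration, which is not Spark's API. Each dataset remembers its parent and the function that produced it, so a lost in-memory copy can be recomputed rather than read back from a replica.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: data plus the recipe that built it."""

    def __init__(self, parent=None, fn=None, source=None):
        self.parent = parent  # upstream ToyRDD, or None for a source dataset
        self.fn = fn          # transformation applied to the parent's data
        self.source = source  # original data, only for a source dataset
        self._cache = None    # materialized data kept "in memory"

    def map(self, f):
        # Record the transformation lazily; nothing is computed yet.
        return ToyRDD(parent=self, fn=lambda xs: [f(x) for x in xs])

    def filter(self, p):
        return ToyRDD(parent=self, fn=lambda xs: [x for x in xs if p(x)])

    def collect(self):
        if self._cache is None:
            if self.parent is None:
                self._cache = list(self.source)
            else:
                # Rebuild from lineage: get the parent's data, then apply fn.
                self._cache = self.fn(self.parent.collect())
        return self._cache

    def lose_data(self):
        # Simulate a node failure that wipes the in-memory copy.
        self._cache = None


base = ToyRDD(source=range(10))
evens_squared = base.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
before = evens_squared.collect()
evens_squared.lose_data()
after = evens_squared.collect()  # recomputed from the recorded lineage
print(before == after, after)
```

Running it prints `True [0, 4, 16, 36, 64]`: the wiped dataset is rebuilt from its recipe with no replica involved. The sketch leaves out partitioning, scheduling and persistence, which real Spark handles.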