Can apache spark run without hadoop?


  • Are there any dependencies between Spark and Hadoop?

    If not, are there any features I'll miss when I run Spark without Hadoop?
      September 30, 2020 2:21 PM IST
  • As per Spark documentation, Spark can run without Hadoop.

You can run it in standalone mode without any external resource manager.

But if you want to run a multi-node setup, you need a resource manager like YARN or Mesos and a distributed file system like HDFS, S3, etc.
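As a sketch (the script name and host are illustrative, not from the thread), the difference between these modes comes down to the `--master` argument you pass to `spark-submit`:

```shell
# Local mode: no Hadoop, no resource manager; local[*] uses all local cores
spark-submit --master "local[*]" my_app.py

# Standalone cluster: Spark's own built-in master/worker daemons, still no Hadoop
spark-submit --master spark://master-host:7077 my_app.py

# Multi-node via an external resource manager (requires a Hadoop/YARN cluster)
spark-submit --master yarn --deploy-mode cluster my_app.py
```

These are invocation fragments; they assume a Spark installation with `spark-submit` on the PATH.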

      September 30, 2020 4:45 PM IST
  • Spark is an in-memory distributed computing engine.
    Hadoop is a framework for distributed storage (HDFS) and distributed processing (YARN).
Spark can run with or without the Hadoop components (HDFS/YARN).

    Distributed Storage:
Since Spark does not include its own distributed storage system, it has to rely on one of the following storage systems for distributed computing:
S3 – non-urgent batch jobs. S3 fits very specific use cases where data locality isn’t critical.
Cassandra – perfect for streaming data analysis, but overkill for batch jobs.
HDFS – a great fit for batch jobs without compromising on data locality.

    Distributed processing:
You can run Spark in three different modes: standalone, on YARN, or on Mesos.
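To make the storage side concrete: the backend Spark reads from is selected by the URI scheme of the input path, independent of the cluster manager. A sketch, with hypothetical bucket, host, and path names (the s3a credential keys are real Spark/Hadoop configuration properties; the values are placeholders):

```shell
# HDFS: data-local batch reads from a Hadoop cluster
spark-submit my_app.py hdfs://namenode:8020/data/input

# S3 via the s3a connector, no HDFS needed (credentials are placeholders)
spark-submit \
  --conf spark.hadoop.fs.s3a.access.key=YOUR_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=YOUR_SECRET_KEY \
  my_app.py s3a://my-bucket/data/input

# Plain local file system, useful for single-machine testing
spark-submit my_app.py file:///tmp/data/input
```

These are configuration fragments; s3a access additionally requires the hadoop-aws jars on the classpath.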
      September 30, 2020 4:48 PM IST
By default, Spark does not have a storage mechanism.

To store data, it needs a fast and scalable file system. You can use S3, HDFS, or any other file system. Hadoop is an economical option due to its low cost.

Additionally, if you use Tachyon (now Alluxio), it can boost performance alongside Hadoop. Hadoop is highly recommended for Apache Spark processing.

      September 30, 2020 4:49 PM IST
Yes, Spark can run without Hadoop. All core Spark features will continue to work, but you'll miss things like easily distributing all your files (code as well as data) to all the nodes in the cluster via HDFS, etc.
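Even without HDFS, you can still ship code and small data files to the executors at submit time using spark-submit's built-in flags. A sketch, with illustrative file names:

```shell
# Distribute an extra Python module and a small lookup file to every executor
spark-submit \
  --master "local[*]" \
  --py-files helpers.py \
  --files lookup.csv \
  my_app.py
```

This works for small artifacts; for large datasets a shared store like HDFS or S3 is still the practical option.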
      September 30, 2020 4:50 PM IST
Spark can run without Hadoop, but some of its functionality relies on Hadoop's code (e.g. handling of Parquet files). We're running Spark on Mesos with S3, which was a little tricky to set up but works really well once done.
      September 30, 2020 4:51 PM IST
Yes, Spark can run without Hadoop. You can install Spark on your local machine without Hadoop. But the Spark distribution ships pre-built with Hadoop libraries, which are used when you install it on your local machine.
      December 23, 2021 1:37 PM IST
You can run Apache Spark without Hadoop. If you use Spark on Cloudera, it works with Hadoop by default. But if you want to run some Spark examples on your own system, you can run them without Hadoop.

I recently used the article below to install Spark and am able to run it without Hadoop.
https://sparkbyexamples.com/spark/apache-spark-installation-on-windows/

Hope this helps.
      February 17, 2022 9:28 AM IST
You can run Spark without Hadoop, but on Windows Spark depends on Hadoop's winutils, so some features may not work without it. Also, if you want to read Hive tables from Spark, you need Hadoop.
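The usual winutils workaround on Windows is to place winutils.exe in a Hadoop-style directory and point HADOOP_HOME at it. A sketch in Windows cmd syntax; the paths are examples, not a prescribed layout:

```shell
REM Windows (cmd) - adjust the paths to your own install
set HADOOP_HOME=C:\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%
REM winutils.exe must sit in %HADOOP_HOME%\bin before starting Spark
```

This is an environment-configuration fragment; it only satisfies Spark's winutils lookup and does not install a Hadoop cluster.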
      December 31, 2021 12:27 PM IST