
How to turn off INFO logging in Spark?

  • I installed Spark using the AWS EC2 guide. I can launch the program fine with the bin/pyspark script to get to the Spark prompt, and I can also work through the Quick Start guide successfully.

    However, I cannot for the life of me figure out how to stop all of the verbose INFO logging after each command.

    I have tried nearly every possible variation of the code below (commenting lines out, setting levels to OFF) in the log4j.properties file in the conf folder where I launch the application from, as well as on each node, and nothing has any effect. I still get the INFO logging statements printed after executing each command.

    I am very confused about how this is supposed to work.

    # Set everything to be logged to the console
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender 
    log4j.appender.console.target=System.err     
    log4j.appender.console.layout=org.apache.log4j.PatternLayout 
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    
    # Settings to quiet third party logs that are too verbose
    log4j.logger.org.eclipse.jetty=WARN
    log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
    log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO


    Here is the full launch command (including the classpath) printed when I set SPARK_PRINT_LAUNCH_COMMAND:

    Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/bin/java -cp :/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/lib/spark-assembly-1.0.1-hadoop2.2.0.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main

    Contents of spark-env.sh:

    #!/usr/bin/env bash
    
    # This file is sourced when running various Spark programs.
    # Copy it as spark-env.sh and edit that to configure Spark for your site.
    
    # Options read when launching programs locally with 
    # ./bin/run-example or ./bin/spark-submit
    # - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
    # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    # - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    # - SPARK_CLASSPATH=/root/spark-1.0.1-bin-hadoop2/conf/
    
    # Options read by executors and drivers running inside the cluster
    # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    # - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    # - SPARK_CLASSPATH, default classpath entries to append
    # - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
    # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
    
    # Options read in YARN client mode
    # - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
    # - SPARK_EXECUTOR_INSTANCES, Number of workers to start (Default: 2)
    # - SPARK_EXECUTOR_CORES, Number of cores for the workers (Default: 1).
    # - SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
    # - SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
    # - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
    # - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: 'default')
    # - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
    # - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
    
    # Options for the daemons used in the standalone deploy mode:
    # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
    # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
    # - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
    # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
    # - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
    # - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
    # - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
    # - SPARK_WORKER_DIR, to set the working directory of worker processes
    # - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
    # - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
    # - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
    # - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers
    
    export SPARK_SUBMIT_CLASSPATH="$FWDIR/conf"
      September 18, 2021 12:32 PM IST
  • I used this on Amazon EC2 with 1 master, 2 slaves, and Spark 1.2.1.

    # Step 1. Change config file on the master node
    nano /root/ephemeral-hdfs/conf/log4j.properties
    
    # Before
    hadoop.root.logger=INFO,console
    # After
    hadoop.root.logger=WARN,console
    
    # Step 2. Replicate this change to slaves
    ~/spark-ec2/copy-dir /root/ephemeral-hdfs/conf/
      October 9, 2021 1:11 PM IST
  • If you don’t want to see any log messages at all, just start the Spark shell and run these commands:

    import org.apache.log4j.Logger
    import org.apache.log4j.Level
    
    Logger.getLogger("org").setLevel(Level.OFF)
    Logger.getLogger("akka").setLevel(Level.OFF)
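    
    Since the question launches the shell through bin/pyspark, here is a rough PySpark equivalent (a sketch: it reaches the JVM's log4j classes through sc._jvm, which is py4j's internal gateway attribute rather than a documented API):
    
    # From the PySpark shell, where `sc` (the SparkContext) is already defined.
    # `sc._jvm` is an internal py4j gateway attribute, so treat this as a
    # sketch rather than a supported API.
    log4j = sc._jvm.org.apache.log4j
    log4j.Logger.getLogger("org").setLevel(log4j.Level.OFF)
    log4j.Logger.getLogger("akka").setLevel(log4j.Level.OFF)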

     
    If you just don’t want to see the INFO log messages, go to your log4j.properties file in the conf folder and change this single line:

    log4j.rootCategory=INFO, console
    
    To
    
    log4j.rootCategory=ERROR, console
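    
    The distribution ships a conf/log4j.properties.template, so if conf/log4j.properties does not exist yet, copy the template to that name first and then edit it. After restarting the shell with the edited file, a quick way to check that the new level took effect from the PySpark shell (again a sketch that goes through the internal sc._jvm gateway):
    
    # Print the effective root logger level; it should report ERROR after the
    # edit above. `sc._jvm` is an internal attribute, hence only a sketch.
    root = sc._jvm.org.apache.log4j.Logger.getRootLogger()
    print(root.getLevel().toString())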
      September 23, 2021 1:40 PM IST
  • You can use setLogLevel:

    val spark = SparkSession
          .builder()
          .config("spark.master", "local[1]")
          .appName("TestLog")
          .getOrCreate()
    
    spark.sparkContext.setLogLevel("WARN")
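    
    The same call is available from bin/pyspark, since setLogLevel is also exposed on the Python SparkContext. A minimal sketch mirroring the Scala example above:
    
    from pyspark.sql import SparkSession
    
    # Build (or reuse) a local session, then lower the log level for this
    # application. Accepted values include ALL, DEBUG, INFO, WARN, ERROR, OFF.
    spark = (SparkSession.builder
             .config("spark.master", "local[1]")
             .appName("TestLog")
             .getOrCreate())
    spark.sparkContext.setLogLevel("WARN")
    
    Note that setLogLevel only takes effect once it runs, so a few INFO lines from startup may still appear before it.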
      September 30, 2021 12:35 PM IST