Getting strange behavior when calling a function outside of a closure:
when the function is in an object, everything works
when the function is in a class, I get:
Task not serializable: java.io.NotSerializableException: testing
The problem is that I need my code in a class, not an object. Any idea why this is happening? Is a Scala object serializable by default?
This is a working code example:
object working extends App {
  val list = List(1, 2, 3)
  val rddList = Spark.ctx.parallelize(list)

  // calling function outside closure
  val after = rddList.map(someFunc(_))

  def someFunc(a: Int) = a + 1

  after.collect().map(println(_))
}
This is the non-working example:

object NOTworking extends App {
  new testing().doIT
}

// adding extends Serializable won't help
class testing {
  val list = List(1, 2, 3)
  val rddList = Spark.ctx.parallelize(list)

  def doIT = {
    // again calling the function someFunc
    val after = rddList.map(someFunc(_))
    // this will crash here (Spark is lazy, so the failure surfaces at collect)
    after.collect().map(println(_))
  }

  def someFunc(a: Int) = a + 1
}
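The exception falls out of how Spark ships work: `rddList.map(someFunc(_))` forces Spark to Java-serialize the mapped function, and eta-expanding a method that lives in a class produces a closure that captures `this`, so the whole `testing` instance must be serializable too. Here is a minimal, Spark-free sketch of that mechanic using plain Java serialization (Spark's default closure serializer); the names `Plain`, `Marked`, `Standalone` and the `canSerialize` helper are illustrative, not Spark APIs:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

class Plain { // like `testing`: not Serializable
  def someFunc(a: Int) = a + 1
  def closure: Int => Int = someFunc(_) // eta-expansion captures `this`
}

class Marked extends Serializable { // captured `this` can now be serialized
  def someFunc(a: Int) = a + 1
  def closure: Int => Int = someFunc(_)
}

object Standalone { // like the working `object`: no instance to capture
  def someFunc(a: Int) = a + 1
}

object SerializationDemo extends App {
  // Mimics what Spark does before shipping a closure to executors
  def canSerialize(f: Int => Int): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream).writeObject(f)
      true
    } catch {
      case _: NotSerializableException => false
    }

  println(canSerialize(new Plain().closure))    // false: drags non-serializable `this`
  println(canSerialize(new Marked().closure))   // true: captured instance is Serializable
  println(canSerialize(Standalone.someFunc(_))) // true: nothing captured
}
```

In this stripped-down model, `extends Serializable` on the class is enough, because the toy class has no other state; in the real code above, serializing `testing` would also drag in members like `rddList` and the context, which is typically why `extends Serializable` "won't help" there. The usual fixes are to define `someFunc` in a companion or standalone object, or to keep non-serializable members out of anything the closure touches.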
I read the Cluster Mode Overview and I still can't understand the different processes in the Spark Standalone cluster and the parallelism.
Is the worker a JVM process or not? I ran bin/start-slave.sh and found that it spawned a worker, which is in fact a JVM.
As per the above link, an executor is a process launched for an application on a worker node that runs tasks. An executor is also a JVM.
These are my questions:
Executors are per application. Then what is the role of a worker? Does it coordinate with the executor and communicate the result back to the driver? Or does the driver talk directly to the executor? If so, what is the worker's purpose?
How to control the number of executors for an application?
Can the tasks be made to run in parallel inside the executor? If so, how to configure the number of threads for an executor?
What is the relation between a worker, executors and executor cores (--total-executor-cores)?
What does it mean to have more workers per node?
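On the questions about controlling executors and their parallelism: in standalone mode this is usually steered through spark-submit resource flags. A sketch, where the master URL, memory value and jar name are placeholders:

```shell
# Standalone mode. Each executor runs up to --executor-cores tasks
# concurrently as threads inside its JVM; the number of executors falls
# out of the two core settings, roughly total-executor-cores / executor-cores.
spark-submit \
  --master spark://master:7077 \
  --executor-cores 2 \
  --total-executor-cores 8 \
  --executor-memory 2g \
  my-app.jar
```

With these (hypothetical) numbers the application would get about 4 executors, each running at most 2 tasks in parallel, so tasks do run in parallel inside an executor: one thread per core it was given.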
I'm just wondering what the difference is between an RDD and a DataFrame (in Spark 2.0.0, DataFrame is a mere type alias for Dataset[Row]) in Apache Spark. Can you convert one to the other?
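On the conversion question: yes, both directions are short. A sketch, assuming a live SparkSession named `spark` is already in scope (not shown here):

```scala
// Assumes `spark: SparkSession` exists; the import enables .toDF / .toDS
import spark.implicits._

val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3))

// RDD -> DataFrame / typed Dataset
val df = rdd.toDF("value") // DataFrame, i.e. Dataset[Row]
val ds = rdd.toDS()        // Dataset[Int]

// DataFrame / Dataset -> RDD
val rowRdd = df.rdd // RDD[Row]
val intRdd = ds.rdd // RDD[Int], typed because ds is a Dataset[Int]
```

Going RDD-to-DataFrame needs an encoder (provided here by `spark.implicits._`); going the other way is just the `.rdd` accessor, though it gives up the Catalyst optimizations that DataFrames enjoy.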