Big Data - Spark
  • Sai Anirudh
    I'm wondering what the difference is between an RDD and a DataFrame in Apache Spark (as of Spark 2.0.0, DataFrame is a mere type alias for Dataset[Row]). Can you convert one to the other?
    Last post by Viaan Prakash - December 11, 2021
    450 views 0 likes
    3
  • Advika Banerjee
    I came across a mini tutorial on data preprocessing with Spark here: http://ampcamp.berkeley.edu/big-data-mini-course/featurization.html
    However, this only discusses...
    Last post by Viaan Prakash - December 9, 2021
    199 views 0 likes
    3
  • Samar Patil
    I would like to read a CSV in Spark, convert it to a DataFrame, and store it in HDFS with df.registerTempTable("table_name"). I have tried:
    scala> val df =...
    Last post by Vaibhav Mali - December 8, 2021
    232 views 0 likes
    3
  • Maryam Bains
    I have a Spark Streaming application that produces a dataset every minute. I need to save/overwrite the results of the processed data.
    When I tried to overwrite the dataset...
    Last post by Vaibhav Mali - December 8, 2021
    237 views 0 likes
    3
  • Maryam Bains
    I'd like to stop the various messages that appear in the Spark shell. I tried editing the log4j.properties file to stop these messages. Here are the contents of...
    Last post by Advika Banerjee - December 6, 2021
    216 views 0 likes
    3
  • Maryam Bains
    I'm trying to understand the relationship between the number of cores and the number of executors when running a Spark job on YARN.
    The test environment is as follows:

    Number of data...
    Last post by Advika Banerjee - December 6, 2021
    195 views 0 likes
    3
  • Advika Banerjee
    What are the differences between Apache Spark SQLContext and HiveContext?
    Some sources say that since the HiveContext is a superset of SQLContext, developers should always use...
    Last post by Viaan Prakash - November 29, 2021
    215 views 0 likes
    4
  • Samar Patil
    Quoting the Spark DataFrames, Datasets and SQL manual:

    A handful of Hive optimizations are not yet included in Spark. Some of these (such as indexes) are less important due to...
    Last post by Maryam Bains - November 17, 2021
    192 views 0 likes
    3
  • Viaan Prakash
    How can I increase the memory available for Apache Spark executor nodes?
    I have a 2 GB file that is suitable for loading into Apache Spark. I am running Apache Spark for the...
    Last post by Maryam Bains - November 17, 2021
    139 views 0 likes
    2
  • Maryam Bains
    I tried to start Spark 1.6.0 (spark-1.6.0-bin-hadoop2.4) on Mac OS Yosemite 10.10.5 using
    "./bin/spark-shell"....
    Last post by Viaan Prakash - November 12, 2021
    303 views 0 likes
    2
  • Samar Patil
    I am trying to get live JSON data from RabbitMQ into Apache Spark using Java and run some real-time analytics on it.
    I am able to get the data and also do some basic SQL queries...
    Last post by Viaan Prakash - November 12, 2021
    188 views 0 likes
    3
  • Samar Patil
    I'm new to big data processing and I'm reading about tools for stream processing and building data pipelines. I found Apache Spark and Spring Cloud Data Flow. I want to know the...
    Last post by Viaan Prakash - November 12, 2021
    349 views 0 likes
    3
  • Advika Banerjee
    I have a big RDD (1 GB) in a YARN cluster. On the local machine that uses this cluster, I have only 512 MB. I'd like to iterate over the values in the RDD on my local machine. I can't use...
    Last post by Maryam Bains - November 9, 2021
    161 views 0 likes
    3
  • Vaibhav Mali
    I am confused about where Talend and Apache Spark fit in the big data ecosystem, as both can be used for ETL.
    Could someone please explain this with an example?
    Last post by Sindhuja Martha - October 23, 2021
    190 views 0 likes
    4
  • Sai Anirudh
    I read the Cluster Mode Overview and I still can't understand the different processes in the Spark Standalone cluster and the parallelism.
    Is the worker a JVM process or not? I...
    Last post by Advika Banerjee - October 18, 2021
    1,182 views 0 likes
    3