  • Sai Anirudh
    I'm just wondering what the difference is between an RDD and a DataFrame in Apache Spark (since Spark 2.0.0, DataFrame is merely a type alias for Dataset[Row]). Can you convert one to the other?
    Last post by Viaan Prakash - December 11, 2021
  • Advika Banerjee
    I came across a mini tutorial on data preprocessing using Spark here: http://ampcamp.berkeley.edu/big-data-mini-course/featurization.html
    However, this discusses only...
    Last post by Viaan Prakash - December 9, 2021
  • Samar Patil
    I would like to read a CSV file in Spark, convert it to a DataFrame, and store it in HDFS with df.registerTempTable("table_name"). I have tried:
    scala> val df =...
    Last post by Vaibhav Mali - December 8, 2021
  • Maryam Bains
    I have a Spark Streaming application that produces a dataset every minute. I need to save/overwrite the results of the processed data.
    When I tried to overwrite the dataset...
    Last post by Vaibhav Mali - December 8, 2021
  • Maryam Bains
    I'd like to stop the various messages that appear in the Spark shell. I tried to edit the log4j.properties file to stop these messages. Here are the contents of...
    Last post by Advika Banerjee - December 6, 2021
  • Maryam Bains
    I'm trying to understand the relationship between the number of cores and the number of executors when running a Spark job on YARN.
    The test environment is as follows:

    Number of data...
    Last post by Advika Banerjee - December 6, 2021
  • Advika Banerjee
    What are the differences between Apache Spark's SQLContext and HiveContext?
    Some sources say that since HiveContext is a superset of SQLContext, developers should always use...
    Last post by Viaan Prakash - November 29, 2021
  • Samar Patil
    Quoting the Spark DataFrames, Datasets and SQL manual:

    A handful of Hive optimizations are not yet included in Spark. Some of these (such as indexes) are less important due to...
    Last post by Maryam Bains - November 17, 2021
  • Viaan Prakash
    How can I increase the memory available to Apache Spark executor nodes?
    I have a 2 GB file that is suitable for loading into Apache Spark. I am running Apache Spark for the...
    Last post by Maryam Bains - November 17, 2021
  • Maryam Bains
    I tried to start Spark 1.6.0 (spark-1.6.0-bin-hadoop2.4) on Mac OS X Yosemite 10.10.5 using
    "./bin/spark-shell"....
    Last post by Viaan Prakash - November 12, 2021
  • Samar Patil
    I am trying to get live JSON data from RabbitMQ into Apache Spark using Java and do some real-time analytics on it.
    I am able to get the data and also do some basic SQL queries...
    Last post by Viaan Prakash - November 12, 2021
  • Samar Patil
    I'm new to big data processing and I'm reading about tools for stream processing and building data pipelines. I found Apache Spark and Spring Cloud Data Flow. I want to know the...
    Last post by Viaan Prakash - November 12, 2021
  • Advika Banerjee
    I've got a big RDD (1 GB) in a YARN cluster. On the local machine that uses this cluster, I have only 512 MB. I'd like to iterate over the values in the RDD on my local machine. I can't use...
    Last post by Maryam Bains - November 9, 2021
  • Vaibhav Mali
    I am confused about where Talend and Apache Spark fit in the big data ecosystem, since both can be used for ETL.
    Could someone please explain this with an example?
    Last post by Sindhuja Martha - October 23, 2021
  • Sai Anirudh
    I read the Cluster Mode Overview and I still can't understand the different processes in a Spark Standalone cluster, or how parallelism works there.
    Is the worker a JVM process or not? I...
    Last post by Advika Banerjee - October 18, 2021