I'm just wondering what the difference is between an RDD and a DataFrame (in Spark 2.0.0, DataFrame is a mere type alias for Dataset) in Apache Spark. Can you convert one to the other?
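Yes, you can convert in both directions. A minimal sketch, assuming Spark 2.x with a SparkSession named spark (as in the shell); the Person case class is only an illustrative example:
import spark.implicits._
case class Person(name: String, age: Int)
// RDD -> DataFrame (relies on the implicit encoders imported above)
val rdd = spark.sparkContext.parallelize(Seq(Person("Ann", 30), Person("Bob", 25)))
val df = rdd.toDF()
// DataFrame -> RDD (an RDD[Row]; use df.as[Person].rdd for typed objects)
val backToRdd = df.rdd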
I did come across a mini tutorial for data preprocessing using Spark here: http://ampcamp.berkeley.edu/big-data-mini-course/featurization.html
However, it only discusses text-file parsing. Is there a way to parse XML files with Spark?
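Core Spark SQL has no built-in XML data source; a common approach is the third-party spark-xml package (com.databricks:spark-xml). A hedged sketch, assuming Spark 2.x with a SparkSession named spark (on 1.x, use sqlContext.read instead); the package version, path, and rowTag value are placeholders:
// Launch with e.g.: spark-shell --packages com.databricks:spark-xml_2.11:0.4.1
val df = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "record")        // the XML element that becomes one row
  .load("hdfs:///path/to/data.xml")
df.printSchema()
df.show(5)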
I would like to read a CSV in Spark, convert it to a DataFrame, and store it in HDFS with df.registerTempTable("table_name"). I have tried:
scala> val df = sqlContext.load("hdfs:///csv/file/dir/file.csv")
Error which I got:
java.lang.RuntimeException: hdfs:///csv/file/dir/file.csv is not a Parquet file. expected magic number at tail but found
at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:277)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:276)
at scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at...
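The error appears because sqlContext.load defaults to the Parquet data source. A sketch of reading CSV instead, assuming Spark 1.4+ with the external spark-csv package and that the file has a header row (adjust the options to your data):
// Launch with e.g.: spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("hdfs:///csv/file/dir/file.csv")
df.registerTempTable("table_name")
// On Spark 2.x the CSV source is built in:
// val df = spark.read.option("header", "true").csv("hdfs:///csv/file/dir/file.csv")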
I have a Spark Streaming application which produces a dataset every minute. I need to save/overwrite the results of the processed data.
When I try to overwrite the dataset, an org.apache.hadoop.mapred.FileAlreadyExistsException stops the execution.
I set the Spark property set("spark.files.overwrite", "true"), but no luck.
How can I overwrite or pre-delete the files from Spark?
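For what it's worth, spark.files.overwrite only affects files distributed via SparkContext.addFile, not job output. A sketch of the usual alternatives; the result name and output path below are placeholders:
import org.apache.spark.sql.SaveMode
// DataFrame/Dataset output: tell the writer to overwrite the target directory.
result.write.mode(SaveMode.Overwrite).parquet("hdfs:///output/minute-batch")
// For RDD APIs such as saveAsTextFile, either delete the directory first with the
// Hadoop FileSystem API, or relax the pre-existing-output check:
// sparkConf.set("spark.hadoop.validateOutputSpecs", "false")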
I'd like to stop the various messages that appear on the spark shell. I tried to edit the log4j.properties file in order to stop these messages. Here are the contents of log4j.properties:
# Define the root logger with appender file
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
But messages are still getting displayed on the console.
Here are some example messages
15/01/05 15:11:45 INFO SparkEnv: Registering BlockManagerMaster
15/01/05 15:11:45 INFO DiskBlockManager: Created local...
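If the edited log4j.properties is not the copy Spark actually picks up (it must sit in the conf directory or on the classpath), a quick alternative is to raise the level programmatically from the shell. A minimal sketch:
// log4j 1.2 API, works regardless of which properties file is loaded
import org.apache.log4j.{Level, Logger}
Logger.getRootLogger.setLevel(Level.WARN)
// On Spark 1.4+ the shorthand is:
// sc.setLogLevel("WARN")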
I'm trying to understand the relationship between the number of cores and the number of executors when running a Spark job on YARN.
The test environment is as follows (the relevant Spark settings are sketched after this list):
Number of data nodes: 3
Data node machine spec:
CPU: Core i7-4790 (# of cores: 4, # of threads: 8)
RAM: 32GB (8GB x 4)
HDD: 8TB (2TB x 4)
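For reference, these are the settings that interact when sizing executors on YARN; the values below are placeholders, not a recommendation for this hardware:
val conf = new org.apache.spark.SparkConf()
  .set("spark.executor.instances", "3")   // --num-executors
  .set("spark.executor.cores", "4")       // --executor-cores: concurrent tasks per executor
  .set("spark.executor.memory", "8g")     // --executor-memory: heap per executor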
What are the differences between Apache Spark's SQLContext and HiveContext?
Some sources say that since HiveContext is a superset of SQLContext, developers should always use HiveContext, which has more features than SQLContext. But the current APIs of the two contexts are mostly the same.
In what scenarios is SQLContext or HiveContext more useful?
Is HiveContext more useful only when working with Hive?
Or is SQLContext all that is needed when implementing a Big Data app using Apache Spark?
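For reference, both contexts are built on top of a SparkContext; a minimal Spark 1.x sketch, assuming an existing sc:
// Plain Spark SQL: standard parser, no Hive dependency.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// Superset: HiveQL parser, Hive UDFs, and access to the Hive metastore
// (requires the Hive classes on the classpath).
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)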
Quoting the Spark DataFrames, Datasets and SQL manual:
A handful of Hive optimizations are not yet included in Spark. Some of these (such as indexes) are less important due to Spark SQL’s in-memory computational model. Others are slotted for future releases of Spark SQL.
Being new to Spark, I'm a bit baffled by this for two reasons:
Spark SQL is designed to process Big Data, and at least in my use case the data size far exceeds the size of available memory. Assuming this is not uncommon, what is meant by "Spark SQL’s in-memory computational model"? Is Spark SQL recommended only for cases where the data fits in memory?
Even assuming the data fits in memory, a full scan over a very large dataset can take a long time. I read this argument against indexing in an in-memory database, but I was not convinced. The example there discusses a scan of a 10,000,000-record table, but that's not really big data. Scanning a table with billions of records can cause simple queries of the "SELECT x WHERE y=z"...
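A common substitute for indexes in Spark SQL is to partition the data on the column you filter by, so a selective query reads only the matching directories (partition pruning). A sketch under the assumption that y is such a column; the path and names are illustrative, and spark is a SparkSession:
import org.apache.spark.sql.functions.col
// Write the data partitioned by the frequently filtered column.
df.write.partitionBy("y").parquet("hdfs:///warehouse/events")
// A predicate on y then reads only the matching partition directories
// instead of scanning the whole table.
val hits = spark.read.parquet("hdfs:///warehouse/events").where(col("y") === "z")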
How can I increase the memory available to Apache Spark executor nodes?
I have a 2 GB file that is suitable for loading into Apache Spark. I am running Apache Spark for the moment on one machine, so the driver and executor are on the same machine. The machine has 8 GB of memory.
When I try to count the lines of the file after setting the file to be cached in memory, I get these errors:
2014-10-25 22:25:12 WARN CacheManager:71 - Not enough space to cache partition rdd_1_1 in memory! Free memory is 278099801 bytes.
I looked at the documentation here and set spark.executor.memory to 4g in $spark.home.
The UI shows this variable is set in the Spark Environment. You can find a screenshot here.
However, when I go to the Executor tab, the memory limit for my single Executor is still set to 265.4 MB. I also still get the same error.
I tried various things mentioned here but I still get the error and don't have a clear idea where I should change the setting.
I am running my code interactively from the spark-shell.
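If you are running in local mode (master = local[*]), the executor lives inside the driver JVM, so spark.executor.memory does not enlarge the storage space shown; the driver heap has to be sized when the shell is launched. A sketch, where 4g is just an example value:
./bin/spark-shell --driver-memory 4g
or equivalently in conf/spark-defaults.conf:
spark.driver.memory 4g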
I tried to start Spark 1.6.0 (spark-1.6.0-bin-hadoop2.4) on Mac OS Yosemite 10.10.5 using "./bin/spark-shell".
It gives the error below. I also tried installing different versions of Spark, but all have the same error. This is the second time I'm running Spark; my previous run worked fine.
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.0
/_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_79)
Type in expressions to have them evaluated.
Type :help for more information.
16/01/04 13:49:40 WARN Utils: Service...
I am trying to get live JSON data from RabbitMQ into Apache Spark using Java and do some real-time analytics on it.
I am able to get the data and run some basic SQL queries on it, but I am not able to figure out the grouping part.
Below is the JSON I have:
{"DeviceId":"MAC-101","DeviceType":"Simulator-1","data":{"TimeStamp":"26-06-2017 16:43:41","FR":10,"ASSP":20,"Mode":1,"EMode":2,"ProgramNo":2,"Status":3,"Timeinmillisecs":636340922213668165}}
{"DeviceId":"MAC-101","DeviceType":"Simulator-1","data":{"TimeStamp":"26-06-2017 16:43:41","FR":10,"ASSP":20,"Mode":1,"EMode":2,"ProgramNo":2,"Status":3,"Timeinmillisecs":636340922213668165}}
{"DeviceId":"MAC-102","DeviceType":"Simulator-1","data":{"TimeStamp":"26-06-2017 16:43:41","FR":10,"ASSP":20,"Mode":1,"EMode":2,"ProgramNo":2,"Status":3,"Timeinmillisecs":636340922213668165}}
{"DeviceId":"MAC-102","DeviceType":"Simulator-1","data":{"TimeStamp":"26-06-2017... less
I'm new to big data processing and I'm reading about tools for stream processing and building data pipelines. I found Apache Spark and Spring Cloud Data Flow. I want to know the main differences between them and their pros and cons. Could anybody help me?
I've got a big RDD (1 GB) in a YARN cluster. On the local machine that uses this cluster I have only 512 MB. I'd like to iterate over the values of the RDD on my local machine. I can't use collect(), because it would create an array locally that is bigger than my heap. I need some iterative way. There is the method iterator(), but it requires additional information that I can't provide.
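One API worth noting is RDD.toLocalIterator, which streams one partition at a time to the driver, so the driver only ever needs to hold the largest partition rather than the whole RDD. A minimal sketch (rdd stands for the existing RDD):
// Memory use on the driver is bounded by the largest partition;
// repartition first if individual partitions are still too big.
rdd.toLocalIterator.foreach { value =>
  println(value)
}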
I am confused as to where Talend and Apache Spark fit in the big data ecosystem, as both can be used for ETL.
Could someone please explain this with an example?
I read the Cluster Mode Overview and I still can't understand the different processes in a Spark Standalone cluster and the parallelism.
Is the worker a JVM process or not? I ran bin\start-slave.sh and found that it spawned the worker, which is actually a JVM.
As per the above link, an executor is a process launched for an application on a worker node that runs tasks. An executor is also a JVM.
These are my questions:
Executors are per application. Then what is the role of a worker? Does it coordinate with the executor and communicate the result back to the driver? Or does the driver talk directly to the executor? If so, what is the worker's purpose?
How do I control the number of executors for an application?
Can tasks be made to run in parallel inside an executor? If so, how do I configure the number of threads for an executor?
What is the relation between a worker, executors, and executor cores (--total-executor-cores)? (See the sketch after these questions.)
What does it mean to have more workers per...
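On the last two questions, a sketch of the standalone-mode settings involved (the values are placeholders): the application's total core budget, the cores each executor may use, and the resulting task parallelism (one task per core):
val conf = new org.apache.spark.SparkConf()
  .set("spark.cores.max", "8")        // --total-executor-cores: cores across the whole app
  .set("spark.executor.cores", "2")   // cores (= concurrent task threads) per executor
  .set("spark.executor.memory", "2g") // heap per executor
// With 8 cores in total and 2 per executor, up to 4 executors can be launched,
// each running 2 tasks in parallel.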