Let's assume for the following that only one Spark job is running at any point in time.
What I get so far
Here is what I understand of what happens in Spark:
When a SparkContext is created, each worker node starts an executor. Executors are separate processes (JVMs) that connect back to the driver program. Each executor has the jar of the driver program. Quitting the driver shuts down the executors. Each executor can hold some partitions.
When a job is executed, an execution plan is created according to the lineage graph.
The job is split into stages, where each stage contains as many neighbouring (in the lineage graph) transformations and actions as possible, but no shuffles. Thus stages are separated by shuffles.
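For illustration, a minimal PySpark sketch (the input file and its contents are made up) where the reduceByKey forces a shuffle and therefore a stage boundary:

# Stage 1: textFile and map are pipelined together in one stage
rdd = sc.textFile("events.txt")  # hypothetical input file
pairs = rdd.map(lambda line: (line.split(",")[0], 1))
# reduceByKey requires a shuffle, so a new stage starts here
counts = pairs.reduceByKey(lambda a, b: a + b)
counts.collect()  # the action triggers the job; each stage runs one task per partition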
I understand that
A task is a command sent from the driver to an executor by serializing the Function object.
The executor deserializes (with the driver jar) the command (task) and executes it on a partition.
but
Question(s)
How is the stage split into those tasks?
Specifically:
Assume df1 and df2 are two DataFrames in Apache Spark, computed using two different mechanisms, e.g., Spark SQL vs. the Scala/Java/Python API. Is there an idiomatic way to determine whether the two data frames are equivalent (equal, isomorphic), where equivalence is determined by the data (column names and column values for each row) being identical save for the ordering of rows & columns?

The motivation for the question is that there are often many ways to compute some big data result, each with its own trade-offs. As one explores these trade-offs, it is important to maintain correctness and hence the need to check for equivalence/equality on a meaningful test data set.
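For what it's worth, a minimal PySpark sketch of one possible check (an assumption of mine, not an established idiom; note that subtract is set-based, so duplicate-row counts are ignored):

def dataframes_equivalent(df1, df2):
    # Normalize column order, since column ordering should not matter
    cols = sorted(df1.columns)
    if cols != sorted(df2.columns):
        return False
    a = df1.select(cols)
    b = df2.select(cols)
    # Two-way set difference; empty in both directions means the same distinct rows
    return a.subtract(b).count() == 0 and b.subtract(a).count() == 0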
I prefer Python over Scala. But, as Spark is natively written in Scala, I was expecting my code to run faster in Scala than in Python for obvious reasons.

With that assumption, I thought to learn & write the Scala version of some very common preprocessing code for some 1 GB of data. The data is picked from the SpringLeaf competition on Kaggle. Just to give an overview of the data: it contains 1936 dimensions and 145232 rows, composed of various types, e.g. int, float, string, boolean. I am using 6 cores out of 8 for Spark processing; that's why I used minPartitions=6 so that every core has something to process.

Scala Code
val input = sc.textFile("train.csv", minPartitions = 6)
// Drop the header row from the first partition
val input2 = input.mapPartitionsWithIndex { (idx, iter) =>
  if (idx == 0) iter.drop(1) else iter
}
val delim1 = "\u0001"
// Replace boolean literals with numeric flags, then split the line into columns
def separateCols(line: String): Array[String] = {
  val line2 = line.replaceAll("true", "1")
  val line3 = line2.replaceAll("false", "0")
  val vals: Array[String] = line3.split(",")
  vals
}
I am using CDH 5.2. I am able to use spark-shell to run the commands. How can I run the file (file.spark) which contains Spark commands? Is there any way to run/compile the Scala programs in CDH 5.2 without sbt? Thanks in advance.
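One minimal approach (a sketch; file.spark is assumed to contain ordinary spark-shell statements) is to let the shell load the script, so no sbt build is needed:

spark-shell -i file.spark

Or, from inside an already running spark-shell session:

:load file.spark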
I am using spark-csv to load data into a DataFrame. I want to do a simple query and display the content:
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("my.csv")
df.registerTempTable("tasks")
val results = sqlContext.sql("select col from tasks")
results.show()
I'm new to Spark and I'm trying to read CSV data from a file with Spark. Here's what I am doing:

sc.textFile('file.csv')
  .map(lambda line: (line.split(',')[0], line.split(',')[1]))
  .collect()

I would expect this call to give me a list of the two first columns of my file, but I'm getting this error:

IndexError: list index out of range

although my CSV file has more than one column.
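A common cause of that error (an assumption here, not confirmed by the question) is a blank or short line somewhere in the file; a guarded version of the same read:

rows = (sc.textFile('file.csv')
          .map(lambda line: line.split(','))
          .filter(lambda parts: len(parts) >= 2)  # skip blank/short lines that trigger IndexError
          .map(lambda parts: (parts[0], parts[1]))
          .collect())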
I am new at this concept, and still learning. I have 10 TB of JSON files in total in AWS S3, and 4 instances (m3.xlarge) in AWS EC2 (1 master, 3 workers). I am currently using Spark with Python on Apache Zeppelin. I am reading files with the following command:

hcData = sqlContext.read.option("inferSchema", "true").json(path)

In Zeppelin interpreter settings:
master = yarn-client
spark.driver.memory = 10g
spark.executor.memory = 10g
spark.cores.max = 4
It takes approximately 1 minute to read 1 GB. What more can I do to read big data efficiently? (One possibility is sketched after the questions below.)
Should I do more on coding?
Should I increase instances?
Should I use another notebook platform?
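One hedged suggestion (assuming the JSON structure is stable): inferSchema forces an extra pass over the data, so supplying an explicit schema can avoid roughly half of the read work. A sketch with made-up field names:

from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Hypothetical schema; replace the fields with the ones actually present in the JSON
schema = StructType([
    StructField("user_id", StringType()),
    StructField("value", DoubleType()),
])
hcData = sqlContext.read.schema(schema).json(path)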
I'm trying to implement a Lambda Architecture using the following tools: Apache Kafka to receive all the datapoints, Spark for batch processing (Big Data), Spark Streaming for real time (Fast Data), and Cassandra to store the results.
Also, all the datapoints I receive are related to a user session, and therefore, for the batch processing I'm only interested in processing the datapoints once the session finishes. So, since I'm using Kafka, the only way to solve this (assuming that all the datapoints are stored in the same topic) is for the batch job to fetch all the messages in the topic, and then ignore those that correspond to sessions that have not yet finished.
So, what I'd like to ask is:
Is this a good approach to implement the Lambda Architecture? Or should I use Hadoop and Storm instead? (I can't find information about people using Kafka and Apache Spark for batch processing / MapReduce.)
Is there a better approach to solve the user sessions problem?
I am using https://github.com/databricks/spark-csv , and I am trying to write a single CSV file, but I am not able to: it creates a folder instead.
I need a Scala function which will take parameters like path and file name and write that CSV file.
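A common workaround (a sketch, shown in PySpark for illustration; the equivalent coalesce/write calls exist in the Scala API, and the output path is made up):

# Collapse to a single partition so only one part-file is written
(df.coalesce(1)
   .write
   .format("com.databricks.spark.csv")
   .option("header", "true")
   .save("/tmp/out_dir"))

The writer still produces a directory, but it contains exactly one part-*.csv, which can then be moved or renamed to the desired file name (e.g. with hadoop fs -getmerge or a plain filesystem rename).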
Hi. At university, in the data science area, we learned that if we want to work with small data we should use pandas, and if we work with Big Data we should use Spark; in the case of Python programmers, PySpark.
Recently I saw at a hackathon in the cloud (Azure Synapse, which runs on Spark) people importing pandas in the notebook (I suppose the code is good because it was made by Microsoft people):
import pandas
from azureml.core import Dataset
training_pd = training_data.toPandas().to_csv('training_pd.csv', index=False)
I already have a cluster of 3 machines (ubuntu1, ubuntu2, ubuntu3, VirtualBox VMs) running Hadoop 1.0.0. I installed Spark on each of these machines. ub1 is my master node and the other nodes are working as slaves. My question is: what exactly is a Spark driver? Should we set an IP and port for the Spark driver via spark.driver.host, and where will it be executed and located (master or slave)?
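For reference, a minimal sketch of the driver-address settings the question mentions (hostname and port values are made up; both properties exist in Spark's configuration):

# spark-defaults.conf on the machine that launches the application (hypothetical values)
# spark.driver.host: the address executors use to connect back to the driver
spark.driver.host   ubuntu1
# spark.driver.port: optional fixed port; by default Spark picks a random free port
spark.driver.port   51000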
How can I convert an RDD (org.apache.spark.rdd.RDD) to a DataFrame (org.apache.spark.sql.DataFrame)? I converted a DataFrame to an RDD using .rdd. After processing it I want it back in a DataFrame. How can I do this?
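In PySpark, for illustration (the Scala API has analogous toDF/createDataFrame calls; the column names here are made up):

from pyspark.sql import Row

# Round-trip: DataFrame -> RDD of Rows -> process -> back to a DataFrame
rdd = df.rdd
processed = rdd.map(lambda row: Row(name=row.name, total=row.total * 2))  # hypothetical fields
df2 = sqlContext.createDataFrame(processed)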
Having read this question, I would like to ask additional questions:
The Cluster Manager is a long-running service; on which node is it running?
Is it possible that the Master and the Driver nodes will be the same machine? I presume that there should be a rule somewhere stating that these two nodes should be different?
In case where the Driver node fails, who is responsible for re-launching the application? And what will happen exactly? i.e., how will the Master node, Cluster Manager, and Worker nodes get involved (if they do), and in which order?
Similarly to the previous question: in case the Master node fails, what will happen exactly, and who is responsible for recovering from the failure?
True ... it has been discussed quite a lot.
However there is a lot of ambiguity and some of the answers provided ... including duplicating jar references in the jars/executor/driver configuration or options.
The ambiguous and/or omitted details
The following ambiguous, unclear, and/or omitted details should be clarified for each option:
How ClassPath is affected
Driver
Executor (for tasks running)
Both
Not at all
Separation character: comma, colon, semicolon
If provided files are automatically distributed
for the tasks (to each executor)
for the remote Driver (if run in cluster mode)
Type of URI accepted: local file, HDFS, HTTP, etc.
If copied into a common location, where that location is (HDFS, local?)
The options which it affects:
--jars
SparkContext.addJar(...) method
SparkContext.addFile(...) method
--conf spark.driver.extraClassPath=... or --driver-class-path ...
--conf spark.driver.extraLibraryPath=..., or --driver-library-path ...
--conf spark.executor.extraClassPath=...
--conf spark.executor.extraLibraryPath=...
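For concreteness, a hedged example invocation (jar paths, class name, and application jar are all made up) combining the comma-separated --jars list with the colon-separated extraClassPath entries:

# Hypothetical paths and class name, for illustration only.
# --jars takes a comma-separated list and ships the jars to the executors;
# extraClassPath entries use the platform classpath separator (colon on Linux).
spark-submit \
  --class com.example.Main \
  --master yarn \
  --deploy-mode cluster \
  --jars /opt/libs/a.jar,/opt/libs/b.jar \
  --conf spark.driver.extraClassPath=/opt/libs/a.jar:/opt/libs/b.jar \
  --conf spark.executor.extraClassPath=/opt/libs/a.jar:/opt/libs/b.jar \
  app.jar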