Hi :) I have this code in Spark/Scala that partitions big data (more than 50 GB) by category into CSV files:
df.write
  .mode(SaveMode.Overwrite)
  .partitionBy("CATEGORY_ID")
  .format("csv")
  .option("header", "true")
  .option("sep", "|")
  .option("quoteAll", true)
  .csv("output/inventory_backup")
The dataframe df is the result of aggregations on data imported from a CSV file:
df.groupBy("PRODUCT_ID", "LOC_ID", "DAY_ID")
  .agg(
    functions.sum("ASSORTED_STOCK_UNIT").as("ASSORTED_STOCK_UNIT_sum"),
    functions.sum("SOLID_STOCK_UNIT").as("SOLID_STOCK_UNIT_sum")
  )
I would like to tune the performance of this program. Through the Spark UI, I was able to see that the performance bottleneck occurs at the stage that exports the data into CSV files.
More details: I'm running on a 16-core / 120 GB RAM instance.
Do you have any ideas on how to tune the performance? (It currently takes more than 17 minutes.) Any help will be much appreciated. Thank you!
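(For illustration only, one direction that is often suggested for this write pattern, not necessarily the fix for this particular job: repartitioning by the partitionBy column first, so each CATEGORY_ID directory is produced by a single task rather than many small files. The sketch assumes the Spark 2.x Dataset API; the extra shuffle may or may not pay off depending on how skewed the categories are.)

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.col

// Sketch: cluster rows by CATEGORY_ID before the partitioned write
df.repartition(col("CATEGORY_ID"))
  .write
  .mode(SaveMode.Overwrite)
  .partitionBy("CATEGORY_ID")
  .option("header", "true")
  .option("sep", "|")
  .option("quoteAll", true)
  .csv("output/inventory_backup")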
Getting strange behavior when calling a function outside of a closure:
when the function is in an object, everything works;
when the function is in a class, I get:
Task not serializable: java.io.NotSerializableException: testing
The problem is that I need my code in a class and not an object. Any idea why this is happening? Is a Scala object serialized by default?
This is a working code example:
object working extends App {
  val list = List(1, 2, 3)
  val rddList = Spark.ctx.parallelize(list)

  // calling the function outside the closure
  val after = rddList.map(someFunc(_))

  def someFunc(a: Int) = a + 1

  after.collect().map(println(_))
}
This is the non-working example:
object NOTworking extends App {
  new testing().doIT
}
// adding extends Serializable won't help
class testing {
  val list = List(1, 2, 3)
  val rddList = Spark.ctx.parallelize(list)

  def doIT = {
    // again calling the function someFunc
    val after = rddList.map(someFunc(_))
    // this will crash (Spark is lazy, so the failure only surfaces here)
    after.collect().map(println(_))
  }

  def someFunc(a: Int) = a + 1
}
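(A sketch of one common workaround, reusing the question's hypothetical Spark.ctx wrapper; the class name testingLocal is made up. Capturing the function as a local value means the closure no longer references this, so the non-serializable class instance is never shipped to the executors.)

class testingLocal {
  val list = List(1, 2, 3)
  val rddList = Spark.ctx.parallelize(list)

  def doIT = {
    // local function value: the closure captures only f, not the enclosing instance
    val f = (a: Int) => a + 1
    val after = rddList.map(f)
    after.collect().map(println(_))
  }
}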
I'm just wondering: what is the difference between an RDD and a DataFrame in Apache Spark (in Spark 2.0.0, DataFrame is a mere type alias for Dataset[Row])?
Can you convert one to the other?
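(A rough sketch of the conversions, assuming a SparkSession named spark and Spark 2.x; the Person case class is made up.)

import spark.implicits._

case class Person(name: String, age: Int)

// RDD -> DataFrame / Dataset
val rdd = spark.sparkContext.parallelize(Seq(Person("Ann", 30), Person("Bob", 25)))
val df  = rdd.toDF()          // DataFrame, i.e. Dataset[Row]
val ds  = rdd.toDS()          // Dataset[Person]

// DataFrame / Dataset -> RDD
val rowRdd    = df.rdd        // RDD[Row]
val personRdd = ds.rdd        // RDD[Person]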
I installed Spark using the AWS EC2 guide, and I can launch the program fine using the bin/pyspark script to get to the Spark prompt, and can also complete the Quick Start guide successfully. However, I cannot for the life of me figure out how to stop all of the verbose INFO logging after each command. I have tried nearly every possible scenario in the code below (commenting out, setting to OFF) in my log4j.properties file in the conf folder where I launch the application from, as well as on each node, and nothing does anything. I still get the INFO logging statements printed after executing each statement. I am very confused about how this is supposed to work.
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
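(If editing log4j.properties keeps being ignored, one programmatic alternative, available since roughly Spark 1.4 and shown here in Scala, is to set the level on the context for the current session; PySpark exposes the same sc.setLogLevel.)

// Silence the INFO chatter for this SparkContext only; valid levels include
// ALL, DEBUG, INFO, WARN, ERROR, FATAL, OFF.
sc.setLogLevel("WARN")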
I am doing a PoC on Spark's map-reduce performance for calculating a weighted average over 5,000 to 200,000 records, and it appears to be very slow, so I just wanted to check whether I am doing something wrong here. Here are my setup details:
• No. of worker nodes: 2
• CPUs: 8 per node (16 total)
For 5,000 orders, it takes around 9 seconds to do all of the following map-reduce operations to calculate the weighted average, i.e. (n1*v1 + n2*v2 + ...)/(n1 + n2 + ...).
// Calculation of the sum of n*v using map-reduce
JavaPairRDD<String, Double> jprMap = javaRDD.mapToPair(new PairFunction<Tuple2<Double, Double>, String, Double>() {
    public Tuple2<String, Double> call(Tuple2<Double, Double> t) {
        return new Tuple2<String, Double>("Numerator", t._1 * t._2);
    }
});
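(For comparison, a minimal Scala sketch of the same weighted average, assuming an RDD of (n, v) pairs; the tiny data set is made up. A single map + reduce pass sums n*v and n together instead of running two separate jobs.)

// pairs: RDD[(Double, Double)] of (n, v)
val pairs = sc.parallelize(Seq((2.0, 10.0), (3.0, 20.0)))

val (num, den) = pairs
  .map { case (n, v) => (n * v, n) }
  .reduce { case ((nv1, n1), (nv2, n2)) => (nv1 + nv2, n1 + n2) }

val weightedAverage = num / den   // (2*10 + 3*20) / (2 + 3) = 16.0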
I am using Apache Spark to perform sentiment analysis. I am using the Naive Bayes algorithm to classify the text. I don't know how to find out the probability of the labels. I would be grateful for a snippet in Python showing how to get the probability of the labels.
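(The question asks for Python; to keep the snippets on this page in one language, here is a rough Scala sketch of the spark.ml route with a made-up two-feature training set. The same probability column appears when using NaiveBayes from pyspark.ml.)

import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.linalg.Vectors

// Hypothetical tiny training set: label + feature vector
val training = spark.createDataFrame(Seq(
  (0.0, Vectors.dense(1.0, 0.0)),
  (1.0, Vectors.dense(0.0, 1.0))
)).toDF("label", "features")

val model = new NaiveBayes().fit(training)

// transform() adds a "probability" column holding the per-label probabilities
model.transform(training)
  .select("label", "probability", "prediction")
  .show(false)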
I have got a big data file loaded in Spark but wish to work on a small portion of it to run the analysis. Is there any way to do that? I tried doing a repartition but it brings a lot of reshuffling. Is there any good way of processing only a small chunk of a big file loaded in Spark?
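(Two standard ways to grab a small slice without a repartition; a sketch assuming a DataFrame named df.)

// roughly a 1% random sample, no shuffle of the remaining data
val tinyDf = df.sample(withReplacement = false, fraction = 0.01, seed = 42L)

// or simply the first N rows
val firstRows = df.limit(1000)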
I need to implement a big data storage + processing system.
The data increases on a daily basis (about 50 million rows/day at most); the data consists of very simple JSON documents of about 10 fields (date, numbers, text, ids).
Data could then be queried online (if possible), making arbitrary groupings on some of the fields of the document (date-range queries, ids, etc.).
I'm thinking of using a MongoDB cluster for storing all this data and building indices for the fields I need to query on, then processing the data in an Apache Spark cluster (mostly simple aggregations + sorting). Maybe use spark-jobserver to build a REST API around it.
I have concerns about MongoDB's scaling possibilities (i.e. storing 10B+ rows) and throughput (quickly sending 1B+ rows to Spark for processing), as well as its ability to maintain indices in such a large database.
In contrast, I am considering Cassandra or HBase, which I believe are more suitable for storing large datasets, but offer less performance in querying, which I'd …
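(For scale, a sketch of the kind of "simple aggregations + sorting" described above, assuming the documents reach Spark as JSON files; the path and field names are made up.)

import spark.implicits._

val docs = spark.read.json("s3://my-bucket/events/")       // ~10-field JSON documents

val grouped = docs
  .filter($"date".between("2016-01-01", "2016-01-31"))     // arbitrary date-range query
  .groupBy($"id")
  .count()
  .orderBy($"count".desc)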
According to Learning Spark:
"Keep in mind that repartitioning your data is a fairly expensive operation. Spark also has an optimized version of repartition() called coalesce() that allows avoiding data movement, but only if you are decreasing the number of RDD partitions."
One difference I get is that with repartition() the number of partitions can be increased or decreased, but with coalesce() the number of partitions can only be decreased.
If the partitions are spread across multiple machines and coalesce() is run, how can it avoid data movement?
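(A small illustration of the two calls; the partition counts are arbitrary.)

val rdd = sc.parallelize(1 to 1000000, 100)   // 100 initial partitions

val wider    = rdd.repartition(200)   // full shuffle; the count can go up or down
val narrower = rdd.coalesce(10)       // merges the existing 100 partitions into 10
                                      // without a full shuffle; can only decrease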