Let's assume for the following that only one Spark job is running at any point in time.
What I get so far
Here is my understanding of what happens in Spark:
When a SparkContext is created, each worker node starts an executor. Executors are separate processes (JVMs) that connect back to the driver program. Each executor has the jar of the driver program. Quitting the driver shuts down the executors. Each executor can hold some partitions.
When a job is executed, an execution plan is created according to the lineage graph.
The job is split into stages, where each stage contains as many neighbouring (in the lineage graph) transformations and actions as possible, but no shuffles. Thus stages are separated by shuffles.
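For example (a minimal sketch; the RDD names and paths below are illustrative and not from the question), a word-count style lineage contains exactly one shuffle, so Spark splits it into two stages at the reduceByKey boundary:
val lines = sc.textFile("hdfs:///data/input.txt")    // Stage 0
val words = lines.flatMap(_.split(" "))              // narrow dependency, still Stage 0
val pairs = words.map(word => (word, 1))             // narrow dependency, still Stage 0
val counts = pairs.reduceByKey(_ + _)                // shuffle boundary -> Stage 1
counts.saveAsTextFile("hdfs:///data/counts")         // action: triggers the job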
I understand that
A task is a command sent from the driver to an executor by serializing the Function object.
The executor deserializes (with the driver jar) the command (task) and executes it on a partition.
but
Question(s)
How do I split the stage into those tasks?
Specifically:
I prefer Python over Scala. But, as Spark is natively written in Scala, I was expecting my code to run faster in Scala than in Python, for obvious reasons. With that assumption, I thought to learn and write the Scala version of some very common preprocessing code for about 1 GB of data. The data is taken from the SpringLeaf competition on Kaggle. To give an overview, it contains 1936 dimensions and 145,232 rows, with values of various types, e.g. int, float, string, boolean. I am using 6 of 8 cores for Spark processing; that's why I used minPartitions=6 so that every core has something to process.
Scala code:
val input = sc.textFile("train.csv", minPartitions = 6)
val input2 = input.mapPartitionsWithIndex { (idx, iter) =>
  if (idx == 0) iter.drop(1) else iter }   // drop the header line
val delim1 = "\001"
def separateCols(line: String): Array[String] = {
  val line2 = line.replaceAll("true", "1")
  val line3 = line2.replaceAll("false", "0")
  val vals: Array[String] = line3.split(",")
  vals
}
I am using https://github.com/databricks/spark-csv and I am trying to write a single CSV file, but I am not able to; it creates a folder instead.
I need a Scala function which takes parameters like path and file name and writes that CSV file.
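One commonly suggested approach (a hedged sketch, not a definitive solution; writeSingleCsv and the temporary-directory naming are my own illustrative choices) is to coalesce to a single partition, write into a temporary folder, and then move the lone part file to the requested path with the Hadoop FileSystem API:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.{DataFrame, SaveMode}

def writeSingleCsv(df: DataFrame, path: String): Unit = {
  val tmpDir = path + "_tmp"
  // Single partition => spark-csv writes exactly one part file into tmpDir.
  df.coalesce(1)
    .write
    .mode(SaveMode.Overwrite)
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save(tmpDir)
  val fs = FileSystem.get(new Configuration())
  // Move that one part file to the requested file name, then drop the folder.
  val partFile = fs.globStatus(new Path(tmpDir + "/part-*"))(0).getPath
  fs.rename(partFile, new Path(path))
  fs.delete(new Path(tmpDir), true)
}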
I already have a cluster of 3 machines (ubuntu1, ubuntu2, ubuntu3 via VirtualBox VMs) running Hadoop 1.0.0. I installed Spark on each of these machines. ubuntu1 is my master node and the other nodes work as slaves. My questions are: what exactly is a Spark driver? Should we set an IP and port for the Spark driver via spark.driver.host? And where will it be executed and located (master or slave)?
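As a hedged illustration of the configuration part of the question (the host name and port below are assumptions for the ubuntu1 setup described, not recommended values), spark.driver.host and spark.driver.port can be set on the SparkConf of the application that creates the SparkContext, since the driver runs wherever that application runs:
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: pin the driver's host/port so executors on the slave VMs
// know where to connect back to.
val conf = new SparkConf()
  .setAppName("driver-host-example")
  .setMaster("spark://ubuntu1:7077")
  .set("spark.driver.host", "ubuntu1")   // host the executors connect back to
  .set("spark.driver.port", "51000")     // fixed port instead of a random one
val sc = new SparkContext(conf)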
Hi. At university, in the data science area, we learned that if we want to work with small data we should use pandas, and if we work with big data we should use Spark; in the case of Python programmers, PySpark.
Recently, at a hackathon in the cloud (Azure Synapse, which runs on Spark), I saw pandas being imported in the notebook (I suppose the code is good, since it was written by Microsoft people):
import pandas
from azureml.core import Dataset
training_pd = training_data.toPandas().to_csv('training_pd.csv', index=False)
I'm trying to implement a Lambda Architecture using the following tools: Apache Kafka to receive all the datapoints, Spark for batch processing (Big Data), Spark Streaming for real time (Fast Data), and Cassandra to store the results.
Also, all the datapoints I receive are related to a user session, and therefore, for the batch processing I'm only interested to process the datapoints once the session finishes. So, since I'm using Kafka, the only way to solve this (assuming that all the datapoints are stored in the same topic) is for the batch to fetch all the messages in the topic, and then ignore those that correspond to sessions that have not yet finished.
So, what I'd like to ask is:
Is this a good approach to implement the Lambda Architecture? Or should I use Hadoop and Storm instead? (I can't find information about people using Kafka and Apache Spark for batch processing / MapReduce.)
Is there a better approach to solve the user sessions problem?
True ... it has been discussed quite a lot.
However, there is a lot of ambiguity in some of the answers provided ... including duplicating jar references in the jars/executor/driver configuration or options.
The ambiguous and/or omitted details
The following ambiguous, unclear, and/or omitted details should be clarified for each option:
How the ClassPath is affected
Driver
Executor (for running tasks)
Both
Not at all
Separation character: comma, colon, semicolon
If provided files are automatically distributed
for the tasks (to each executor)
for the remote Driver (if run in cluster mode)
Type of URI accepted: local file, hdfs, http, etc.
If copied into a common location, where that location is (hdfs, local?)
The options this affects:
--jars
SparkContext.addJar(...) method
SparkContext.addFile(...) method
--conf spark.driver.extraClassPath=... or --driver-class-path ...
--conf spark.driver.extraLibraryPath=..., or --driver-library-path ...
--conf spark.executor.extraClassPath=...
--conf ...
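For context, a hedged example of how several of these options can appear together on one spark-submit invocation (all class names and paths are illustrative):
spark-submit \
  --class com.example.Main \
  --master yarn \
  --deploy-mode cluster \
  --jars /opt/libs/dep1.jar,/opt/libs/dep2.jar \
  --files /opt/conf/app.conf \
  --conf spark.driver.extraClassPath=/opt/libs/dep1.jar \
  --conf spark.executor.extraClassPath=/opt/libs/dep1.jar \
  app.jar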
I'm trying to understand the relationship between the number of cores and the number of executors when running a Spark job on YARN.
The test environment is as follows:
Number of data nodes: 3
Data node machine spec:
CPU: Core i7-4790 (# of cores: 4, # of threads: 8)
RAM: 32GB (8GB x 4)
HDD: 8TB (2TB x 4)
I have a Spark Streaming application which produces a dataset every minute. I need to save/overwrite the results of the processed data.
When I tried to overwrite the dataset, org.apache.hadoop.mapred.FileAlreadyExistsException stops the execution.
I set the Spark property set("spark.files.overwrite", "true"), but no luck.
How do I overwrite or pre-delete the files from Spark?
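A commonly used workaround (a minimal sketch, assuming the output is written with saveAsTextFile or a similar RDD output method; the helper name is mine) is to delete the target directory through the Hadoop FileSystem API before each write:
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkContext

// Illustrative helper: remove the output directory if it already exists,
// so the next micro-batch can write to the same location.
def deleteIfExists(sc: SparkContext, dir: String): Unit = {
  val fs = FileSystem.get(sc.hadoopConfiguration)
  val path = new Path(dir)
  if (fs.exists(path)) fs.delete(path, true)   // recursive delete
}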
I would like to read a CSV in Spark, convert it to a DataFrame, and store it in HDFS with df.registerTempTable("table_name"). I have tried:
scala> val df = sqlContext.load("hdfs:///csv/file/dir/file.csv")
Error which I got:
java.lang.RuntimeException: hdfs:///csv/file/dir/file.csv is not a Parquet file. expected magic number at tail but found
at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:277)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:276)
at scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
at ...
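For reference, the error above arises because sqlContext.load defaults to the Parquet data source. A hedged sketch of loading the same file through the spark-csv package instead (Spark 1.x API; the header option is an assumption about the file):
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")   // assumption: the file has a header row
  .load("hdfs:///csv/file/dir/file.csv")
df.registerTempTable("table_name")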
I tried to start Spark 1.6.0 (spark-1.6.0-bin-hadoop2.4) on Mac OS Yosemite 10.10.5 using "./bin/spark-shell".
It fails with the error below. I also tried installing different versions of Spark, but all have the same error. This is the second time I'm running Spark; my previous run worked fine.
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.0
/_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_79)
Type in expressions to have them evaluated.
Type :help for more information.
16/01/04 13:49:40 WARN Utils: Service...
I'm new to big data processing and I'm reading about tools for stream processing and building data pipelines. I found Apache Spark and Spring Cloud Data Flow. I want to know the main differences between them and the pros and cons of each. Could anybody help me?
Assume df1 and df2 are two DataFrames in Apache Spark, computed using two different mechanisms, e.g., Spark SQL vs. the Scala/Java/Python API.
Is there an idiomatic way to determine whether the two data frames are equivalent (equal, isomorphic), where equivalence is determined by the data (column names and column values for each row) being identical save for the ordering of rows and columns?
The motivation for the question is that there are often many ways to compute some big data result, each with its own trade-offs. As one explores these trade-offs, it is important to maintain correctness, and hence the need to check for equivalence/equality on a meaningful test data set.
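One hedged sketch of such a check (sameData is an illustrative name; it assumes the two frames have compatible column types, and note that except ignores duplicate-row multiplicity, so exact multiset equality would need extra care):
import org.apache.spark.sql.DataFrame

def sameData(df1: DataFrame, df2: DataFrame): Boolean = {
  val cols = df1.columns.sorted
  cols.sameElements(df2.columns.sorted) && {
    // Align column order before comparing, since except is positional.
    val a = df1.select(cols.head, cols.tail: _*)
    val b = df2.select(cols.head, cols.tail: _*)
    a.except(b).count() == 0 && b.except(a).count() == 0
  }
}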
I am trying to get live JSON data from RabbitMQ into Apache Spark using Java and do some real-time analytics with it.
I am able to get the data and also do some basic SQL queries on it, but I am not able to figure out the grouping part.
Below is the JSON I have
{"DeviceId":"MAC-101","DeviceType":"Simulator-1","data":{"TimeStamp":"26-06-2017 16:43:41","FR":10,"ASSP":20,"Mode":1,"EMode":2,"ProgramNo":2,"Status":3,"Timeinmillisecs":636340922213668165}}
{"DeviceId":"MAC-101","DeviceType":"Simulator-1","data":{"TimeStamp":"26-06-2017 16:43:41","FR":10,"ASSP":20,"Mode":1,"EMode":2,"ProgramNo":2,"Status":3,"Timeinmillisecs":636340922213668165}}
{"DeviceId":"MAC-102","DeviceType":"Simulator-1","data":{"TimeStamp":"26-06-2017 16:43:41","FR":10,"ASSP":20,"Mode":1,"EMode":2,"ProgramNo":2,"Status":3,"Timeinmillisecs":636340922213668165}}
{"DeviceId":"MAC-102","DeviceType":"Simulator-1","data":{"TimeStamp":"26-06-2017... less
I've got a big RDD (1 GB) in a YARN cluster. On the local machine that uses this cluster I have only 512 MB. I'd like to iterate over the values in the RDD on my local machine. I can't use collect(), because it would create an array locally that is bigger than my heap. I need some iterative way. There is the method iterator(), but it requires some additional information that I can't provide.
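One approach that is often suggested for this situation (a hedged sketch; iterateLocally is an illustrative helper name): RDD.toLocalIterator streams one partition at a time to the driver, so driver memory only needs to hold the largest partition rather than the whole RDD:
import org.apache.spark.rdd.RDD

def iterateLocally[T](rdd: RDD[T])(handle: T => Unit): Unit = {
  // Pulls partitions to the driver one at a time instead of all at once.
  rdd.toLocalIterator.foreach(handle)
}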
I'd like to stop the various messages that appear on the Spark shell. I tried to edit the log4j.properties file in order to stop these messages. Here are the contents of log4j.properties:
# Define the root logger with appender file
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
But messages are still getting displayed on the console.
Here are some example messages
15/01/05 15:11:45 INFO SparkEnv: Registering BlockManagerMaster
15/01/05 15:11:45 INFO DiskBlockManager: Created local...
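As an aside, a hedged alternative that avoids log4j.properties entirely (available from Spark 1.4 onward) is to lower the level from inside the shell:
// Quieten INFO output for the current SparkContext from the shell itself.
sc.setLogLevel("WARN")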
I did come across a mini tutorial for data preprocessing using spark here: http://ampcamp.berkeley.edu/big-data-mini-course/featurization.html
However, this discusses only text-file parsing. Is there a way to parse XML files from Spark?
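A hedged sketch of one simple route (the path and element name are illustrative; record-oriented splitting of large XML files would instead need something like the spark-xml package): read each file whole with wholeTextFiles and parse it with scala.xml:
import scala.xml.XML

// Each element of `files` is a (path, fullFileContents) pair.
val files = sc.wholeTextFiles("hdfs:///data/xml/*.xml")
val titles = files.map { case (_, content) =>
  val doc = XML.loadString(content)
  (doc \\ "title").text   // "title" is an illustrative element name
}
titles.take(5).foreach(println)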
What are the differences between Apache Spark SQLContext and HiveContext?
Some sources say that since HiveContext is a superset of SQLContext, developers should always use HiveContext, which has more features than SQLContext. But the current APIs of the two contexts are mostly the same.
In what scenarios is SQLContext/HiveContext more useful?
Is HiveContext more useful only when working with Hive?
Or is SQLContext all that is needed to implement a Big Data app using Apache Spark?
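For concreteness, a brief sketch of how the two contexts are created in Spark 1.x (nothing here beyond the standard constructors):
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

// Both are built on the same SparkContext; HiveContext additionally
// understands HiveQL and can talk to a Hive metastore.
val sqlContext = new SQLContext(sc)
val hiveContext = new HiveContext(sc)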
I am using spark-csv to load data into a DataFrame. I want to do a simple query and display the content:
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("my.csv")
df.registerTempTable("tasks")
val results = sqlContext.sql("select col from tasks")
results.show()
I installed Spark using the AWS EC2 guide and I can launch the program fine using the bin/pyspark script to get to the Spark prompt, and can also complete the Quick Start guide successfully.
However, I cannot for the life of me figure out how to stop all of the verbose INFO logging after each command.
I have tried nearly every possible scenario in the code below (commenting out, setting to OFF) within my log4j.properties file in the conf folder where I launch the application from, as well as on each node, and nothing does anything. I still get the INFO logging statements printing after executing each statement.
I am very confused with how this is supposed to work.
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout...
I am confused as to where Talend and Apache Spark fit in the big data ecosystem, as both can be used for ETL.
Could someone please explain this with an example?
Quoting the Spark DataFrames, Datasets and SQL manual:
A handful of Hive optimizations are not yet included in Spark. Some of these (such as indexes) are less important due to Spark SQL’s in-memory computational model. Others are slotted for future releases of Spark SQL.
Being new to Spark, I'm a bit baffled by this for two reasons:
Spark SQL is designed to process Big Data, and at least in my use case the data size far exceeds the size of available memory. Assuming this is not uncommon, what is meant by "Spark SQL’s in-memory computational model"? Is Spark SQL recommended only for cases where the data fits in memory?
Even assuming the data fits in memory, a full scan over a very large dataset can take a long time. I read this argument against indexing in in-memory databases, but I was not convinced. The example there discusses a scan of a 10,000,000-record table, but that's not really big data. Scanning a table with billions of records can cause simple queries of the "SELECT x WHERE y=z"...
Hi :) I have this code in Spark/Scala that partitions big data (more than 50 GB) by category into CSV files.
df.write
.mode(SaveMode.Overwrite)
.partitionBy("CATEGORY_ID")
.format("csv")
.option("header", "true")
.option("sep", "|")
.option("quoteAll", true)
.csv("output/inventory_backup")
The DataFrame df is the result of aggregations on data imported from a CSV file:
df.groupBy("PRODUCT_ID","LOC_ID","DAY_ID")
.agg(
functions.sum("ASSORTED_STOCK_UNIT").as("ASSORTED_STOCK_UNIT_sum"),
functions.sum("SOLID_STOCK_UNIT").as("SOLID_STOCK_UNIT_sum")
)
I would like to tune the performance of this program. Through the Spark UI, I was able to see that the performance bottleneck occurs at the stage that exports the data into CSV files.
More details: I'm using a 16-core / 120 GB RAM instance.
Do you have any ideas on how to tune the performance? (It currently takes more than 17 minutes.) Any help will be much appreciated. Thank you.
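One tuning idea worth trying (a hedged sketch under the assumption that many small, skewed output files are the cost here, not a guaranteed fix): repartition on the same column used by partitionBy just before the write, so all rows of a category are produced by a single task:
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.col

df.repartition(col("CATEGORY_ID"))
  .write
  .mode(SaveMode.Overwrite)
  .partitionBy("CATEGORY_ID")
  .option("header", "true")
  .option("sep", "|")
  .option("quoteAll", true)
  .csv("output/inventory_backup")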
I need to implement a big data storage + processing system.
The data increases on a daily basis (about 50 million rows/day at most); the data consists of very simple JSON documents of about 10 fields (dates, numbers, text, ids).
The data could then be queried online (if possible), making arbitrary groupings on some of the fields of the document (date range queries, ids, etc.).
I'm thinking of using a MongoDB cluster to store all this data and build indices for the fields I need to query on, then process the data in an Apache Spark cluster (mostly simple aggregations + sorting). Maybe use Spark Jobserver to build a REST API around it.
I have concerns about MongoDB's scaling possibilities (i.e. storing 10B+ rows), its throughput (quickly sending 1B+ rows to Spark for processing), and its ability to maintain indices in such a large database.
In contrast, I am considering using Cassandra or HBase, which I believe are more suitable for storing large datasets, but offer less performance in querying, which I'd...
According to Learning Spark:
Keep in mind that repartitioning your data is a fairly expensive operation. Spark also has an optimized version of repartition() called coalesce() that allows avoiding data movement, but only if you are decreasing the number of RDD partitions.
One difference I get is that with repartition() the number of partitions can be increased/decreased, but with coalesce() the number of partitions can only be decreased.
If the partitions are spread across multiple machines and coalesce() is run, how can it avoid data movement?
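A minimal sketch of the difference (illustrative numbers): repartition(n) is essentially coalesce(n, shuffle = true), whereas plain coalesce(n) builds a narrow dependency that merges existing partitions without a full shuffle:
// 10 initial partitions of a small illustrative dataset.
val rdd = sc.parallelize(1 to 1000, 10)
val merged = rdd.coalesce(2)          // narrow dependency: existing partitions are merged in place
val reshuffled = rdd.repartition(2)   // same as coalesce(2, shuffle = true): full shuffle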