
Write single CSV file using spark-csv

  • I am using https://github.com/databricks/spark-csv and trying to write a single CSV file, but I can't: Spark creates a folder of part files instead.

    I need a Scala function that takes a path and a file name as parameters and writes a single CSV file at that location.

      December 23, 2021 2:13 PM IST
    0
  • If you are running Spark on HDFS, I've been solving the problem by writing the CSV part files normally and letting HDFS do the merging. I do it directly from Spark (1.6):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs._
    
    def merge(srcPath: String, dstPath: String): Unit =  {
       val hadoopConfig = new Configuration()
       val hdfs = FileSystem.get(hadoopConfig)
       FileUtil.copyMerge(hdfs, new Path(srcPath), hdfs, new Path(dstPath), true, hadoopConfig, null) 
       // the "true" setting deletes the source files once they are merged into the new output
    }
    
    
    val newData = << create your dataframe >>
    
    
    val outputfile = "/user/feeds/project/outputs/subject"
    val filename = "myinsights"
    val outputFileName = outputfile + "/temp_" + filename
    val mergedFileName = outputfile + "/merged_" + filename
    val mergeFindGlob  = outputFileName
    
    newData.write
      .format("com.databricks.spark.csv")
      .option("header", "false")
      .mode("overwrite")
      .save(outputFileName)
    merge(mergeFindGlob, mergedFileName)
    newData.unpersist()

     

    Can't remember where I learned this trick, but it might work for you.
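
    For readers without HDFS, the same merge idea can be sketched in plain Python: Spark writes a folder of part-* files, and the single CSV is just their concatenation (mirroring what FileUtil.copyMerge does, including deleting the source folder). The paths and file names below are made up for illustration.

```python
# Local sketch of the copyMerge approach: concatenate the part files Spark
# writes into a folder into one output file, then delete the source folder.
# All paths here are hypothetical examples.
import glob
import os
import shutil

def merge_parts(src_dir: str, dst_path: str) -> None:
    """Concatenate all part-* files in src_dir into dst_path, then remove src_dir."""
    with open(dst_path, "wb") as out:
        for part in sorted(glob.glob(os.path.join(src_dir, "part-*"))):
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)
    shutil.rmtree(src_dir)  # mirrors copyMerge's deleteSource=true

# Simulate the folder Spark would have produced, with two part files.
os.makedirs("/tmp/temp_myinsights", exist_ok=True)
with open("/tmp/temp_myinsights/part-00000", "w") as f:
    f.write("a,1\n")
with open("/tmp/temp_myinsights/part-00001", "w") as f:
    f.write("b,2\n")

merge_parts("/tmp/temp_myinsights", "/tmp/merged_myinsights.csv")
```

    Sorting the part names keeps the rows in the same order Spark wrote them; if you write with a header, note that this naive concatenation would repeat the header once per part file.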

     
      December 24, 2021 1:19 PM IST
    0
  • I'm using this in Python to get a single file. Note that toPandas() collects the whole DataFrame onto the driver, so this only works when the data fits in driver memory:

    df.toPandas().to_csv("/tmp/my.csv", sep=',', header=True, index=False)
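
    An alternative that keeps the work in Spark is df.coalesce(1).write.csv(...), which still produces a folder, but with exactly one part file that you can then rename. A sketch of that rename step, with the Spark output folder simulated and all names hypothetical:

```python
# After something like df.coalesce(1).write.csv("/tmp/single_out"), Spark
# leaves one part-*.csv inside the folder. This helper moves it to the
# desired file name and removes the folder. Paths here are hypothetical.
import glob
import os
import shutil

def promote_single_part(spark_out_dir: str, dst_path: str) -> None:
    parts = glob.glob(os.path.join(spark_out_dir, "part-*"))
    assert len(parts) == 1, "expected exactly one part file (use coalesce(1))"
    shutil.move(parts[0], dst_path)
    shutil.rmtree(spark_out_dir)

# Simulate the folder Spark would produce.
os.makedirs("/tmp/single_out", exist_ok=True)
with open("/tmp/single_out/part-00000-abc123.csv", "w") as f:
    f.write("x,y\n1,2\n")

promote_single_part("/tmp/single_out", "/tmp/final.csv")
```

    Unlike the toPandas() route, coalesce(1) streams through a single executor task rather than the driver, so it tolerates somewhat larger data, though it still funnels everything through one task.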
    



      December 28, 2021 12:20 PM IST
    0