RDD and DataFrame in Spark
RDD and DataFrame are Spark's two primary abstractions for handling data.
- An RDD is the low-level building block for distributed data processing, while a DataFrame provides a higher-level, SQL-like interface with named columns.
- Projects sometimes need to convert between RDDs and DataFrames.
Below is a Scala program to set up a Spark session and create an RDD:

import org.apache.spark.sql.{SparkSession, Row}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local").appName("RDDExample").getOrCreate()
val sc = spark.sparkContext

val rdd = sc.parallelize(Seq(
  ("Alice", "HR Manager", 40),
  ("Bob", "Software Developer", 35),
  ("Charlie", "Data Scientist", 28)
))
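Given the rdd above, the two conversion approaches listed in the Table of Content can be sketched as follows. This is a minimal sketch: the column names ("name", "role", "age") and the schema are illustrative assumptions, not taken from the article.

```scala
// Route 1: toDF() -- requires the implicits tied to the active SparkSession.
import spark.implicits._
// Column names here are illustrative assumptions.
val dfImplicit = rdd.toDF("name", "role", "age")

// Route 2: createDataFrame -- map the tuples to Rows and supply an explicit schema.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("role", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)
))
val rowRdd = rdd.map { case (name, role, age) => Row(name, role, age) }
val dfExplicit = spark.createDataFrame(rowRdd, schema)

dfImplicit.show()
dfExplicit.printSchema()
```

toDF() is the shorter route when the RDD holds tuples or case classes; createDataFrame with a StructType is useful when the schema must be built at runtime.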
How to Convert RDD to DataFrame in Spark Scala?
This article discusses ways to convert an RDD to a DataFrame in Spark Scala.
Table of Contents
- RDD and DataFrame in Spark
- Convert Using createDataFrame Method
- Conversion Using toDF() Implicit Method
- Conclusion
- FAQs