How to Print RDD in Scala?
Method 1: Using collect
The collect method gathers the elements from all partitions onto the driver and returns them as an array. We can then print the returned array to display the RDD.
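// collect() returns every element of the RDD to the driver as an Array
val collectedRdd = rdd.collect()
// print the array on the driver, one element per line
collectedRdd.foreach(println)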
This prints the RDD one element per line.
Use collect only on small datasets. It copies the data from every partition into the driver's memory, so a large dataset may not fit and can make the driver fail with an out-of-memory error.
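If you only need a glimpse of a large RDD, take is a bounded alternative to collect. The sketch below assumes the spark-sql library is on the classpath and runs Spark in local mode. The object name PrintFirstElements, the helper printFirst and the sample data are illustrative, not part of the original example.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object PrintFirstElements {
  // Prints at most n elements. take(n) ships only those n elements to the driver,
  // so driver memory use is bounded by n rather than by the size of the RDD.
  def printFirst[T](rdd: RDD[T], n: Int): Array[T] = {
    val firstN = rdd.take(n)
    firstN.foreach(println)
    firstN
  }

  def main(args: Array[String]): Unit = {
    // Local SparkSession for experimentation; "local[*]" uses all cores on this machine
    val spark = SparkSession.builder()
      .appName("print-rdd-take")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical sample data standing in for a large RDD
    val rdd = spark.sparkContext.parallelize(1 to 1000000)
    printFirst(rdd, 10) // prints 1 through 10, one per line

    spark.stop()
  }
}
```

Because take(n) returns at most n elements, it can be called on RDDs whose full contents would not fit in driver memory.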
Method 2: Using foreach
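Instead of collecting first, we can call foreach directly on the RDD to print each element. Note that foreach is an action that runs on the executors: in local mode the output appears in the console (elements from different partitions may be interleaved), but on a cluster it is written to each executor's stdout log rather than to the driver's console.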
rdd.foreach(println)
In local mode, this prints every element of the RDD to the console without any issue.
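Because foreach runs as a distributed action, it should not be relied on to update a driver-side variable. The following sketch, assuming spark-sql on the classpath and local mode, prints each element with foreach and counts the printed elements with a LongAccumulator. The names PrintWithForeach and printAll and the sample data are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object PrintWithForeach {
  // Prints every element with foreach and returns how many elements were visited.
  // The closure runs on the executors, so the driver learns the count only
  // through the accumulator, not through a local var.
  def printAll[T](rdd: RDD[T]): Long = {
    val printed = rdd.sparkContext.longAccumulator("printed")
    rdd.foreach { x =>
      println(x)     // executor stdout (the console in local mode)
      printed.add(1) // accumulator updates made inside actions are sent back to the driver
    }
    printed.value
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("print-rdd-foreach")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical sample data split across 2 partitions
    val rdd = spark.sparkContext.parallelize(Seq("a", "b", "c", "d", "e"), numSlices = 2)
    val count = printAll(rdd)
    println(s"printed $count elements") // printed 5 elements

    spark.stop()
  }
}
```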
Method 3: Using toDF
We can also use the show function of Spark's DataFrame API. For that, we first convert the RDD to a DataFrame with the toDF function, which becomes available on RDDs after importing spark.implicits._ (where spark is the active SparkSession). The show function accepts several arguments that control the display, such as the number of rows to print and whether long values are truncated. Let's use toDF to print our RDD.
import spark.implicits._
val df = rdd.toDF()
df.show()
The RDD is converted into a tabular structure and printed with default column names, for example value for an RDD of a single primitive type, or _1 and _2 for an RDD of pairs.
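The default column names can be replaced by passing names to toDF, and show takes arguments such as a row limit and a truncate flag. Here is a minimal sketch, assuming spark-sql on the classpath and a local SparkSession. The object PrintWithToDF, the helper toNamedDF and the (name, age) sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PrintWithToDF {
  // Converts an RDD of (name, age) pairs into a DataFrame with explicit column names
  def toNamedDF(spark: SparkSession, pairs: Seq[(String, Int)]): DataFrame = {
    import spark.implicits._ // brings toDF into scope for RDDs of tuples
    val rdd = spark.sparkContext.parallelize(pairs)
    // toDF(colNames*) replaces the default _1, _2 column names
    rdd.toDF("name", "age")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("print-rdd-todf")
      .master("local[*]")
      .getOrCreate()

    val df = toNamedDF(spark, Seq(("Alice", 29), ("Bob", 35), ("Carol", 41)))
    // show(numRows, truncate): print at most 2 rows and do not truncate long values
    df.show(2, false)

    spark.stop()
  }
}
```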
How to Print RDD in Scala?
Scala stands for "scalable language". It was created by Martin Odersky and first released publicly in 2004. It is an object-oriented language that also supports the functional programming approach. Everything in Scala is an object: even values such as 1 and 2 can invoke methods such as toString. Scala is statically typed, but unlike statically typed languages such as C, C++, or Java, where types are traditionally written out explicitly, it often does not require type annotations, because the compiler infers them. Types are still checked at compile time. Static typing helps build safe systems by default: smart built-in checks and actionable error messages, combined with thread-safe data structures and collections, prevent many tricky bugs before the program first runs.
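A few lines of plain Scala illustrate these points: values such as 42 have methods, operators are ordinary method calls, and types are inferred yet still checked at compile time. The object name ScalaBasics is illustrative.

```scala
object ScalaBasics {
  // No type annotations: the compiler infers Int, String and List[Int]
  val n = 42
  val label = n.toString                  // an Int value has methods like any object
  val doubled = List(1, 2, 3).map(_ * 2)

  // Operators are method calls too: 1 + 2 is sugar for (1).+(2)
  val sum = (1).+(2)

  // Inferred types are still static: `val wrong: Int = label` would not compile

  def main(args: Array[String]): Unit = {
    println(label)   // 42
    println(doubled) // List(2, 4, 6)
    println(sum)     // 3
  }
}
```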
Table of Contents
- Understanding RDD and Spark
- Building Sample RDD
- How to Print RDD in Scala?
- Conclusion