How to use toLocalIterator() in Python
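toLocalIterator() returns an iterator over all the rows of an RDD. It is similar to the collect() method, but instead of returning every row at once as a list, it hands rows to the driver one partition at a time, so it needs only as much driver memory as the largest partition. It is an RDD method, so on a dataframe we reach it through the rdd attribute: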
dataframe.rdd.toLocalIterator()
To iterate over all rows and columns, we call toLocalIterator() inside a for loop.
Syntax:
for iterator in dataframe.rdd.toLocalIterator():
    print(iterator["column_name"], ...)
where,
- dataframe is the input dataframe
- iterator is the loop variable that holds the current row
- column_name is the name of the column whose value is read from each row
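The difference from collect() can be illustrated without a Spark cluster. The following is a minimal plain-Python sketch, not Spark's implementation: the names collect, to_local_iterator, partitions and fetched are hypothetical stand-ins, and an RDD is modeled as a list of partitions. It shows that a local iterator only touches a partition when the loop actually reaches it.

```python
from typing import Iterator, List

# Hypothetical stand-in for an RDD: a list of partitions, each a list of rows.
Partition = List[dict]


def collect(partitions: List[Partition]) -> List[dict]:
    """Model of collect(): materialize every row in one list up front."""
    return [row for part in partitions for row in part]


def to_local_iterator(partitions: List[Partition],
                      fetched: List[int]) -> Iterator[dict]:
    """Model of toLocalIterator(): fetch one partition at a time, on demand."""
    for index, part in enumerate(partitions):
        fetched.append(index)  # record that this partition was "fetched"
        yield from part


partitions = [
    [{"ID": "1", "NAME": "sravan"}, {"ID": "2", "NAME": "ojaswi"}],
    [{"ID": "3", "NAME": "rohith"}],
]

fetched: List[int] = []
rows = to_local_iterator(partitions, fetched)

first = next(rows)       # only partition 0 has been fetched so far
print(first["NAME"], fetched)

remaining = list(rows)   # draining the iterator fetches the rest
print(len(remaining), fetched)
```

In this model, stopping the loop early means later partitions are never fetched, whereas collect() always builds the full list first.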
Example: Here we iterate over all the rows of the dataframe with the toLocalIterator() method and, inside the for loop, index each row by column name (for example i["ID"]) to get the column values.
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]

# specify column names
columns = ['ID', 'NAME', 'Company']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# using toLocalIterator()
for i in dataframe.rdd.toLocalIterator():
    # display
    print(i["ID"], i["NAME"], i["Company"])
Output:
1 sravan company 1
2 ojaswi company 1
3 rohith company 2
4 sridevi company 1
5 bobby company 1
How to Iterate over rows and columns in PySpark dataframe
In this article, we will discuss how to iterate over the rows and columns of a PySpark dataframe.
Create the dataframe for demonstration:
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]

# specify column names
columns = ['ID', 'NAME', 'Company']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

dataframe.show()
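Output:
+---+-------+---------+
| ID|   NAME|  Company|
+---+-------+---------+
|  1| sravan|company 1|
|  2| ojaswi|company 1|
|  3| rohith|company 2|
|  4|sridevi|company 1|
|  5|  bobby|company 1|
+---+-------+---------+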