How to use map() in PySpark
In this method, we will use the map() function, which returns a new RDD from an existing RDD. The map() function is used with a lambda function to iterate through each row of the PySpark DataFrame.
To loop through each row using map(), we first have to convert the PySpark DataFrame into an RDD, because map() is available only on RDDs. We then call map() with a lambda function that processes each row, store the resulting RDD in a variable, and finally convert that new RDD back into a DataFrame using toDF(), passing the schema (the column names) into it.
Syntax:
rdd = dataframe.rdd.map(lambda loop: (loop["column1"], ..., loop["columnn"]))
rdd.toDF(["column1", ..., "columnn"]).collect()
Example: Here we are going to iterate ID and NAME column
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]

# specify column names
columns = ['ID', 'NAME', 'Company']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# select ID and NAME columns using map()
rdd = dataframe.rdd.map(lambda loop: (loop["ID"], loop["NAME"]))

# convert to dataframe and display
rdd.toDF(["ID", "NAME"]).collect()
Output:
[Row(ID='1', NAME='sravan'), Row(ID='2', NAME='ojaswi'), Row(ID='3', NAME='rohith'), Row(ID='4', NAME='sridevi'), Row(ID='5', NAME='bobby')]
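Conceptually, the lambda passed to map() receives one Row at a time and returns a tuple of the selected fields. As a minimal sketch (no Spark session required), the same per-row logic can be demonstrated with Python's built-in map() over plain dictionaries, which here stand in for PySpark Row objects:

```python
# Sketch only: plain dicts act as stand-ins for PySpark Row objects,
# since Row values can also be accessed by column name with row["col"].
rows = [
    {"ID": "1", "NAME": "sravan", "Company": "company 1"},
    {"ID": "2", "NAME": "ojaswi", "Company": "company 1"},
]

# Same lambda shape as dataframe.rdd.map(...): pick out ID and NAME
selected = list(map(lambda row: (row["ID"], row["NAME"]), rows))
print(selected)  # [('1', 'sravan'), ('2', 'ojaswi')]
```

On a real RDD the transformation is lazy; nothing runs until an action such as collect() is called.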
How to Iterate over rows and columns in PySpark dataframe
In this article, we will discuss how to iterate rows and columns in PySpark dataframe.
Create the dataframe for demonstration:
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]

# specify column names
columns = ['ID', 'NAME', 'Company']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

dataframe.show()
Output: