How to Use the type() Function in Python
The type() function returns the type of the given object.
Syntax: type(data_object)
Here, data_object is the RDD or DataFrame whose type you want to inspect.
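Before moving to Spark objects, it helps to see that type() works on any Python value. A minimal plain-Python sketch (no Spark session needed):

```python
# type() works on any Python object, not just Spark data
print(type(42))        # <class 'int'>
print(type("spark"))   # <class 'str'>
print(type([1, 2]))    # <class 'list'>
```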
Example 1: Python program to create an RDD and check its type
Python3
# need to import SparkSession for session creation
from pyspark.sql import SparkSession

# creating the Spark session
spark = SparkSession.builder.getOrCreate()

# create an RDD with some data
rdd = spark.sparkContext.parallelize([(1, "Sravan", "vignan", 98),
                                      (2, "bobby", "bsc", 87)])

# check the type using the type() function
print(type(rdd))
Output:
<class 'pyspark.rdd.RDD'>
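Note that type() returns a class object, not a string, so its result can be compared directly or have its name extracted. A minimal plain-Python sketch (built-in types stand in for Spark objects, no Spark session needed):

```python
# type() returns the class itself, which can be compared with `is`
data = [(1, "Sravan"), (2, "bobby")]
print(type(data))           # <class 'list'>
print(type(data) is list)   # True

# the bare class name is available via __name__
print(type(data).__name__)  # list
```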
Example 2: Python program to create a DataFrame and check its type.
Python3
# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [[1, "sravan", "company 1"],
        [2, "ojaswi", "company 1"],
        [3, "rohith", "company 2"],
        [4, "sridevi", "company 1"],
        [1, "sravan", "company 1"],
        [4, "sridevi", "company 1"]]

# specify column names
columns = ['ID', 'NAME', 'Company']

# creating a DataFrame from the lists of data
dataframe = spark.createDataFrame(data, columns)

# check the type of the data with the type() function
print(type(dataframe))
Output:
<class 'pyspark.sql.dataframe.DataFrame'>
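Printing type() is useful for inspection, but when a program needs to branch on whether it received one kind of object or another, isinstance() is the idiomatic check. A minimal plain-Python sketch (the describe helper is hypothetical, and built-in types stand in for RDD/DataFrame so no Spark session is needed):

```python
# hypothetical dispatch helper: branch on the object's type
def describe(obj):
    if isinstance(obj, dict):
        return "mapping"
    if isinstance(obj, (list, tuple)):
        return "sequence"
    # fall back to the bare class name
    return type(obj).__name__

print(describe({"a": 1}))   # mapping
print(describe([1, 2, 3]))  # sequence
print(describe(3.14))       # float
```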