How to Use toPandas() in Python
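The toPandas() method converts the selected PySpark DataFrame into a pandas DataFrame. The column can then be extracted from it and converted into a Python list. Because toPandas() collects all rows to the driver, it is best suited to data that fits in the driver's memory.
Syntax: list(dataframe.select('column_name').toPandas()['column_name'])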
Where,
- toPandas() converts the selected column of the PySpark DataFrame into a pandas DataFrame
- column_name is the name of the column in the PySpark DataFrame
Example: Convert PySpark DataFrame columns to lists using the toPandas() method
Python3
# display the college column as a list using toPandas
print(list(dataframe.select('college').toPandas()['college']))

# display the student NAME column as a list using toPandas
print(list(dataframe.select('student NAME').toPandas()['student NAME']))

# display the subject1 column as a list using toPandas
print(list(dataframe.select('subject1').toPandas()['subject1']))

# display the subject2 column as a list using toPandas
print(list(dataframe.select('subject2').toPandas()['subject2']))
Output:
['vignan', 'vvit', 'vvit', 'vignan', 'vignan', 'iit']
['sravan', 'ojaswi', 'rohith', 'sridevi', 'sravan', 'gnanesh']
[67, 78, 100, 78, 89, 94]
[89, 89, 80, 80, 98, 98]
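The last step of the expression above happens entirely in pandas, so it can be tried without Spark. The following is a minimal sketch, not part of the original example: the name pandas_df and the hand-built pandas DataFrame are assumptions that stand in for the result of toPandas(), using two columns of the demonstration data. It shows that indexing by column name yields a pandas Series, and that both list() and the Series method tolist() turn it into a plain Python list.

```python
import pandas as pd

# Hand-built stand-in for dataframe.select('college', 'subject1').toPandas().
# Building it directly is an assumption so the snippet runs without Spark.
pandas_df = pd.DataFrame({
    "college": ["vignan", "vvit", "vvit", "vignan", "vignan", "iit"],
    "subject1": [67, 78, 100, 78, 89, 94],
})

# Indexing with a column name returns a pandas Series.
college_series = pandas_df["college"]

# list(series) and series.tolist() both produce a plain Python list.
colleges = list(college_series)
marks = pandas_df["subject1"].tolist()

print(colleges)
print(marks)
```

Either form works for the final conversion. tolist() simply makes the intent explicit when reading the code.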
Converting a PySpark DataFrame Column to a Python List
In this article, we will discuss how to convert a PySpark DataFrame column to a Python list.
Creating a DataFrame for demonstration:
Python3
# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving it an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [["1", "sravan", "vignan", 67, 89],
        ["2", "ojaswi", "vvit", 78, 89],
        ["3", "rohith", "vvit", 100, 80],
        ["4", "sridevi", "vignan", 78, 80],
        ["1", "sravan", "vignan", 89, 98],
        ["5", "gnanesh", "iit", 94, 98]]

# specify column names
columns = ['student ID', 'student NAME', 'college', 'subject1', 'subject2']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# display dataframe
dataframe.show()
Output: