Iteration through the Row list
In this method, we will traverse the Row list, convert each Row object to a single-row Spark DataFrame using createDataFrame(), and append() its Pandas version to an accumulating final DataFrame, which will be our answer. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, where pandas.concat() is the replacement. The details of append() are given below:
Syntax: df.append(other, ignore_index=False, verify_integrity=False, sort=None)
df : Pandas DataFrame
Parameters :
- other : Pandas DataFrame or Series/dict-like object, or a list of these.
- ignore_index : If True, the resulting axis is labeled 0, 1, ..., n-1, ignoring the index labels of the inputs.
- verify_integrity : If True, raise ValueError on creating index with duplicates.
- sort : Sort columns if the columns of df and other are unaligned.
Returns: A new appended DataFrame
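The semantics above can be sketched with plain Pandas alone. Because DataFrame.append() no longer exists on pandas 2.0 and later, this minimal sketch uses pandas.concat(), which produces the same result for this case; the sample column names are illustrative only.

```python
import pandas as pd

# two single-row frames standing in for df and other
df = pd.DataFrame({'Topic': ['Arrays'], 'Difficulty': [5]})
other = pd.DataFrame({'Topic': ['Sorting'], 'Difficulty': [6]})

# ignore_index=True relabels the result 0..n-1 instead of
# keeping each input's original index labels
result = pd.concat([df, other], ignore_index=True)
print(result)
```

With ignore_index=False, the result would instead keep both original index labels (here, 0 and 0).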
Example:
In this example, we will use createDataFrame() to create single-row PySpark DataFrames and then use append() to combine their Pandas versions into one Pandas DataFrame.
Python
# Importing PySpark
import pyspark
# Importing Pandas for append()
import pandas
from pyspark.sql import SparkSession
from pyspark.sql import Row

# PySpark session
row_pandas_session = SparkSession.builder.appName(
    'row_pandas_session'
).getOrCreate()

# List of sample Row objects
row_object_list = [Row(Topic='Dynamic Programming', Difficulty=10),
                   Row(Topic='Arrays', Difficulty=5),
                   Row(Topic='Sorting', Difficulty=6),
                   Row(Topic='Binary Search', Difficulty=7)]

# Our final DataFrame, initially empty
mega_df = pandas.DataFrame()

# Traversing through the list
for row_object in row_object_list:
    # Creating a Spark DataFrame of a single row
    small_df = row_pandas_session.createDataFrame([row_object])

    # Appending the Pandas version of small_df to mega_df.
    # Note: DataFrame.append() was removed in pandas 2.0;
    # on pandas >= 2.0 use pandas.concat() instead.
    mega_df = mega_df.append(small_df.toPandas(), ignore_index=True)

# Printing our desired DataFrame
print(mega_df)
Output:
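For pandas 2.0 and later, where DataFrame.append() no longer exists, the same loop can collect the per-row frames in a list and concatenate them once at the end, which also avoids rebuilding the accumulator on every iteration. The sketch below is pandas-only so it runs without a Spark session: plain dicts stand in for the Row objects, and pd.DataFrame([row]) stands in for small_df.toPandas().

```python
import pandas as pd

# dicts standing in for the PySpark Row objects above
row_object_list = [{'Topic': 'Dynamic Programming', 'Difficulty': 10},
                   {'Topic': 'Arrays', 'Difficulty': 5},
                   {'Topic': 'Sorting', 'Difficulty': 6},
                   {'Topic': 'Binary Search', 'Difficulty': 7}]

frames = []
for row in row_object_list:
    # in the Spark version this would be small_df.toPandas()
    frames.append(pd.DataFrame([row]))

# one concat at the end instead of append() per iteration
mega_df = pd.concat(frames, ignore_index=True)
print(mega_df)
```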
Convert PySpark Row List to Pandas DataFrame
In this article, we will convert a PySpark Row list to a Pandas DataFrame. A Row object represents a single row of a PySpark DataFrame, so a DataFrame can be easily represented as a Python list of Row objects.