How to use the collect() function in Python

In this method, we first create a PySpark DataFrame using createDataFrame(). We then get a list of Row objects from the DataFrame using:

DataFrame.collect()

We then use Python list slicing to split this list into two lists of Rows, and finally convert each list of Rows back into a PySpark DataFrame using createDataFrame().
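Before the full program, it helps to see what collect() actually returns: a plain Python list of pyspark.sql.Row objects, which supports ordinary list slicing. A minimal sketch (df here stands for the DataFrame built in the full program below):

Python

# collect() pulls all rows to the driver as a plain Python list of Row objects
row_list = df.collect()
print(type(row_list))  # <class 'list'>
print(row_list[0])     # Row(Player='Lee Chong Wei', Titles=69, Country='Malaysia')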

Python
# Import SparkSession to create a Spark session
from pyspark.sql import SparkSession

# Session creation
Spark_Session = SparkSession.builder.appName(
    'Spark Session'
).getOrCreate()

# Data filled in our DataFrame
rows = [['Lee Chong Wei', 69, 'Malaysia'],
        ['Lin Dan', 66, 'China'],
        ['Srikanth Kidambi', 9, 'India'],
        ['Kento Momota', 15, 'Japan']]

# Columns of our DataFrame
columns = ['Player', 'Titles', 'Country']

# DataFrame is created
df = Spark_Session.createDataFrame(rows, columns)

# Getting the list of Row objects
row_list = df.collect()

# Slicing the Python list: the first row and the remaining rows
part1 = row_list[:1]
part2 = row_list[1:]

# Converting the slices back to PySpark DataFrames
slice1 = Spark_Session.createDataFrame(part1)
slice2 = Spark_Session.createDataFrame(part2)

# Printing the first slice
print('First DataFrame')
slice1.show()

# Printing the second slice
print('Second DataFrame')
slice2.show()


Output:

First DataFrame
+-------------+------+--------+
|       Player|Titles| Country|
+-------------+------+--------+
|Lee Chong Wei|    69|Malaysia|
+-------------+------+--------+

Second DataFrame
+----------------+------+-------+
|          Player|Titles|Country|
+----------------+------+-------+
|         Lin Dan|    66|  China|
|Srikanth Kidambi|     9|  India|
|    Kento Momota|    15|  Japan|
+----------------+------+-------+
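The first slice holds the single row selected by row_list[:1], and the second holds the remaining three rows. Two points are worth noting. First, collect() materializes every row on the driver, so this method is only practical for DataFrames small enough to fit in driver memory. Second, createDataFrame() re-infers the schema from the Row objects; if you want the slices to keep exactly the original column types, you can pass the source DataFrame's schema explicitly. A minimal sketch of that variant:

Python

# Reuse the original schema so the slices keep the exact column types
slice1 = Spark_Session.createDataFrame(part1, schema=df.schema)
slice2 = Spark_Session.createDataFrame(part2, schema=df.schema)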

