How to use the collect() function in Python

In this method, we first create a PySpark DataFrame using createDataFrame(). We then get a list of Row objects from the DataFrame using:

DataFrame.collect()

We then use Python list slicing to split this list into two lists of Rows, and finally convert each list of Rows back into a PySpark DataFrame using createDataFrame().
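Before the full program, it helps to see what collect() actually returns: a plain Python list of pyspark.sql.Row objects, which supports ordinary list slicing. A minimal sketch (df here stands for the DataFrame built in the full program below):

Python

# collect() pulls all rows to the driver as a plain Python list of Row objects
row_list = df.collect()
print(type(row_list))  # <class 'list'>
print(row_list[0])     # Row(Player='Lee Chong Wei', Titles=69, Country='Malaysia')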

Python
# Import SparkSession to create a Spark session
from pyspark.sql import SparkSession

# Session creation
Spark_Session = SparkSession.builder.appName(
    'Spark Session'
).getOrCreate()

# Data filled in our DataFrame
rows = [['Lee Chong Wei', 69, 'Malaysia'],
        ['Lin Dan', 66, 'China'],
        ['Srikanth Kidambi', 9, 'India'],
        ['Kento Momota', 15, 'Japan']]

# Columns of our DataFrame
columns = ['Player', 'Titles', 'Country']

# DataFrame is created
df = Spark_Session.createDataFrame(rows, columns)

# Getting the list of Row objects
row_list = df.collect()

# Slicing the Python list: the first row and the remaining rows
part1 = row_list[:1]
part2 = row_list[1:]

# Converting the slices back to PySpark DataFrames
slice1 = Spark_Session.createDataFrame(part1)
slice2 = Spark_Session.createDataFrame(part2)

# Printing the first slice
print('First DataFrame')
slice1.show()

# Printing the second slice
print('Second DataFrame')
slice2.show()


Output:

First DataFrame
+-------------+------+--------+
|       Player|Titles| Country|
+-------------+------+--------+
|Lee Chong Wei|    69|Malaysia|
+-------------+------+--------+

Second DataFrame
+----------------+------+-------+
|          Player|Titles|Country|
+----------------+------+-------+
|         Lin Dan|    66|  China|
|Srikanth Kidambi|     9|  India|
|    Kento Momota|    15|  Japan|
+----------------+------+-------+
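The first slice holds the single row selected by row_list[:1], and the second holds the remaining three rows. Two points are worth noting. First, collect() materializes every row on the driver, so this method is only practical for DataFrames small enough to fit in driver memory. Second, createDataFrame() re-infers the schema from the Row objects; if you want the slices to keep exactly the original column types, you can pass the source DataFrame's schema explicitly. A minimal sketch of that variant:

Python

# Reuse the original schema so the slices keep the exact column types
slice1 = Spark_Session.createDataFrame(part1, schema=df.schema)
slice2 = Spark_Session.createDataFrame(part2, schema=df.schema)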

