How to Use the orderBy() Function in PySpark
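The orderBy() function sorts the rows of a dataframe by one or more columns. By default, it sorts in ascending order.
Syntax: dataframe.orderBy(*cols, ascending=True)
Parameters:
- cols → The column or columns to sort by, given as names or as a list of names.
- ascending → Boolean that selects ascending order (True, the default) or descending order (False). A list of booleans, one per column, sets the direction of each column separately.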
Example 1: Sorting by one column
Python program to sort the dataframe based on Employee ID in ascending order
Python3
# sort the dataframe based on the
# Employee_ID column in ascending order
dataframe.orderBy(['Employee_ID'], ascending=True).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
Python program to sort the dataframe based on Employee ID in descending order
Python3
# sort the dataframe based on the
# Employee_ID column in descending order
dataframe.orderBy(['Employee_ID'], ascending=False).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+
Example 2: Sorting by multiple columns
Sort the dataframe based on the Employee ID and Employee NAME columns in descending order using orderBy().
Python3
# sort the dataframe based on the Employee_ID
# and Employee NAME columns in descending order
dataframe.orderBy(['Employee_ID', 'Employee NAME'], ascending=False).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+
Sort the dataframe based on the Employee ID and Employee NAME columns in ascending order.
Python3
# sort the dataframe based on the Employee_ID
# and Employee NAME columns in ascending order
dataframe.orderBy(['Employee_ID', 'Employee NAME'], ascending=True).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
Sort a PySpark DataFrame by Columns in Ascending or Descending Order
In this article, we sort the rows of a PySpark dataframe by its columns, in ascending and descending order, using the sort() and orderBy() functions.
Let’s create a sample dataframe.
Python3
# importing module
import pyspark

# importing sparksession from
# pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["1", "sravan", "company 1"],
        ["4", "sridevi", "company 1"]]

# specify column names
columns = ['Employee_ID', 'Employee NAME', 'Company']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# display data in the dataframe
dataframe.show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          1|       sravan|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+