How to use the withColumn() method in Python
Here we use the withColumn() method to add a new column whose numeric values are derived from an existing string column.
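Syntax: dataframe.withColumn("new_column", when(col("column") == "value", 1).otherwise(default_value))
Where
- dataframe is the PySpark DataFrame
- new_column is the name of the new numeric column to add
- column is the string column to be mapped to numbers
- value is the string to match, and 1 is the number it is mapped to
- default_value is the number assigned to rows that match no condition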
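Chained when() calls behave like a SQL CASE WHEN expression: the conditions are checked in order, the first one that is true supplies the value, and otherwise() supplies the value for rows that match nothing (without otherwise(), those rows get null). The plain-Python function below is only a model of that evaluation order for equality tests, not Spark code; the name case_when and its arguments are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def case_when(value: str,
              branches: Iterable[Tuple[str, int]],
              otherwise: Optional[int] = None) -> Optional[int]:
    """Model of when(...).when(...).otherwise(...) for equality tests.

    Hypothetical helper: each branch is (string_to_match, number).
    Branches are tried in order and the first match wins; if none
    matches, `otherwise` is returned (None stands in for Spark's null).
    """
    for match, number in branches:
        if value == match:
            return number
    return otherwise


rules = [("iit", 1), ("vignan", 2), ("rvrjc", 3)]
print(case_when("vignan", rules, otherwise=4))  # 2
print(case_when("klu", rules, otherwise=4))     # 4
print(case_when("klu", rules))                  # None (no otherwise())
```

Because the first matching branch wins, the order of the when() calls matters whenever two conditions can be true for the same row.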
Example: Here we map each college name in the college DataFrame (built from Row objects, as shown in the demonstration step) to a college number, using withColumn() together with chained when() conditions. Colleges that match no condition get the number 4.
Python3
# import col and when from pyspark.sql.functions
from pyspark.sql.functions import col, when

# dataframe is the college DataFrame created in the demonstration step
# map college name to college number
# using withColumn() along with chained when() conditions
dataframe.withColumn(
    "college_number",
    when(col("college") == "iit", 1)
    .when(col("college") == "vignan", 2)
    .when(col("college") == "rvrjc", 3)
    .otherwise(4),
).show()
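As a quick sanity check of what the example should produce, the plain-Python sketch below applies the same mapping to the college names used in the demonstration DataFrame, in the same order. It swaps the when()/otherwise() chain for a dictionary lookup with a default of 4; for simple equality tests like these the two give the same numbers. The names college_numbers and to_number are illustrative only and are not part of PySpark.

```python
# College names in the same order as the demonstration DataFrame
colleges = ["vignan", "rvrjc", "klu", "rvrjc", "klu", "vignan", "iit"]

# Same mapping as the when() chain; any other name falls back to 4
college_numbers = {"iit": 1, "vignan": 2, "rvrjc": 3}


def to_number(name: str) -> int:
    """Return the college number, or 4 when the name is not in the mapping."""
    return college_numbers.get(name, 4)


result = [(name, to_number(name)) for name in colleges]
print(result)
```

Each row keeps its college name, and college_number is 2, 3, 4, 3, 4, 2, 1 from top to bottom.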
Output:
PySpark DataFrame – Map Strings to Numeric
In this article, we are going to see how to map the string values in a PySpark DataFrame column to numeric values.
Creating a DataFrame for demonstration:
Here we create one Row for each college name, pass the list of rows to the createDataFrame() method along with the column name "college", and then display the DataFrame.
Python3
# importing module
import pyspark

# importing SparkSession and Row from the pyspark.sql module
from pyspark.sql import SparkSession, Row

# creating a SparkSession and giving it an app name
spark = SparkSession.builder.appName("sparkdf").getOrCreate()

# list of college data
dataframe = spark.createDataFrame(
    [Row("vignan"), Row("rvrjc"), Row("klu"), Row("rvrjc"),
     Row("klu"), Row("vignan"), Row("iit")],
    ["college"],
)

# display dataframe
dataframe.show()
Output: