Random sampling with replacement

Random sampling with replacement is a type of random sampling in which each chosen element is returned to the population before the next element is drawn. As a result, the same element can be selected more than once.
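The idea can be illustrated without Spark. The following is a minimal sketch in plain Python, not PySpark code: the helper name sample_with_replacement and the list of brands are assumptions made up for this illustration. Each draw picks from the full population again, so repeats are possible.

```python
import random


def sample_with_replacement(population, k, seed=None):
    """Draw k elements; every draw sees the whole population again."""
    rng = random.Random(seed)  # a local generator keeps the result reproducible
    return [rng.choice(population) for _ in range(k)]


# Hypothetical population used only for this illustration
brands = ["Redmi", "Samsung", "Nokia", "Motorola", "Apple"]

# Drawing 8 times from 5 elements must repeat at least one of them
draws = sample_with_replacement(brands, k=8, seed=42)
print(draws)
```

Because the chosen element is "returned" after each draw, the sample can be larger than the population, which is not possible when sampling without replacement.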

Syntax:

sample(True, fraction, seed)

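Here,

True: It is the value of the first argument, withReplacement. Setting it to True allows the same row to be selected more than once.

fraction: It represents the expected fraction of rows to be generated, documented as a value from 0.0 to 1.0 (inclusive). The number of rows returned is not guaranteed to match this fraction exactly.

seed: It represents the seed used for sampling (by default, it is a random seed). Passing the same seed regenerates the same random sample.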
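To see how fraction and seed interact, here is a rough sketch in plain Python. It assumes that with-replacement sampling draws, for every row, a Poisson-distributed count with mean equal to fraction. That is how Spark's behaviour is commonly described, but treat it as an assumption here. The names poisson_draw and sample_rows are hypothetical helpers, not part of the PySpark API.

```python
import math
import random


def poisson_draw(rng, lam):
    """Knuth's method: count uniform draws until their product drops to e^-lam or below."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_rows(rows, fraction, seed):
    """Emit each row a Poisson(fraction)-distributed number of times (assumed model)."""
    rng = random.Random(seed)
    sampled = []
    for row in rows:
        sampled.extend([row] * poisson_draw(rng, fraction))
    return sampled


rows = ["Redmi", "Samsung", "Nokia", "Motorola", "Apple"]

first = sample_rows(rows, fraction=0.5, seed=42)
second = sample_rows(rows, fraction=0.5, seed=42)
print(first)
print(first == second)  # the same seed reproduces the same sample
```

Under this model the sample size varies from run to run and only averages out to fraction times the number of rows, which is why the fraction is an expectation and not an exact count.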

Example:

Python3

# Python program to demonstrate random
# sampling in pyspark with replacement

# Import libraries
from pyspark.sql import Row
from pyspark.sql import SparkSession

# Create a session
spark = SparkSession.builder.getOrCreate()

# Create dataframe by passing a list of rows
df = spark.createDataFrame([
    Row(Brand="Redmi", Units=1000000, Performance="Outstanding", Ecofriendly="Yes"),
    Row(Brand="Samsung", Units=900000, Performance="Outstanding", Ecofriendly="Yes"),
    Row(Brand="Nokia", Units=500000, Performance="Excellent", Ecofriendly="Yes"),
    Row(Brand="Motorola", Units=400000, Performance="Average", Ecofriendly="Yes"),
    Row(Brand="Apple", Units=2000000, Performance="Outstanding", Ecofriendly="Yes")
])

# Apply sample() function with replacement:
# withReplacement=True, fraction=0.5, seed=42
df_mobile_brands = df.sample(True, 0.5, 42)

# Print the sampled rows to the console
df_mobile_brands.show()

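Output: The sampled rows are printed to the console as a table. Because the sampling is done with replacement, a row may appear more than once, and the number of rows is not guaranteed to be exactly half of the original dataframe.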
