Convert Pandas Dataframe To Dask Dataframe In Python
Below are the ways of converting a Pandas DataFrame to a Dask DataFrame in Python:
- Using from_pandas Function
- Using from_delayed Function
- Using concat Function
Pandas Dataframe To Dask Dataframe Using from_pandas Function
In this example, the code below imports the Pandas and Dask libraries, creates a Pandas DataFrame (`pandas_df`) with two columns, and then converts it to a Dask DataFrame (`dask_df`) with 2 partitions using the `from_pandas` function. Calling `compute()` gathers the partitions back into a single Pandas DataFrame for printing.
Python
# Import Pandas and Dask
import pandas as pd
import dask.dataframe as dd

# Create Pandas DataFrame
pandas_df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Convert to Dask DataFrame
dask_df = dd.from_pandas(pandas_df, npartitions=2)

# Display Results
print(dask_df.compute())
Output:
A B
0 1 4
1 2 5
2 3 6
Pandas Dataframe To Dask Dataframe Using from_delayed Function
In this example, the code below splits a Pandas DataFrame into two partitions based on the index modulo 2, wraps each partition in a delayed object, and constructs the Dask DataFrame `dask_df` from these delayed objects using `dd.from_delayed`. The result is printed after computation, showing the columns ‘A’ and ‘B’. Because one partition holds the even index labels and the other the odd ones, rows 0 and 2 are listed before rows 1 and 3.
Python3
# Import Pandas and Dask
import pandas as pd
from dask import delayed
import dask.dataframe as dd

# Create a Pandas DataFrame
pandas_df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
})

# Split the Pandas DataFrame into partitions (even and odd index labels),
# wrapping each partition in a delayed object
partitions = [delayed(pd.DataFrame)(part)
              for _, part in pandas_df.groupby(pandas_df.index % 2)]

# Create a Dask DataFrame using from_delayed
dask_df = dd.from_delayed(partitions)

# Display the result
print(dask_df.compute())
Output:
A B
0 1 5
2 3 7
1 2 6
3 4 8
Pandas Dataframe To Dask Dataframe Using concat Function
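In this example, the code below creates two Pandas DataFrames (`df1` and `df2`), converts each one to a Dask DataFrame with `dd.from_pandas`, and concatenates them into a single Dask DataFrame `dask_df` using `dd.concat`. The result is then computed and printed, showing the combined data with columns ‘A’ and ‘B’. Each input keeps its original index, so the labels 0 and 1 appear twice.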
Python
# Import Pandas and Dask
import pandas as pd
import dask.dataframe as dd

# Create multiple Pandas DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [4, 5]})
df2 = pd.DataFrame({'A': [3, 4], 'B': [6, 7]})

# Convert each one to a Dask DataFrame, then concatenate them with dd.concat
dask_df = dd.concat([dd.from_pandas(df1, npartitions=2),
                     dd.from_pandas(df2, npartitions=2)])

# Display Results
print(dask_df.compute())
Output:
A B
0 1 4
1 2 5
0 3 6
1 4 7
Converting Pandas Dataframe To Dask Dataframe
In this article, we will walk through several straightforward methods for converting a Pandas DataFrame to a Dask DataFrame in Python. This conversion is useful when working with large datasets, because Dask splits the data into partitions and can process them in parallel, either on one machine or across a distributed cluster.
What is a Dask DataFrame?
Dask is a parallel computing library in Python that processes large datasets efficiently by parallelizing operations. Its Dask DataFrame is a parallel and distributed counterpart to the Pandas DataFrame: it is made up of many smaller Pandas DataFrames (partitions) and supports much of the Pandas API. Converting a Pandas DataFrame to a Dask DataFrame is therefore a common first step when scaling existing Pandas code to larger data.