Sweetviz
Sweetviz is mainly known for visualizing target values and comparing datasets. It is a good tool for comparing distinct datasets, such as train and test sets, or different parts of the same dataset, such as two groups split on a categorical feature like gender.
Key features of this library are:
- Target Analysis: Investigates the relationship between a target value (e.g., "Survived" in the Titanic dataset) and other features.
- Visualization and Comparison: Visualizes and compares the target variable with various features to uncover patterns, trends, and associations.
- Distinct Datasets: Allows comparison between distinct datasets, such as training and test data, to assess consistency or differences in target-related characteristics.
- Intra-set Characteristics: Analyzes intra-set characteristics, like comparing the target variable across different groups (e.g., male versus female) within the dataset.
- Summary Information: Provides summary information for each feature, including its type, unique values, missing values, duplicate rows, and most frequent values.
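The "Distinct Datasets" feature above can be sketched roughly as follows. This is a minimal illustration, not the article's own code: the toy DataFrame, the positional train/test split, the helper name build_comparison_report, and the output file name are all assumptions made for the example. The Sweetviz call itself, sv.compare, takes two [DataFrame, name] pairs and an optional target feature.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic dataset.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22, 38, 26, 35, 27, 54],
    "Survived": [0, 1, 1, 0, 1, 0],
})

# A simple positional split into two distinct datasets (illustrative only).
train = df.iloc[:4]
test = df.iloc[4:]


def build_comparison_report(train_df, test_df, target):
    """Build a Sweetviz report comparing two datasets on a shared target."""
    # Imported here so the data preparation above runs even without Sweetviz.
    import sweetviz as sv
    return sv.compare([train_df, "Train"], [test_df, "Test"], target)


# Uncomment in an environment where Sweetviz is installed:
# report = build_comparison_report(train, test, "Survived")
# report.show_html("train_vs_test.html")  # hypothetical output file name
```

The names "Train" and "Test" are only labels shown in the report; any strings can be used.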
Install the library
!pip install sweetviz
Implementation
Let us use this library to compare two subsets of our DataFrame (male vs. female).
- A FeatureConfig object is created to configure how Sweetviz analyzes features. In this configuration:
  - The feature named "PassengerId" is skipped during the analysis.
  - The feature "Age" is forced to the text type (force_text), so Sweetviz does not analyze it as a numerical feature.
- The compare_intra function generates a comparative analysis report. Its parameters are:
  - df: The pandas DataFrame that you want to analyze.
  - df["Sex"] == "male": A boolean condition that splits the dataset into two groups: rows where the "Sex" column is "male" and all remaining rows.
  - ["Male", "Female"]: The names assigned to the two groups, in order: first the rows where the condition is true, then the rows where it is false.
  - "Survived": The target variable for the analysis.
  - feature_config: The configuration object created earlier.
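Before passing the condition to compare_intra, it can help to see what it does to the data. The sketch below uses plain pandas to reproduce the split and to compute the survival rate per group, which is the kind of difference the intra-set report is meant to surface. The toy DataFrame and the variable names are assumptions for illustration; Sweetviz performs its own analysis internally.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic dataset.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male",
            "female", "male", "male", "female"],
    "Survived": [0, 1, 1, 0, 0, 1, 0, 1],
})

# The same boolean condition that is passed to compare_intra.
condition = df["Sex"] == "male"

# Rows where the condition is True form the first group ("Male"),
# the remaining rows form the second group ("Female").
group_true = df[condition]
group_false = df[~condition]

# Mean of the 0/1 target gives the survival rate in each group.
survival_by_group = {
    "Male": group_true["Survived"].mean(),
    "Female": group_false["Survived"].mean(),
}
print(survival_by_group)
```

Because the condition is just a boolean Series, any expression that yields one (for example a threshold on a numeric column) could be used to define the two groups.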
Python3
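import sweetviz as sv

# Skip the PassengerId column and force Age to be treated as text
feature_config = sv.FeatureConfig(skip="PassengerId", force_text=["Age"])

# Split df on the condition and compare both groups against the "Survived" target
my_report = sv.compare_intra(df, df["Sex"] == "male", ["Male", "Female"], "Survived", feature_config)

my_report.show_notebook()  # Display the report inline in the notebook
my_report.show_html()  # Default arguments will generate to "SWEETVIZ_REPORT.html"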
Output:
Tools to Automate EDA
Exploratory Data Analysis (EDA) is a critical phase in the data analysis process, where analysts and data scientists examine and explore the characteristics and patterns within a dataset. In this article, we'll learn how to automate this process with Python libraries.