Sweetviz

Sweetviz is mainly known for visualizing target values and comparing datasets. It is a good tool for comparing distinct datasets, such as train and test sets, or different parts of the same dataset (for example, a dataset divided into two groups based on a categorical feature such as gender).

Key features of this library are:

  • Target Analysis: Investigates the relationship between a target value (e.g., “Survived” in the Titanic dataset) and other features.
  • Visualization and Comparison: Visualizes and compares the target variable with various features to uncover patterns, trends, and associations.
  • Distinct Datasets: Allows comparison between distinct datasets, such as training and test data, to assess consistency or differences in target-related characteristics.
  • Intra-set Characteristics: Analyzes intra-set characteristics, like comparing the target variable across different groups (e.g., male versus female) within the dataset.
  • Summary Information: Provides summary information for each feature, including its type, unique values, missing values, duplicate rows, and most frequent values.

Install the library

!pip install sweetviz

Implementation

Let us use this library to compare two subsets of our data frame (male vs. female).

  • First, a FeatureConfig object is created to configure how Sweetviz analyzes features. In this specific configuration:
    • The feature named “PassengerId” is skipped during the analysis.
    • The feature “Age” is treated as a text feature (force_text), so Sweetviz does not analyze it as a numerical feature. (To treat a feature as categorical instead, FeatureConfig also accepts force_cat.)
  • The compare_intra function is then used to generate a comparative analysis report. Here’s a breakdown of the arguments:
    • df: The pandas DataFrame that you want to analyze.
    • df["Sex"] == "male": A boolean condition that splits the dataset into two groups based on the “Sex” column: rows where the value is "male", and all remaining rows.
    • ["Male", "Female"]: The names assigned to the two groups. The first name labels the rows where the condition is true, the second labels the rest.
    • "Survived": The target variable for the analysis.
    • feature_config: The configuration object created earlier.
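To make the role of the condition more concrete, here is a minimal sketch that uses only pandas. The small DataFrame is made up for illustration (it is not the real Titanic data), and the variable names is_male, male_group and female_group are our own. It shows how a boolean condition like the one described above divides the rows into the two groups that get compared.

```python
import pandas as pd

# Tiny made-up sample (not the real Titanic data), for illustration only.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Sex": ["male", "female", "female", "male", "female"],
    "Age": [22, 38, 26, 35, 27],
    "Survived": [0, 1, 1, 0, 1],
})

# The same kind of boolean Series that is passed as the condition.
is_male = df["Sex"] == "male"

# Rows where the condition is True form the first group ("Male");
# all remaining rows form the second group ("Female").
male_group = df[is_male]
female_group = df[~is_male]

# Group sizes and the mean of the target within each group.
print(len(male_group), len(female_group))
print(male_group["Survived"].mean(), female_group["Survived"].mean())
```

Because the split is driven by an arbitrary boolean Series, the same approach should work for other conditions as well, for example df["Age"] > 30.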

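Python3

import sweetviz as sv

# df is assumed to be a pandas DataFrame that is already loaded (e.g., the Titanic data)
# Skip "PassengerId" and treat "Age" as a text feature
feature_config = sv.FeatureConfig(skip="PassengerId", force_text=["Age"])

# Compare rows where Sex == "male" ("Male") with all other rows ("Female"), using "Survived" as the target
my_report = sv.compare_intra(df, df["Sex"] == "male", ["Male", "Female"], "Survived", feature_config)

my_report.show_notebook()  # Display the report inline in a notebook
my_report.show_html()  # Default arguments write the report to "SWEETVIZ_REPORT.html"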


Output:

Sweetviz Output

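The other use case mentioned earlier, comparing two distinct datasets such as train and test, can be sketched in a similar way with sv.compare. This is only an illustration under stated assumptions: the sample data is made up, the helpers split_train_test and compare_train_test are hypothetical names of our own, and the output file name is arbitrary. The report call is left commented out because it requires Sweetviz to be installed.

```python
import pandas as pd


def split_train_test(df, train_frac=0.8, seed=0):
    """Randomly split df into two non-overlapping parts (hypothetical helper)."""
    train = df.sample(frac=train_frac, random_state=seed)
    test = df.drop(train.index)
    return train, test


def compare_train_test(train, test, target="Survived"):
    """Build a Sweetviz report comparing two separate DataFrames."""
    # Imported here so the split helper above works without Sweetviz installed.
    import sweetviz as sv

    # Each dataset is passed as [DataFrame, display name].
    return sv.compare([train, "Training Data"], [test, "Test Data"], target)


# Made-up sample data for illustration only (not the real Titanic data).
df = pd.DataFrame({
    "Sex": ["male", "female"] * 5,
    "Age": [22, 38, 26, 35, 27, 54, 2, 14, 4, 58],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
})

train, test = split_train_test(df)
print(len(train), len(test))

# Uncomment to generate the report (requires Sweetviz):
# compare_train_test(train, test).show_html("train_vs_test.html")
```

Unlike compare_intra, which splits a single DataFrame using a condition, sv.compare takes two DataFrames that already exist separately.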