How to load the Breast Cancer Wisconsin (Diagnostic) dataset?
The sklearn.datasets.load_breast_cancer function loads the Breast Cancer Wisconsin (Diagnostic) dataset.
Syntax: sklearn.datasets.load_breast_cancer(*, return_X_y=False, as_frame=False)
Here’s what each parameter does:
return_X_y:
- When set to True: the function returns a tuple (X, y) holding the features and the target labels as separate arrays.
- When set to False (default): the function returns a Bunch object, a dictionary-like container whose data and target attributes hold the features and labels, alongside metadata such as feature_names and target_names.
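as_frame:
- When set to True: the features are returned as a pandas DataFrame and the target as a pandas Series. With return_X_y=False, the frame attribute of the Bunch also holds a DataFrame that combines both.
- When set to False (default): the features and target are returned as NumPy arrays, either inside the Bunch or as the (X, y) tuple, depending on the value of return_X_y.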
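The two parameters can be combined. The sketch below shows what a few combinations return; it assumes scikit-learn 0.23 or later (the release that introduced as_frame) with pandas installed, and the variable names are illustrative only.

```python
from sklearn.datasets import load_breast_cancer

# Default call: a Bunch holding NumPy arrays plus metadata
bunch = load_breast_cancer()
print(type(bunch.data), bunch.data.shape)

# return_X_y=True: a (features, target) tuple of NumPy arrays
X, y = load_breast_cancer(return_X_y=True)
print(type(X), X.shape, y.shape)

# Both flags set: features as a DataFrame, target as a Series
X_df, y_series = load_breast_cancer(return_X_y=True, as_frame=True)
print(type(X_df), type(y_series))
```

The tuple form is convenient when the arrays go straight into a model, while the Bunch form keeps the feature and class names at hand.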
Loading Breast Cancer Dataset using Sklearn
The example below loads the breast cancer dataset from sklearn, converts it into a pandas DataFrame, and displays the first five rows.
import pandas as pd
from sklearn.datasets import load_breast_cancer
# Load breast cancer dataset from sklearn
data = load_breast_cancer()
# Convert the dataset to a pandas DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)
# Add the target variable to the DataFrame
df['target'] = data.target
# Display the DataFrame
print(df.head())
Output:
   mean radius  mean texture  mean perimeter  mean area  mean smoothness  \
0        17.99         10.38          122.80     1001.0          0.11840
1        20.57         17.77          132.90     1326.0          0.08474
2        19.69         21.25          130.00     1203.0          0.10960
3        11.42         20.38           77.58      386.1          0.14250
4        20.29         14.34          135.10     1297.0          0.10030

   mean compactness  mean concavity  mean concave points  mean symmetry  \
0           0.27760          0.3001              0.14710         0.2419
1           0.07864          0.0869              0.07017         0.1812
2           0.15990          0.1974              0.12790         0.2069
3           0.28390          0.2414              0.10520         0.2597
4           0.13280          0.1980              0.10430         0.1809

   mean fractal dimension  ...  worst texture  worst perimeter  worst area  \
0                 0.07871  ...          17.33           184.60      2019.0
1                 0.05667  ...          23.41           158.80      1956.0
2                 0.05999  ...          25.53           152.50      1709.0
3                 0.09744  ...          26.50            98.87       567.7
4                 0.05883  ...          16.67           152.20      1575.0

   worst smoothness  worst compactness  worst concavity  worst concave points  \
0            0.1622             0.6656           0.7119                0.2654
1            0.1238             0.1866           0.2416                0.1860
2            0.1444             0.4245           0.4504                0.2430
3            0.2098             0.8663           0.6869                0.2575
4            0.1374             0.2050           0.4000                0.1625

   worst symmetry  worst fractal dimension  target
0          0.4601                  0.11890       0
1          0.2750                  0.08902       0
2          0.3613                  0.08758       0
3          0.6638                  0.17300       0
4          0.2364                  0.07678       0

[5 rows x 31 columns]
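As an alternative to building the DataFrame by hand, as_frame=True exposes the same table through the frame attribute of the returned Bunch. A minimal sketch, assuming scikit-learn 0.23 or later with pandas installed:

```python
from sklearn.datasets import load_breast_cancer

# frame combines the 30 feature columns with a final 'target' column
df = load_breast_cancer(as_frame=True).frame

print(df.shape)
print(df.columns[-1])
```

This saves the manual column assignment, at the cost of requiring a scikit-learn version that supports as_frame.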
Breast Cancer Wisconsin (Diagnostic) Dataset
The Breast Cancer Wisconsin (Diagnostic) dataset is widely used in machine learning and medical research. Its features are computed from digitized images of fine needle aspirates (FNA) of breast masses and describe characteristics of the cell nuclei in each image, which makes the dataset a common benchmark for classifying tumors as malignant or benign. The copy bundled with scikit-learn contains 569 samples with 30 numeric features each. This article covers the attributes, statistics, and significance of the dataset.
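To make the basic statistics concrete, the following sketch counts the samples, features, and class labels. It relies on the target_names attribute of the returned Bunch, in which label 0 corresponds to malignant and label 1 to benign; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()

# Number of rows (samples) and columns (features) in the feature matrix
n_samples, n_features = dataset.data.shape

# Count how many samples carry each label, keyed by class name
counts = np.bincount(dataset.target)
class_counts = {
    str(name): int(count)
    for name, count in zip(dataset.target_names, counts)
}

print(n_samples, n_features)
print(class_counts)
```

The classes are moderately imbalanced, which is worth keeping in mind when choosing an evaluation metric or splitting the data.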