Steps for Implementing Medical Analysis Using Python
Step 1: Create a Virtual Environment
Start by creating and activating a virtual environment. Run the following commands in your terminal (the activation command shown is for Windows; on macOS or Linux, use source venv/bin/activate instead):
python -m venv venv
.\venv\Scripts\activate
Step 2: Installation
Next, install the required modules. Type the following commands one by one, pressing Enter after each:
pip install pandas
pip install matplotlib
pip install seaborn
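Before moving on, it can help to confirm that the three packages are visible to the interpreter inside the virtual environment. The following is a minimal sketch using only the standard library; missing_packages is a hypothetical helper name chosen for this example.

```python
import importlib.util

# Packages installed in Step 2 (their import names match the pip names here).
REQUIRED = ["pandas", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be found in the active environment.

    missing_packages is a hypothetical helper written for this check.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If a package is reported as missing, check that the virtual environment is activated and rerun the corresponding pip install command.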
Step 3: Import Libraries and read CSV file.
- This step imports the necessary libraries for data manipulation and visualization.
- pandas is used for data manipulation, matplotlib.pyplot for plotting graphs, and seaborn for statistical visualization. In a Jupyter notebook, you can also run %matplotlib inline so that plots are displayed inline.
- This step reads data from a CSV file named "medical_records.csv" into a Pandas DataFrame named data.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv("medical_records.csv")
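If the CSV file comes from another source, it may be worth checking that it has the columns the later steps rely on. The sketch below assumes the eight column names shown in this article's output; load_records is a hypothetical helper (not a pandas function), and the two rows are made up so the example runs without the real file.

```python
import io

import pandas as pd

# Column names taken from the dataset shown in this article.
EXPECTED_COLUMNS = [
    "Patient_ID", "Age", "Gender", "Height_cm", "Weight_kg",
    "Blood_Pressure_Systolic", "Blood_Pressure_Diastolic", "Disease_Category",
]


def load_records(source):
    """Read a CSV (path or file-like object) and check the expected columns exist.

    load_records is a hypothetical helper, not part of pandas.
    """
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing expected columns: {missing}")
    return df


# Two made-up rows stand in for medical_records.csv so the sketch runs on its own.
sample_csv = io.StringIO(
    "Patient_ID,Age,Gender,Height_cm,Weight_kg,"
    "Blood_Pressure_Systolic,Blood_Pressure_Diastolic,Disease_Category\n"
    "1,45,Male,170,72.5,128,84,Healthy\n"
    "2,63,Female,158,66.0,142,91,Stroke\n"
)
data = load_records(sample_csv)
print(data.shape)
```

With the real file, the call would simply be load_records("medical_records.csv").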
Step 4: Exploring the Data and Understanding the Data Structure
- Display the first few rows of the dataset
- List the names of all columns in the DataFrame
print("First few rows:")
print(data.head())
print("Column names:")
print(data.columns)
Output:
First few rows:
   Patient_ID  Age  Gender  Height_cm  Weight_kg  Blood_Pressure_Systolic  \
0           1   10    Male        152      160.8                      131
1           2  111  Female        172       31.1                      134
2           3   97  Female        159       24.7                      147
3           4  108  Female        146       61.3                      159
4           5    8    Male        172      112.9                      132

   Blood_Pressure_Diastolic  Disease_Category
0                        91           Healthy
1                       100        Nephrology
2                        87      Hypertension
3                        99            Stroke
4                        92           Healthy
Column names:
Index(['Patient_ID', 'Age', 'Gender', 'Height_cm', 'Weight_kg',
       'Blood_Pressure_Systolic', 'Blood_Pressure_Diastolic',
       'Disease_Category'],
      dtype='object')
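Besides the first rows and the column names, it is often useful to look at the shape, the data type of each column, and the number of missing values. The sketch below uses a few made-up rows (one Weight_kg value is deliberately missing) rather than the article's dataset.

```python
import pandas as pd

# Made-up rows for illustration only; one Weight_kg value is deliberately missing.
data = pd.DataFrame({
    "Patient_ID": [1, 2, 3],
    "Age": [45, 63, 30],
    "Gender": ["Male", "Female", "Female"],
    "Weight_kg": [72.5, None, 58.0],
    "Disease_Category": ["Healthy", "Stroke", "Diabetes"],
})

# Number of rows and columns.
print("Shape (rows, columns):", data.shape)

# Data type pandas inferred for each column.
print("Column data types:")
print(data.dtypes)

# Count of missing values in each column.
missing_per_column = data.isna().sum()
print("Missing values per column:")
print(missing_per_column)
```

Columns with missing values may need to be cleaned or filled before the statistics in the next step are interpreted.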
Step 5: Understanding the Statistics
Calculate and display various statistics about the data:
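- The code prints "Summary statistics:" as a label, then calls data.describe(include='all') to calculate and display various statistics about the data.
- For numerical columns (such as Age and the blood pressure columns), this includes the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum.
- For categorical columns (such as Gender and Disease_Category), this includes the number of values (count), the number of distinct categories (unique), the most frequent category (top), and how often it occurs (freq). The include='all' argument ensures both types of columns are summarized; statistics that do not apply to a column appear as NaN.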
print("\nSummary statistics:")
print(data.describe(include='all'))
Output:
Summary statistics:
         Patient_ID          Age  Gender    Height_cm    Weight_kg  \
count   1000.000000  1000.000000    1000  1000.000000  1000.000000
unique          NaN          NaN       2          NaN          NaN
top             NaN          NaN    Male          NaN          NaN
freq            NaN          NaN     508          NaN          NaN
mean     500.500000    59.068000     NaN   161.067000    83.054200
std      288.819436    33.144591     NaN     9.982392    46.915766
min        1.000000     1.000000     NaN   145.000000     2.000000
25%      250.750000    29.000000     NaN   152.000000    42.100000
50%      500.500000    60.500000     NaN   161.000000    83.150000
75%      750.250000    87.000000     NaN   170.000000   122.850000
max     1000.000000   115.000000     NaN   178.000000   166.800000

        Blood_Pressure_Systolic  Blood_Pressure_Diastolic  Disease_Category
count               1000.000000               1000.000000              1000
unique                      NaN                       NaN                 7
top                         NaN                       NaN            Stroke
freq                        NaN                       NaN               164
mean                 134.509000                 87.583000               NaN
std                   14.757564                 10.521398               NaN
min                  110.000000                 70.000000               NaN
25%                  122.000000                 78.000000               NaN
50%                  135.000000                 88.000000               NaN
75%                  147.000000                 96.000000               NaN
max                  160.000000                105.000000               NaN
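For a categorical column, describe() reports only the most frequent category and its frequency. To see how many patients fall into every category, value_counts() can be used alongside it. The sketch below runs on four made-up rows, not on the article's dataset.

```python
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({
    "Age": [45, 63, 30, 71],
    "Disease_Category": ["Stroke", "Healthy", "Stroke", "Diabetes"],
})

# By default, describe() on a DataFrame summarizes the numeric columns only.
numeric_summary = data.describe()

# On a categorical column it returns count, unique, top and freq.
category_summary = data["Disease_Category"].describe()

# value_counts() gives the number of rows in every category.
category_counts = data["Disease_Category"].value_counts()

print(numeric_summary)
print(category_summary)
print(category_counts)
```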
Step 6: Filtering Data for Stroke Patients
Filter the data to include only patients diagnosed with Stroke:
- This focuses on a specific group within the data by creating a new DataFrame named stroke_data.
- The new DataFrame includes only the rows of the original data where the value in the 'Disease_Category' column is 'Stroke'.
In simpler terms, it keeps only the information about patients diagnosed with Stroke, and the summary that follows describes those patients alone.
stroke_data = data[data['Disease_Category'] == 'Stroke']
print("Stroke data summary:")
print(stroke_data.describe(include='all'))
Output:
Stroke data summary:
        Patient_ID         Age  Gender   Height_cm   Weight_kg  \
count   164.000000  164.000000     164  164.000000  164.000000
unique         NaN         NaN       2         NaN         NaN
top            NaN         NaN    Male         NaN         NaN
freq           NaN         NaN      84         NaN         NaN
mean    474.170732   57.097561     NaN  160.817073   84.425000
std     289.189403   32.856487     NaN    9.818760   46.449049
min       4.000000    2.000000     NaN  145.000000    4.900000
25%     228.250000   26.750000     NaN  151.000000   45.050000
50%     443.000000   57.500000     NaN  160.000000   84.250000
75%     715.250000   86.000000     NaN  169.000000  130.200000
max     993.000000  115.000000     NaN  178.000000  164.700000

        Blood_Pressure_Systolic  Blood_Pressure_Diastolic  Disease_Category
count                164.000000                164.000000               164
unique                      NaN                       NaN                 1
top                         NaN                       NaN            Stroke
freq                        NaN                       NaN               164
mean                 133.853659                 87.006098               NaN
std                   14.499521                 10.837051               NaN
min                  110.000000                 70.000000               NaN
25%                  121.000000                 78.000000               NaN
50%                  135.000000                 87.500000               NaN
75%                  144.250000                 96.250000               NaN
max                  160.000000                105.000000               NaN
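The same boolean-mask technique extends to more than one condition. The sketch below combines two conditions with & and selects several categories at once with isin(); the rows and the age threshold of 60 are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({
    "Patient_ID": [1, 2, 3, 4, 5],
    "Age": [45, 63, 30, 71, 66],
    "Blood_Pressure_Systolic": [128, 142, 118, 155, 150],
    "Disease_Category": ["Healthy", "Stroke", "Diabetes", "Stroke", "Hypertension"],
})

# Each comparison produces a boolean Series; & keeps rows where both are True.
is_stroke = data["Disease_Category"] == "Stroke"
is_over_60 = data["Age"] > 60  # the threshold is an arbitrary example value
older_stroke = data[is_stroke & is_over_60]

# isin() keeps rows whose category is any of the listed values.
selected = data[data["Disease_Category"].isin(["Stroke", "Hypertension"])]

print(older_stroke)
print(selected)
```

When conditions are written inline instead of being stored in variables, each one needs its own parentheses, for example data[(data["Age"] > 60) & (data["Disease_Category"] == "Stroke")].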
Step 7: Grouping and Analyzing by Disease Category
- Organize by disease: groupby('Disease_Category') splits the patients into one group per disease category, much like sorting patients by their disease type.
- Summarize each disease: describe(include='all') calculates the count, mean, standard deviation, minimum, quartiles, and maximum of every column for each disease group. This information is stored in a new table named disease_stats.
- See the breakdown: Printing disease_stats gives a quick overview of how the measurements vary across disease categories. The table is wide (7 rows x 77 columns), so pandas abbreviates the middle columns with "...".
disease_stats = data.groupby('Disease_Category').describe(include='all')
print("Statistics by Disease Category:")
print(disease_stats)
Output:
Statistics by Disease Category:
Patient_ID \
count unique top freq mean std min
Disease_Category
Cancer 131.0 NaN NaN NaN 496.274809 287.514790 9.0
Diabetes 140.0 NaN NaN NaN 517.085714 286.847401 6.0
Healthy 135.0 NaN NaN NaN 527.970370 299.616146 1.0
Heart Disease 153.0 NaN NaN NaN 517.692810 291.075960 8.0
Hypertension 137.0 NaN NaN NaN 474.525547 272.902808 3.0
Nephrology 140.0 NaN NaN NaN 498.850000 294.888954 2.0
Stroke 164.0 NaN NaN NaN 474.170732 289.189403 4.0
... Blood_Pressure_Diastolic \
25% 50% 75% ... unique top
Disease_Category ...
Cancer 264.00 490.0 756.00 ... NaN NaN
Diabetes 251.75 515.0 764.25 ... NaN NaN
Healthy 262.50 560.0 811.50 ... NaN NaN
Heart Disease 285.00 518.0 755.00 ... NaN NaN
Hypertension 243.00 459.0 676.00 ... NaN NaN
Nephrology 209.75 513.5 750.25 ... NaN NaN
Stroke 228.25 443.0 715.25 ... NaN NaN
freq mean std min 25% 50% 75% max
Disease_Category
Cancer NaN 88.015267 9.516774 71.0 81.0 88.0 95.00 105.0
Diabetes NaN 87.671429 10.717056 70.0 78.0 87.5 96.00 105.0
Healthy NaN 86.711111 10.865437 70.0 77.0 88.0 95.50 105.0
Heart Disease NaN 87.869281 10.699049 70.0 78.0 90.0 97.00 105.0
Hypertension NaN 86.481752 9.904026 70.0 78.0 86.0 94.00 105.0
Nephrology NaN 89.371429 10.841794 70.0 78.0 89.0 100.00 105.0
Stroke NaN 87.006098 10.837051 70.0 78.0 87.5 96.25 105.0
[7 rows x 77 columns]
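Because the full describe() table is too wide to read comfortably, a narrower summary can be built by choosing only the statistics of interest. The sketch below uses pandas named aggregation on four made-up rows; the output column names (patients, mean_age, and so on) are arbitrary labels chosen for this example.

```python
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({
    "Patient_ID": [1, 2, 3, 4],
    "Age": [60, 70, 30, 40],
    "Blood_Pressure_Systolic": [140, 150, 120, 124],
    "Disease_Category": ["Stroke", "Stroke", "Healthy", "Healthy"],
})

# Named aggregation: new_column=(source_column, aggregation_function).
disease_summary = data.groupby("Disease_Category").agg(
    patients=("Patient_ID", "count"),
    mean_age=("Age", "mean"),
    mean_systolic=("Blood_Pressure_Systolic", "mean"),
    std_systolic=("Blood_Pressure_Systolic", "std"),
).round(2)

print(disease_summary)
```

This produces one row per disease category and one column per chosen statistic, which is easier to compare than the abbreviated describe() output.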
Step 8: Exploratory Data Analysis
1. Visualizing Age Distribution by Disease Category (Box Plot)
- Box plot of age by disease: sns.boxplot creates a box plot showing how age is distributed within each disease category.
- What's on the plot: The x-axis (horizontal) shows the disease categories and the y-axis (vertical) shows age. The entire dataset (data) is used to create the plot.
- Labels and title: The title is "Distribution of Age by Disease Category" (it explains the plot), and the axes are labelled "Disease Category" and "Age".
- Rotating labels: plt.xticks(rotation=45) rotates the x-axis labels by 45 degrees, which improves readability when there are many disease categories.
sns.boxplot(
x = "Disease_Category",
y = "Age",
data=data
)
plt.title('Distribution of Age by Disease Category')
plt.xlabel('Disease Category')
plt.ylabel('Age')
plt.xticks(rotation=45)
plt.show()
Output:
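When the plot is produced by a script rather than a notebook, it may be more convenient to save it to a file, and ordering the categories by median age can make the boxes easier to compare. The following is a sketch under those assumptions: the rows are made up, the output file name is arbitrary, and the Agg backend is selected so the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows for illustration only.
data = pd.DataFrame({
    "Age": [30, 40, 35, 60, 70, 80, 50, 55, 45],
    "Disease_Category": ["Healthy"] * 3 + ["Stroke"] * 3 + ["Diabetes"] * 3,
})

# Sort the categories from lowest to highest median age.
median_order = (
    data.groupby("Disease_Category")["Age"].median().sort_values().index.tolist()
)

fig, ax = plt.subplots(figsize=(8, 5))
sns.boxplot(x="Disease_Category", y="Age", data=data, order=median_order, ax=ax)
ax.set_title("Distribution of Age by Disease Category")
ax.set_xlabel("Disease Category")
ax.set_ylabel("Age")
ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()

# The file name is an arbitrary choice for this example.
fig.savefig("age_by_disease_boxplot.png", dpi=150)
```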
2. Visualizing Age Distribution by Disease Category (Violin Plot)
- Creates a violin plot using seaborn.violinplot.
- Uses “Disease_Category” on the x-axis and “Age” on the y-axis.
- Sets the plot title and labels for better understanding.
- Rotates the x-axis labels by 45 degrees for readability when there are many disease categories.
- Displays the created violin plot.
sns.violinplot(
x = "Disease_Category",
y = "Age",
data=data
)
plt.title('Distribution of Age by Disease Category (Violin Plot)')
plt.xlabel('Disease Category')
plt.ylabel('Age')
plt.xticks(rotation=45)
plt.show()
Output:
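A violin plot can also compare two subgroups within each category. The sketch below splits each violin by the Gender column using the hue and split parameters of seaborn.violinplot; the rows are made up, and the Agg backend is selected so the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows for illustration only: three ages per category and gender.
data = pd.DataFrame({
    "Age": [30, 38, 45, 28, 41, 50, 60, 68, 75, 58, 66, 79],
    "Gender": ["Male"] * 3 + ["Female"] * 3 + ["Male"] * 3 + ["Female"] * 3,
    "Disease_Category": ["Healthy"] * 6 + ["Stroke"] * 6,
})

fig, ax = plt.subplots(figsize=(8, 5))

# hue colors the data by Gender; split=True draws the two halves of each violin
# side by side (this requires a hue variable with two levels, as here).
sns.violinplot(
    x="Disease_Category",
    y="Age",
    hue="Gender",
    split=True,
    data=data,
    ax=ax,
)
ax.set_title("Distribution of Age by Disease Category and Gender")
ax.set_xlabel("Disease Category")
ax.set_ylabel("Age")
fig.tight_layout()
```

In a Jupyter notebook the figure is displayed automatically; in a script, fig.savefig(...) can be used to write it to a file.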
In this Python-based medical analysis, we successfully explored a dataset containing patient records. Key findings include:
- Statistical Overview: The data revealed summary statistics for various patient characteristics, including age, height, weight, and blood pressure measurements. We observed a diverse patient population across different disease categories.
- Disease-Specific Focus: By filtering for stroke patients, we gained deeper insights into this particular group. We found their average age to be about 57.1 years (the mean Age in the stroke_data summary).
- Disease Comparison: The disease-specific grouping allowed us to compare different conditions. We observed variations in average age and blood pressure across different diseases. Notably, the "Stroke" category showed a slightly lower average age (about 57.1 years) than the overall dataset (about 59.1 years).
- Visualizations: The box plots and violin plots effectively illustrated the age distribution across diseases, highlighting differences and potential areas for further investigation.
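A comparison such as the one between the stroke group and the whole dataset can be computed directly instead of being read off the printed tables. The sketch below uses a hypothetical helper, mean_age_gap, on four made-up rows; on the article's dataset the same calculation would use the real data DataFrame.

```python
import pandas as pd


def mean_age_gap(df, category):
    """Mean age of one disease category minus the mean age of the whole dataset.

    mean_age_gap is a hypothetical helper written for this example.
    """
    in_category = df["Disease_Category"] == category
    if not in_category.any():
        raise ValueError(f"No rows found for category {category!r}")
    return df.loc[in_category, "Age"].mean() - df["Age"].mean()


# Made-up rows for illustration only.
data = pd.DataFrame({
    "Age": [50, 60, 70, 80],
    "Disease_Category": ["Stroke", "Stroke", "Healthy", "Healthy"],
})

gap = mean_age_gap(data, "Stroke")
print(f"Stroke mean age minus overall mean age: {gap:+.2f} years")
```

A negative value means the category is younger on average than the dataset as a whole. With the means printed earlier (57.097561 for stroke patients and 59.068000 overall), the gap is roughly -1.97 years.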
Medical Analysis Using Python: Revolutionizing Healthcare with Data Science
In recent years, the intersection of healthcare and technology has given rise to groundbreaking advancements in medical analysis. Imagine a doctor faced with a large volume of patient information and records, searching for clues to diagnose a complex disease. Analysing this data is like putting together a medical puzzle, and it is important for doctors to see the bigger picture. This is where medical analytics apps come into play: they make this process much easier.
Among the various tools and programming languages available, Python, known for its user-friendliness and versatility, has emerged as a powerful ally for medical professionals and researchers. In this article, you will learn how to build a Python-based app for analysing medical data and uncovering patterns.
Table of Contents
- Essential Libraries for Medical Analysis
- Steps for Implementing Medical Analysis Using Python
- Step 1: Create a Virtual Environment
- Step 2: Installation
- Step 3: Import Libraries and read CSV file.
- Step 4: Exploring the Data and Understanding the Data Structure
- Step 5: Understanding the Statistics
- Step 6: Filtering Data for Stroke Patients
- Step 7: Grouping and Analyzing by Disease Category
- Step 8: Exploratory Data Analysis
- Use Cases and Real-World Examples of Medical Analysis
- Future Prospects for Revolutionizing Healthcare