I need to plot the changes of several biomarkers by Date on one graph, but the biomarker samples were measured on different dates and at different times. For example:
data = {
    'PatientID': [244651, 244651, 244651, 244651, 244652, 244653, 244651],
    'LocationType': ['IP', 'IP', 'OP', 'IP', 'IP', 'OP', 'IP'],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-01', '2023-01-01', '2023-01-05'],
    'Biomarker1': [1.1, 1.2, None, 1.4, 2.1, 3.1, 1.5],
    'Biomarker2': [2.1, None, 2.3, 2.4, 3.1, 4.1, 2.5],
    'Biomarker3': [3.1, 3.2, 3.3, None, 4.1, 5.1, 3.5]
}
To draw the graph:
# Set the date as the index (reassign the result instead of modifying the filtered slice in place)
filtered_df = filtered_df.set_index('Date')
# Plot all biomarkers
plt.figure(figsize=(12, 8))
# Loop through each biomarker column to plot
for column in filtered_df.columns:
    if column not in ['PatientID', 'LocationType']:
        plt.plot(filtered_df.index, filtered_df[column], marker='o', linestyle='-', label=column)
Here is my output: [image: Biomarker change over time]
I need all the points of each biomarker to be connected by a line. I cannot use interpolation; the existing points should simply be joined by a line.
How can I do this? Please help!
I tried interpolating, but it creates new points, and I don't want new points.
Here is the full code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Sample DataFrame (replace this with your actual DataFrame)
data = {
    'PatientID': [244651, 244651, 244651, 244651, 244652, 244653, 244651],
    'LocationType': ['IP', 'IP', 'OP', 'IP', 'IP', 'OP', 'IP'],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-01', '2023-01-01', '2023-01-05'],
    'Biomarker1': [1.1, 1.2, None, 1.4, 2.1, 3.1, 1.5],
    'Biomarker2': [2.1, None, 2.3, 2.4, 3.1, 4.1, 2.5],
    'Biomarker3': [3.1, 3.2, 3.3, None, 4.1, 5.1, 3.5]
}
# Create DataFrame
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
# Filter the data for the specified patient ID and IP location type
filtered_df = df[(df['PatientID'] == 244651) & (df['LocationType'] == 'IP')]
# Set the date as the index (reassign the result instead of modifying the filtered slice in place)
filtered_df = filtered_df.set_index('Date')
# Plot all biomarkers
plt.figure(figsize=(12, 8))
# Loop through each biomarker column to plot each one separately
for column in filtered_df.columns:
    if column not in ['PatientID', 'LocationType']:
        plt.plot(filtered_df.index, filtered_df[column], marker='o', linestyle='-', label=column)
plt.title('Biomarkers by Date for Patient ID 244651 (IP Location Type)')
plt.xlabel('Date')
plt.ylabel('Biomarker Value')
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.show()
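A small diagnostic sketch (not part of the original question; it reuses the question's sample data, and the name `missing` is just an illustrative variable) that lists, for patient 244651's IP rows, the dates on which each biomarker has no value, and contrasts `interpolate()`, which fills such a date with a computed value, with `dropna()`, which only removes it:

```python
import pandas as pd

# Sample data copied from the question
data = {
    'PatientID': [244651, 244651, 244651, 244651, 244652, 244653, 244651],
    'LocationType': ['IP', 'IP', 'OP', 'IP', 'IP', 'OP', 'IP'],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
             '2023-01-01', '2023-01-01', '2023-01-05'],
    'Biomarker1': [1.1, 1.2, None, 1.4, 2.1, 3.1, 1.5],
    'Biomarker2': [2.1, None, 2.3, 2.4, 3.1, 4.1, 2.5],
    'Biomarker3': [3.1, 3.2, 3.3, None, 4.1, 5.1, 3.5],
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

# Same filter as in the question, with Date as the index
filtered_df = df[(df['PatientID'] == 244651) & (df['LocationType'] == 'IP')]
filtered_df = filtered_df.set_index('Date')

biomarkers = ['Biomarker1', 'Biomarker2', 'Biomarker3']

# Dates on which each biomarker has no value (each one breaks its line)
missing = {
    col: list(filtered_df.index[filtered_df[col].isna().to_numpy()].strftime('%Y-%m-%d'))
    for col in biomarkers
}
print(missing)

# interpolate() fills the missing date with a computed value (a new point) ...
interpolated = filtered_df['Biomarker2'].interpolate()
# ... while dropna() only removes it, keeping the measured points
measured = filtered_df['Biomarker2'].dropna()
print(len(interpolated), len(measured))
```

With this data, Biomarker2 is missing on 2023-01-02 and Biomarker3 on 2023-01-04. For Biomarker2, `interpolate()` keeps 4 values (one of them computed), while `dropna()` keeps only the 3 measured ones.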
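Matplotlib does not draw a line segment to or from a NaN value, so every missing measurement leaves a gap in the line. Drop the missing values of each biomarker before plotting it; the line then connects only the measured points, without creating new ones. You can replace the code creating the plot with the following: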
# Plot all biomarkers
plt.figure(figsize=(12, 8))
# Loop through each biomarker column to plot each one separately
for column in filtered_df.columns:
    if column not in ['PatientID', 'LocationType']:
        # Keep only the dates on which this biomarker was actually measured
        biomarker = filtered_df[column].dropna()
        plt.plot(biomarker.index, biomarker, 'o-', label=column)
plt.title('Biomarkers by Date for Patient ID 244651 (IP Location Type)')
plt.xlabel('Date')
plt.ylabel('Biomarker Value')
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.show()
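To check the result without opening a window, a sketch like the following (an addition, not from the answer) renders with matplotlib's non-interactive Agg backend and inspects the y-data of each line: every biomarker line should contain only its measured values and no NaN.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: nothing is shown on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Sample data copied from the question
data = {
    'PatientID': [244651, 244651, 244651, 244651, 244652, 244653, 244651],
    'LocationType': ['IP', 'IP', 'OP', 'IP', 'IP', 'OP', 'IP'],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
             '2023-01-01', '2023-01-01', '2023-01-05'],
    'Biomarker1': [1.1, 1.2, None, 1.4, 2.1, 3.1, 1.5],
    'Biomarker2': [2.1, None, 2.3, 2.4, 3.1, 4.1, 2.5],
    'Biomarker3': [3.1, 3.2, 3.3, None, 4.1, 5.1, 3.5],
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
filtered_df = df[(df['PatientID'] == 244651) & (df['LocationType'] == 'IP')]
filtered_df = filtered_df.set_index('Date')

fig, ax = plt.subplots(figsize=(12, 8))
for column in ['Biomarker1', 'Biomarker2', 'Biomarker3']:
    biomarker = filtered_df[column].dropna()  # measured points only
    ax.plot(biomarker.index, biomarker, 'o-', label=column)

# Collect the y-values actually handed to each line, keyed by its label
points = {
    line.get_label(): np.asarray(line.get_ydata(), dtype=float).tolist()
    for line in ax.get_lines()
}
has_nan = any(np.isnan(v) for values in points.values() for v in values)
print(points)
print('NaN in any line:', has_nan)
plt.close(fig)
```

Biomarker1 should come out with 4 points and Biomarker2 and Biomarker3 with 3 each, which is exactly what a connected line through the measured values needs.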
Alternatively, you can use seaborn:
import seaborn as sns
# Plot all biomarkers
plt.figure(figsize=(12, 8))
sns.lineplot(data=filtered_df[['Biomarker1', 'Biomarker2', 'Biomarker3']],
             markers=['o', 'o', 'o'],
             dashes=False)
plt.title('Biomarkers by Date for Patient ID 244651 (IP Location Type)')
plt.ylabel('Biomarker Value')
plt.grid(True)
plt.xticks(rotation=45)
plt.show()
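Another option, sketched here with plain matplotlib as an assumption rather than as the answer's method, is to reshape the data into long format with `melt`, drop the missing measurements once, and draw one line per biomarker with `groupby`. The column names `Biomarker` and `Value` are chosen for illustration only; this long layout is also the kind of table seaborn's `hue` parameter works with.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: nothing is shown on screen
import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the question
data = {
    'PatientID': [244651, 244651, 244651, 244651, 244652, 244653, 244651],
    'LocationType': ['IP', 'IP', 'OP', 'IP', 'IP', 'OP', 'IP'],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
             '2023-01-01', '2023-01-01', '2023-01-05'],
    'Biomarker1': [1.1, 1.2, None, 1.4, 2.1, 3.1, 1.5],
    'Biomarker2': [2.1, None, 2.3, 2.4, 3.1, 4.1, 2.5],
    'Biomarker3': [3.1, 3.2, 3.3, None, 4.1, 5.1, 3.5],
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
filtered_df = df[(df['PatientID'] == 244651) & (df['LocationType'] == 'IP')]

# Long format: one row per (Date, Biomarker) pair; unmeasured pairs are dropped
long_df = (
    filtered_df.melt(id_vars=['Date'],
                     value_vars=['Biomarker1', 'Biomarker2', 'Biomarker3'],
                     var_name='Biomarker', value_name='Value')
    .dropna(subset=['Value'])
    .sort_values(['Biomarker', 'Date'])
)

fig, ax = plt.subplots(figsize=(12, 8))
# One line per biomarker, through its measured points only
for name, group in long_df.groupby('Biomarker'):
    ax.plot(group['Date'], group['Value'], 'o-', label=name)
ax.set_title('Biomarkers by Date for Patient ID 244651 (IP Location Type)')
ax.set_xlabel('Date')
ax.set_ylabel('Biomarker Value')
ax.legend()
ax.grid(True)
ax.tick_params(axis='x', labelrotation=45)
plt.close(fig)

print(long_df.groupby('Biomarker').size().to_dict())
```

Dropping the NaN rows once in the long table means no per-column `dropna()` is needed inside the loop, which may be convenient when more biomarker columns are added later.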
In either case, the plot looks as follows: