
How to connect data points with a line where values are missing


I need to plot the changes in several biomarkers by date on one graph, but the biomarker samples were measured on different dates and at different times, so some values are missing. For example:

data = {
    'PatientID': [244651, 244651, 244651, 244651, 244652, 244653, 244651],
    'LocationType': ['IP', 'IP', 'OP', 'IP', 'IP', 'OP', 'IP'],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-01', '2023-01-01', '2023-01-05'],
    'Biomarker1': [1.1, 1.2, None, 1.4, 2.1, 3.1, 1.5],
    'Biomarker2': [2.1, None, 2.3, 2.4, 3.1, 4.1, 2.5],
    'Biomarker3': [3.1, 3.2, 3.3, None, 4.1, 5.1, 3.5]
}

This is how I draw the graph:

# Set the date as the index
filtered_df.set_index('Date', inplace=True)

# Plot all biomarkers
plt.figure(figsize=(12, 8))

# Loop through each biomarker column to plot
for column in filtered_df.columns:
    if column not in ['PatientID', 'LocationType']:
        plt.plot(filtered_df.index, filtered_df[column], marker='o', linestyle='-', label=column)

Here is my output:

[Figure: Biomarker change over time]

I need all the points of each biomarker to be connected by a line. I cannot use interpolation; the existing points should simply be joined.

How do I do it?

I tried interpolating, but it creates new points, and I don't need new points.

Here is the full code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Sample DataFrame (replace this with your actual DataFrame)
data = {
    'PatientID': [244651, 244651, 244651, 244651, 244652, 244653, 244651],
    'LocationType': ['IP', 'IP', 'OP', 'IP', 'IP', 'OP', 'IP'],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-01', '2023-01-01', '2023-01-05'],
    'Biomarker1': [1.1, 1.2, None, 1.4, 2.1, 3.1, 1.5],
    'Biomarker2': [2.1, None, 2.3, 2.4, 3.1, 4.1, 2.5],
    'Biomarker3': [3.1, 3.2, 3.3, None, 4.1, 5.1, 3.5]
}

# Create DataFrame
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

# Filter the data for the specified patient ID and IP location type
filtered_df = df[(df['PatientID'] == 244651) & (df['LocationType'] == 'IP')]

# Set the date as the index
filtered_df.set_index('Date', inplace=True)

# Plot all biomarkers
plt.figure(figsize=(12, 8))

# Loop through each biomarker column to plot each one separately
for column in filtered_df.columns:
    if column not in ['PatientID', 'LocationType']:
        plt.plot(filtered_df.index, filtered_df[column], marker='o', linestyle='-', label=column)

plt.title('Biomarkers by Date for Patient ID 244651 (IP Location Type)')
plt.xlabel('Date')
plt.ylabel('Biomarker Value')
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.show()

Solution

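  • Matplotlib breaks a line wherever it meets a NaN, so drop the missing values from each column before plotting it. You can replace the code that creates the plot with the following: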

    # Plot all biomarkers
    plt.figure(figsize=(12, 8))
    
    # Loop through each biomarker column to plot each one separately
    for column in filtered_df.columns:
        if column not in ['PatientID', 'LocationType']:
            biomarker = filtered_df[column].dropna()
            plt.plot(biomarker.index, biomarker, 'o-', label=column)
    
    plt.title('Biomarkers by Date for Patient ID 244651 (IP Location Type)')
    plt.xlabel('Date')
    plt.ylabel('Biomarker Value')
    plt.legend()
    plt.grid(True)
    plt.xticks(rotation=45)
    plt.show()
    

    Alternatively, you can use seaborn:

    import seaborn as sns
    
    # Plot all biomarkers
    plt.figure(figsize=(12, 8))
    sns.lineplot(data=filtered_df[['Biomarker1', 'Biomarker2', 'Biomarker3']],
                 markers=['o', 'o', 'o'],
                 dashes=False)
    
    plt.title('Biomarkers by Date for Patient ID 244651 (IP Location Type)')
    plt.ylabel('Biomarker Value')
    plt.grid(True)
    plt.xticks(rotation=45)
    plt.show()
    

    In either case, the plot looks as follows:

    [Figure: resulting plot]
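
To check that dropping NaNs does not add any points, here is a minimal sketch that counts the points each approach would plot, using the sample data from the question. The helper name `measured_points` and the `summary` dictionary are made up for this illustration and are not part of pandas. `interpolate()` is called with its defaults only to show the contrast with the approach the question rules out.

```python
import pandas as pd

# Sample data copied from the question
data = {
    'PatientID': [244651, 244651, 244651, 244651, 244652, 244653, 244651],
    'LocationType': ['IP', 'IP', 'OP', 'IP', 'IP', 'OP', 'IP'],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
             '2023-01-01', '2023-01-01', '2023-01-05'],
    'Biomarker1': [1.1, 1.2, None, 1.4, 2.1, 3.1, 1.5],
    'Biomarker2': [2.1, None, 2.3, 2.4, 3.1, 4.1, 2.5],
    'Biomarker3': [3.1, 3.2, 3.3, None, 4.1, 5.1, 3.5],
}

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

# Same filter as in the question; set_index returns a new frame here
filtered_df = df[(df['PatientID'] == 244651)
                 & (df['LocationType'] == 'IP')].set_index('Date')


def measured_points(frame, column):
    """Hypothetical helper: the dates and values left after dropna()."""
    return frame[column].dropna()


summary = {}
for column in ['Biomarker1', 'Biomarker2', 'Biomarker3']:
    kept = measured_points(filtered_df, column)
    filled = filtered_df[column].interpolate()  # fills gaps with new values
    summary[column] = {
        'rows': len(filtered_df),
        'plotted_with_dropna': len(kept),
        'plotted_with_interpolate': int(filled.notna().sum()),
    }
    print(column, summary[column])
```

With `dropna()`, each biomarker keeps only the dates on which it was actually measured, so the line passes through real measurements only. After `interpolate()`, the column has a value on every date, which is where the extra points in the question came from.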