python, matplotlib, scale, scatter-plot

How to scale points to have a better plot


I have a few data pairs that I need to show on a plot. For example, here is a plot of some test data:

[Image: plot of the test data pairs]

This is the code I used to generate it:

  import matplotlib.pyplot as plt
  
  data_pairs = [
      (13, 24),
      (15, 32),
      (12, 45),
      (23, 40),
      (8, 23),
  ]
  
  y_coordinates = [5, 10, 15, 20, 25]
  
  plt.figure(figsize=(10, 5))
  
  for i, (x1, x2) in enumerate(data_pairs):
      y = y_coordinates[i]
      # Draw both endpoints, then join them with a red segment.
      plt.scatter(x1, y, color='blue', marker='o', s=50)
      plt.scatter(x2, y, color='blue', marker='o', s=50)
      plt.plot([x1, x2], [y, y], color='red', linestyle='-', linewidth=2)
  
  plt.xlabel('X-Axis')
  plt.ylabel('Y-Axis')
  
  plt.grid(True)
  plt.show()

Everything works with the test data, but the real values I'm using are much larger. Here is part of the real data:

  data_pairs = [
      (24193550, 24335121),
      (42850956, 42993424),
      (45606871, 45749886),
      (60595038, 60738084),
      (5026097, 5170030),
  ]

With this data the points are so crowded together that none of the lines are visible:

[Image: plot of the real data, with the points crowded together and no visible lines]

How can I fix this?


Solution

  • There isn't really a way to "fix" the plot: it is an accurate representation of your data. Where you go from here depends on what you want the plot to convey. The current plot shows well where each pair sits along the x-axis relative to the other pairs. What it cannot show is the distance between the two points within a pair. In the sample data each pair is only about 141,500 to 144,000 units wide, while the pairs are spread over roughly 5 million to 61 million, so each segment covers only about 0.25% of the x-axis and is narrower than the markers drawn at its ends. Because the within-pair distances are so small compared with the distances between pairs, there is no way to show both aspects clearly in a single plot.

    Consider making several plots, each highlighting one specific aspect of the data. For example, give each pair its own subplot:

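    import matplotlib.pyplot as plt

    # data_pairs and y_coordinates as defined in the question
    fig, axes = plt.subplots(nrows=len(data_pairs), figsize=(5, 8), tight_layout=True)

    # One subplot per pair, each with its own x-range.
    for ax, y, x in zip(axes, y_coordinates, data_pairs):
        ax.plot(x, [y] * 2, '-ro')      # both endpoints plus the joining segment
        ax.set_ylim(y - 1, y + 1)
        ax.set_yticks([y], [y])         # one tick labelled with the row's y value (Matplotlib 3.5+)
        ax.spines[['top', 'right']].set_visible(False)
        dist = x[1] - x[0]
        # Label the segment with its length, just above its midpoint.
        ax.annotate(text=f'Distance: {dist}', xy=(x[0] + dist / 2, y + 0.1))

    plt.show()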
    

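    Note, however, that each subplot scales its x-axis independently, so every segment appears roughly the same length regardless of its actual distance; read the true distance from the annotation.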

    [Image: one subplot per pair, each segment annotated with its distance]
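To put rough numbers on the scale argument in the answer, this short sketch uses only the five sample pairs from the question and compares each pair's gap with the full x-range of the data. It assumes the question's `figsize=(10, 5)`, Matplotlib's default subplot margins (left 0.125, right 0.9) and `s=50` markers; `gap_fraction`, `AXES_WIDTH_PT` and `MARKER_DIAMETER_PT` are hypothetical names for illustration, not library objects.

```python
# Sample pairs copied from the question.
data_pairs = [
    (24193550, 24335121),
    (42850956, 42993424),
    (45606871, 45749886),
    (60595038, 60738084),
    (5026097, 5170030),
]

# Assumption: figsize=(10, 5) with the default subplot margins
# (left=0.125, right=0.9), so the axes are about 7.75 in = 558 pt wide.
# This ignores the extra 5% x-margin that autoscaling adds, so the real
# segments are slightly shorter still.
AXES_WIDTH_PT = 10 * (0.9 - 0.125) * 72
# scatter's s=50 is the marker area in points**2, so each dot is about sqrt(50) pt across.
MARKER_DIAMETER_PT = 50 ** 0.5


def gap_fraction(pairs):
    """Hypothetical helper: each pair's gap as a fraction of the full x-range."""
    lo = min(min(p) for p in pairs)
    hi = max(max(p) for p in pairs)
    span = hi - lo
    return [(b - a) / span for a, b in pairs]


for (a, b), frac in zip(data_pairs, gap_fraction(data_pairs)):
    length_pt = frac * AXES_WIDTH_PT
    print(f"gap {b - a:>7,}: {frac:.3%} of the x-range, ~{length_pt:.1f} pt "
          f"(marker is ~{MARKER_DIAMETER_PT:.1f} pt wide)")
```

With these pairs each gap works out to roughly 0.25% of the x-range, about 1.4 pt on screen, while each marker is about 7 pt across, so every red segment is far shorter than the dots at its ends and cannot read as a line.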
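The caveat that each subplot scales its x-axis independently can be worked around by giving every subplot an x-window of the same width, centred on its pair, so segment lengths become directly comparable across panels. This is a minimal sketch under stated assumptions: the window is 1.5 times the largest gap (an arbitrary padding choice), and `max_gap` and `half_window` are hypothetical names introduced here.

```python
import matplotlib.pyplot as plt

# Sample data copied from the question.
data_pairs = [
    (24193550, 24335121),
    (42850956, 42993424),
    (45606871, 45749886),
    (60595038, 60738084),
    (5026097, 5170030),
]
y_coordinates = [5, 10, 15, 20, 25]

# Assumption: a window 1.5x the largest gap leaves some padding; the factor is arbitrary.
max_gap = max(x2 - x1 for x1, x2 in data_pairs)
half_window = 0.75 * max_gap

fig, axes = plt.subplots(nrows=len(data_pairs), figsize=(5, 8), tight_layout=True)

for ax, y, (x1, x2) in zip(axes, y_coordinates, data_pairs):
    ax.plot([x1, x2], [y, y], '-ro')
    center = (x1 + x2) / 2
    # Same window width in every panel, so equal gaps are drawn with equal lengths.
    ax.set_xlim(center - half_window, center + half_window)
    ax.set_ylim(y - 1, y + 1)
    ax.set_yticks([y])
    ax.spines[['top', 'right']].set_visible(False)
    ax.annotate(f'Distance: {x2 - x1}', xy=(center, y + 0.1), ha='center')

plt.show()
```

With the sample data the gaps differ by only about 2,400 units, so the segments will still look nearly identical; the benefit of the shared width is that a real difference in length shows up instead of being rescaled away.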
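Another option, if the within-pair distances matter more than the absolute positions, is a single axes in which each pair is shifted so its first value sits at 0, with the real starting value moved into the y tick label. A minimal sketch, assuming Matplotlib 3.5+ for the `set_yticks(ticks, labels)` form that the answer also uses; the `start …` label text is just an illustrative choice.

```python
import matplotlib.pyplot as plt

# Sample data copied from the question.
data_pairs = [
    (24193550, 24335121),
    (42850956, 42993424),
    (45606871, 45749886),
    (60595038, 60738084),
    (5026097, 5170030),
]
y_coordinates = [5, 10, 15, 20, 25]

fig, ax = plt.subplots(figsize=(10, 5))

for y, (x1, x2) in zip(y_coordinates, data_pairs):
    # Shift the pair so x1 lands at 0; only the gap x2 - x1 is left on the x-axis.
    ax.plot([0, x2 - x1], [y, y], '-ro')

# Keep each pair's real starting value visible as its y tick label.
ax.set_yticks(y_coordinates, [f'start {x1:,}' for x1, _ in data_pairs])
ax.set_xlabel('Offset from the first value of the pair')
ax.grid(True)
plt.show()
```

The trade-off is the reverse of the original plot: the gaps are now easy to compare, but where each pair lies on the x-axis is only available as text.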