python, python-3.x, matplotlib, genetics

DNA sequence Dotplots


I'm currently writing a script that creates a dot plot from two given sequences. So far I can get a lovely little ASCII dot plot.

The X axis is: >HeaderOfSeq1

X = ATCGTAGCTACGTACGT
The Y axis is: >HeaderOfSeq2

Y = ATGCGATCGTGCTAC 

ATGCGATCGTGCTAC
===============|
\    \       \ |A
 \    \  \  \  |T
   \   \   \  \|C
  \ \   \ \    |G
 \    \  \  \  |T
\    \       \ |A
  \ \   \ \    |G
   \   \   \  \|C
 \    \  \  \  |T
\    \       \ |A
   \   \   \  \|C
  \ \   \ \    |G
 \    \  \  \  |T
\    \       \ |A
   \   \   \  \|C
  \ \   \ \    |G
 \    \  \  \  |T

This is with an --ascii filter that is also part of the script (without that filter, the \ characters are replaced by the letters that matched). Now what I want and need to do is turn this into a matplotlib plot.

I am kind of stuck at this point. I've used meshgrid from NumPy to get two arrays with all possible combinations, and I was hoping it would be fairly simple to overlay them and return a contour graph, maybe, that essentially shows the above dot plot but much prettier. Matplotlib is a requirement, by the way, for standardisation. I can't do anything with the meshgrids (that I know of, anyway) because they hold strings, so I'm stuck.

Any help would be greatly appreciated!! I'll also post some of the actual code if needed too.


Solution

  • IIUC, you can do:

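    import numpy as np
    import matplotlib.pyplot as plt

    X, Y = 'ATCGTAGCTACGTACGT', 'ATGCGATCGTGCTAC'
    # Split each sequence into an array of single characters
    X, Y = np.array(list(X)), np.array(list(Y))

    # Broadcasting X (row) against Y (column) gives a len(Y) x len(X) boolean match matrix
    plt.imshow(X==Y[:,None])        # the magic happens here, contourf should work similarly
    plt.xticks(np.arange(len(X)), X)
    plt.yticks(np.arange(len(Y)), Y)
    plt.show()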
    

    Output: [image: the resulting dot plot]
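
The broadcasting comparison in the answer can also be wrapped in a small function and checked without plotting. The following is only a minimal sketch, assuming plain string inputs; the names `dot_matrix` and `dot_matrix_meshgrid` are hypothetical and not part of the original answer. The second function illustrates that the `np.meshgrid` arrays mentioned in the question can be compared directly, since NumPy supports elementwise `==` on string arrays.

```python
import numpy as np


def dot_matrix(seq_x, seq_y):
    """Return a boolean matrix of shape (len(seq_y), len(seq_x)).

    Entry [i, j] is True where seq_y[i] == seq_x[j].
    (Hypothetical helper name, for illustration only.)
    """
    x = np.array(list(seq_x))
    y = np.array(list(seq_y))
    # y[:, None] is a column vector; comparing it with the row x broadcasts
    # to a full len(y) x len(x) grid of pairwise comparisons.
    return x == y[:, None]


def dot_matrix_meshgrid(seq_x, seq_y):
    """Same result built from np.meshgrid (hypothetical helper name)."""
    # meshgrid returns two string arrays of shape (len(seq_y), len(seq_x));
    # comparing them elementwise yields the same boolean matrix.
    xg, yg = np.meshgrid(list(seq_x), list(seq_y))
    return xg == yg


matches = dot_matrix('ATCGTAGCTACGTACGT', 'ATGCGATCGTGCTAC')
same = np.array_equal(
    matches, dot_matrix_meshgrid('ATCGTAGCTACGTACGT', 'ATGCGATCGTGCTAC')
)
```

Either matrix can be passed to `plt.imshow` exactly as in the answer above; keeping the comparison separate from the plotting just makes it easier to test.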