python, cluster-analysis, visualization

Visualization and clustering


Earlier I posted a question about visualization and clustering. I guess my question was not clear enough, so I am posting it again. I hope I explain it better this time. I also apologize for not accepting answers on my old questions. I didn't know I could do that until someone pointed it out, and I will definitely do it from now on.

Okay, back to the question. Previously I wrote a Python script to calculate the similarity between documents. I have saved all the results to a text file in Notepad, and it looks like this:

    (1, 6821): inf
    (1, 8): 3.458911570
    (1, 9): 7.448105193
    (1, 10): inf
    (1, 11): inf
    (6821, 8): inf
    (6821, 9): inf
    (6821, 10): inf
    (6821, 11): inf
    (8, 9): 2.153308936
    (8, 10): inf
    (8, 11): 16.227647992
    (9, 10): inf
    (9, 11): 34.943139430
    (10, 11): inf

The numbers in parentheses are document numbers, and the value after them is the distance between the two documents. What I want is a visualization tool or method that lets me create a node for each document number. In this example there are 6 different documents (1, 8, 9, 10, 11 and 6821), so I want 6 nodes. Then I want edges connecting these nodes based on their distances. For example, the distance between documents 1 and 8 is 3.46, while the distance between documents 1 and 9 is 7.45, so 1 and 8 should be placed closer together than 1 and 9. Document pairs with an 'inf' distance should have no edge connecting them at all.
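Whatever tool ends up drawing the graph, the first step is turning the dump above into a list of nodes and a list of edges. Below is a minimal Python sketch, assuming every line has exactly the `(a, b): value` form shown in the sample; `parse_distances` and `LINE_RE` are hypothetical names, not part of any library. Pairs whose value is `inf` still contribute their documents as nodes but produce no edge.

```python
import math
import re

# Matches lines such as "(1, 8): 3.458911570" or "(1, 10): inf".
# The format is assumed from the sample dump in the question.
LINE_RE = re.compile(r"^\((\d+),\s*(\d+)\):\s*(\S+)$")


def parse_distances(text):
    """Return (nodes, edges) parsed from the pairwise distance dump.

    nodes: sorted list of every document number that appears in a pair.
    edges: (doc_a, doc_b, distance) tuples for finite distances only,
           so pairs with an "inf" distance produce no edge.
    """
    nodes = set()
    edges = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything not in "(a, b): value" form
        a, b = int(match.group(1)), int(match.group(2))
        distance = float(match.group(3))  # float("inf") handles the "inf" entries
        nodes.update((a, b))
        if math.isfinite(distance):
            edges.append((a, b, distance))
    return sorted(nodes), edges


sample = """\
(1, 6821): inf
(1, 8): 3.458911570
(1, 9): 7.448105193
(1, 10): inf
(1, 11): inf
(6821, 8): inf
(6821, 9): inf
(6821, 10): inf
(6821, 11): inf
(8, 9): 2.153308936
(8, 10): inf
(8, 11): 16.227647992
(9, 10): inf
(9, 11): 34.943139430
(10, 11): inf
"""

nodes, edges = parse_distances(sample)
print(nodes)  # [1, 8, 9, 10, 11, 6821]
for a, b, distance in edges:
    print(f"{a} -- {b}: {distance}")
```

To read the real file instead of the inline sample, something like `with open("distances.txt") as f: nodes, edges = parse_distances(f.read())` should work, where `distances.txt` is a placeholder for whatever the file is actually called.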

This sounds easy, but I am having a really hard time finding an open source visualization tool that can do this effectively. I would appreciate any suggestions or recommendations.


Solution

  • http://www.graphviz.org/

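    In particular, the neato layout program. neato treats the "len" edge
    attribute as the preferred edge length (in inches), so putting the
    distance in "len" makes shorter distances come out as closer nodes:

    $ cat similar.dot
    graph g {
       // len = preferred edge length; pairs with an inf distance get no edge
       n1 -- n8 [ len = 3.458911570 ];
       n1 -- n9 [ len = 7.448105193 ];
       n8 -- n9 [ len = 2.153308936 ];
       n8 -- n11 [ len = 16.227647992 ];
       n9 -- n11 [ len = 34.943139430 ];
       // documents with only inf distances are drawn as isolated nodes
       n10;
       n6821;
    }
    $ neato -Tpng similar.dot -o similar.png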
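With thousands of document pairs you will not want to write the .dot file by hand. Here is a minimal Python sketch that generates it from a `{(doc_a, doc_b): distance}` dictionary; `to_dot` and its `scale` parameter are hypothetical helpers for illustration. It assumes you feed the result to neato, which reads `len` as the preferred edge length in inches, so `scale` is there to shrink large distances (34.9 inches is a big picture).

```python
import math


def to_dot(distances, scale=1.0):
    """Build a neato-ready DOT graph from {(doc_a, doc_b): distance}.

    Finite distances become edges whose preferred length ("len", in inches)
    is distance * scale; "inf" pairs get no edge. Every document is also
    declared as a bare node so isolated documents (here 10 and 6821) still appear.
    """
    nodes = sorted({doc for pair in distances for doc in pair})
    lines = ["graph g {"]
    for (a, b), dist in sorted(distances.items()):
        if math.isfinite(dist):
            lines.append(f"    n{a} -- n{b} [ len = {dist * scale:.9f} ];")
    for doc in nodes:
        lines.append(f"    n{doc};")
    lines.append("}")
    return "\n".join(lines) + "\n"


distances = {
    (1, 6821): math.inf,
    (1, 8): 3.458911570,
    (1, 9): 7.448105193,
    (1, 10): math.inf,
    (1, 11): math.inf,
    (6821, 8): math.inf,
    (6821, 9): math.inf,
    (6821, 10): math.inf,
    (6821, 11): math.inf,
    (8, 9): 2.153308936,
    (8, 10): math.inf,
    (8, 11): 16.227647992,
    (9, 10): math.inf,
    (9, 11): 34.943139430,
    (10, 11): math.inf,
}

dot_text = to_dot(distances)
print(dot_text, end="")
```

Saved as, say, `make_dot.py` (a placeholder name), it can be run as `python make_dot.py > similar.dot` followed by the same `neato` command as above. Keep in mind that these distances cannot all be honored exactly in two dimensions: the distance between 9 and 11 (34.94) is larger than 9 to 8 plus 8 to 11 (2.15 + 16.23 = 18.38), so neato can only find a compromise layout.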