Tags: similarity, information-retrieval, euclidean-distance, cosine-similarity, data-retrieval

Using relative frequency for euclidean distance


How do I calculate the Euclidean distance (similarity) between two documents, e.g. D1 and D2, using relative frequency?

Below is an example of both cosine and euclidean distance between two documents using absolute frequency.

D1 (frequencies) = (4, 9, 7, 0, 0, 3); |D1| = sqrt(16 + 81 + 49 + 9) = sqrt(155) = 12.45

D2 (frequencies) = (4, 5, 0, 7, 5, 0); |D2| = sqrt(16 + 25 + 49 + 25) = sqrt(115) = 10.72

Cosine(D1, D2) = (4x4 + 9x5) / (12.45 x 10.72) = 0.4569. For cosine, absolute frequency and relative frequency give the same result, because dividing a vector by a constant does not change its direction.
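To check the claim that cosine is unaffected by the choice of frequency, here is a minimal Python sketch. It assumes "relative frequency" means each term count divided by the document's total count; the helper names `cosine` and `relative` are made up for illustration.

```python
import math

# Term counts for the two documents from the example.
d1 = [4, 9, 7, 0, 0, 3]
d2 = [4, 5, 0, 7, 5, 0]

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def relative(counts):
    """Assumed definition: each count divided by the document total."""
    total = sum(counts)
    return [c / total for c in counts]

cos_abs = cosine(d1, d2)
cos_rel = cosine(relative(d1), relative(d2))
print(round(cos_abs, 4), round(cos_rel, 4))  # 0.4569 0.4569
```

Both values agree because the scaling factors cancel between the numerator and the denominator.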

Also

Euclidean(D1, D2) = sqrt((4-4)^2 + (9-5)^2 + (7-0)^2 + (0-7)^2 + (0-5)^2 + (3-0)^2) = sqrt(0 + 16 + 49 + 49 + 25 + 9) = sqrt(148) = 12.17 (absolute frequency).
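The same absolute-frequency calculation as a short Python sketch (the function name `euclidean` is just an illustrative choice):

```python
import math

# Term counts for the two documents from the example.
d1 = [4, 9, 7, 0, 0, 3]
d2 = [4, 5, 0, 7, 5, 0]

def euclidean(a, b):
    """Square root of the summed squared differences, term by term."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

dist_abs = euclidean(d1, d2)
print(round(dist_abs, 2))  # 12.17
```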

The value given for relative frequency is 0.2532.

I'm trying to get the Euclidean distance using relative frequency for this problem, but I haven't found any tutorial that helps. All I could find was the answer 0.2532, without a formula or explanation.


Solution

  • read up on euclidean distance here to get a better understanding
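
As a starting point, here is a hedged Python sketch of one common reading of the question. It assumes "relative frequency" means each term count divided by the total count of its document (23 for D1, 21 for D2), then applies the ordinary Euclidean formula to the resulting vectors. The function names are hypothetical. Under this assumption the distance comes out at about 0.5488, not 0.2532, so the source of the 0.2532 figure presumably normalizes differently, and that definition would need to be confirmed before trying to match it.

```python
import math

# Term counts for the two documents from the question.
d1 = [4, 9, 7, 0, 0, 3]
d2 = [4, 5, 0, 7, 5, 0]

def relative_frequencies(counts):
    """Assumed definition: count / total count of the document."""
    total = sum(counts)
    return [c / total for c in counts]

def euclidean(a, b):
    """Square root of the summed squared differences, term by term."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

rel1 = relative_frequencies(d1)  # each entry is count / 23
rel2 = relative_frequencies(d2)  # each entry is count / 21

dist_rel = euclidean(rel1, rel2)
print(round(dist_rel, 4))  # 0.5488
```

Unlike cosine, Euclidean distance is sensitive to vector length, so the absolute and relative versions are expected to differ.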