Suppose we have a data frame called Pred.
It contains one user per row.
The users are specified by their unique userID.
Users can be grouped by the cluster they belong to.
Users reported their Confidence and Challenge for a task; this information is saved as Conf and Chall, respectively.
Note that both Conf and Chall have the same range, i.e., 1-6.
cluster userID Conf Chall
1 A 5 3
2 B 3 2
1 C 6 1
1 D 3 4
2 E 2 4
2 F 3 5
1 G 6 2
1 H 5 5
2 I 6 2
2 J 5 4
2 K 1 1
1 L 3 5
1 M 4 4
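For reproducibility, the table above can be entered in R roughly as follows. This is only a sketch: the column types are an assumption, since only the printed values are shown here.

```r
# Rebuild the example data; column types are assumed from the printed table
Pred <- data.frame(
  cluster = c(1, 2, 1, 1, 2, 2, 1, 1, 2, 2, 2, 1, 1),
  userID  = LETTERS[1:13],  # "A" through "M"
  Conf    = c(5, 3, 6, 3, 2, 3, 6, 5, 6, 5, 1, 3, 4),
  Chall   = c(3, 2, 1, 4, 4, 5, 2, 5, 2, 4, 1, 5, 4)
)
```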
Let's say we make a scatter plot where Conf is on the x-axis and Chall is on the y-axis.
The points where:
Conf == Chall
would be on the diagonal line which passes through the origin.
Now I am interested in finding the distance of each user from the diagonal line based on their coordinates:
(Conf, Chall)
Overall, the question deals with finding the distance of points (Conf, Chall) from the line at the diagonal.
Note: I am not interested in plotting the graph; I am interested in calculating a distance vector.
I understand this is perhaps a very basic question, but I have been struggling with it for the past few days. A simple demo code example would help me understand the problem.
I would appreciate any guidance on this!
The Euclidean distance from a point (x, y) to the diagonal line y = x is given by
abs(x - y) / sqrt(2)
See, e.g., here. Thus, you may use
(Pred$distance <- abs(Pred$Conf - Pred$Chall) / sqrt(2))
# [1] 1.4142136 0.7071068 3.5355339 0.7071068 1.4142136 1.4142136 2.8284271
# [8] 0.0000000 2.8284271 0.7071068 0.0000000 1.4142136 0.0000000
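As a sanity check, the same numbers should fall out of the general point-to-line formula abs(a*x + b*y + c) / sqrt(a^2 + b^2) with a = 1, b = -1, c = 0, i.e., the line x - y = 0. The helper name dist_to_line below is made up for illustration, and the three test points are users A, C, and H from the question:

```r
# Hypothetical helper: distance from (x, y) to the line a*x + b*y + c = 0
dist_to_line <- function(x, y, a, b, c) {
  abs(a * x + b * y + c) / sqrt(a^2 + b^2)
}

# Conf and Chall for users A, C and H from the question
conf  <- c(5, 6, 5)
chall <- c(3, 1, 5)

general  <- dist_to_line(conf, chall, a = 1, b = -1, c = 0)
shortcut <- abs(conf - chall) / sqrt(2)

# Compare the general formula with the shortcut for the diagonal
all.equal(general, shortcut)
```

Both versions are vectorized, so either can be applied directly to Pred$Conf and Pred$Chall.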