I wish to use the Senseval-2 Coarse Sense Dataset, but there is no description available for it (specifically, about the format of the dataset).
It is supposed to contain the decision data, i.e. whether two senses should be merged or not. Is the middle value a confidence measure? Also, they used a prerelease of WordNet 1.7. Can I use the released WordNet 1.7 instead?
A sample from the file looks like this:
material%5:00:00:physical:00 3 material%5:00:00:worldly:00
material%3:00:03:: 3 material%5:00:00:worldly:00
material%3:00:04:: 2 material%3:00:01::
material%3:00:02::
post%5:00:00:succeeding(a):00
present%3:00:01::
present%3:00:02::
present%3:01:00::
stone%3:01:00::
stone%5:00:00:chromatic:00
air%1:15:00:: 4 air%1:27:00::
air%1:19:00:: 4 air%1:27:00::
air%1:27:01:: 4 air%1:27:00::
air%1:04:00::
air%1:10:02::
air%1:07:00::
air%1:10:01::
appeal%1:04:00:: 3 appeal%1:10:00::
appeal%1:10:02:: 3 appeal%1:10:00::
Through inspection, the middle number actually describes how many senses are in the same merged sense. For example:
material%5:00:00:physical:00 3 material%5:00:00:worldly:00
material%3:00:03:: 3 material%5:00:00:worldly:00
basically says that there are 3 senses which are considered the same as material%5:00:00:worldly:00: the two senses in the first position of these two lines, plus material%5:00:00:worldly:00 itself.
You can also see that there is no number for senses that do not get merged, such as air%1:04:00::, and that for the line material%3:00:04:: 2 material%3:00:01:: the merged sense contains two senses. So you can do the merging by mapping the sense in the first position to the sense in the third position (after the count).
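The mapping described above can be sketched in Python as follows. This is only a minimal sketch based on the format inferred from the sample, not an official parser: the function names `parse_coarse_senses` and `group_senses` are made up for illustration, and it assumes every line is either a single sense key or `sense count representative`, with the representative never itself mapped to another sense.

```python
from collections import defaultdict


def parse_coarse_senses(lines):
    """Map every sense key to the representative of its merged group.

    Assumed line formats (inferred from the sample, not documented):
      "<sense> <count> <representative>"  -> sense is merged into representative
      "<sense>"                           -> sense is not merged with anything
    """
    mapping = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) == 3:
            sense, _count, representative = parts
            mapping[sense] = representative
            # The representative belongs to its own group as well.
            mapping.setdefault(representative, representative)
        elif len(parts) == 1:
            # An unmerged sense is its own representative.
            mapping.setdefault(parts[0], parts[0])
        else:
            raise ValueError(f"Unexpected line format: {line!r}")
    return mapping


def group_senses(mapping):
    """Invert the mapping: representative -> set of all senses in its group."""
    groups = defaultdict(set)
    for sense, representative in mapping.items():
        groups[representative].add(sense)
    return dict(groups)
```

To use it, pass the open file (or any iterable of lines) to `parse_coarse_senses`, then look up each sense key from your data in the returned dictionary to get its coarse sense. The size of each set returned by `group_senses` should match the middle number on the corresponding lines, which is a quick way to check the interpretation against the whole file.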