I am working on a sentence segmentation project and am looking for SRX (Segmentation Rules Exchange) files for sentence splitting in English, French, German, Spanish, and Italian, but I have not been able to find any.
Can anybody help? I would rather not spend time writing these files myself.
Here is an example of such a file:
<languagerule languagerulename="English">
<rule break="no">
<beforebreak>\b[nN]o\.\s</beforebreak>
<afterbreak>\p{N}</afterbreak>
</rule>
<rule break="no">
<beforebreak>\b(pp|[Vv]iz|i\.?\s*e|[Vv]ol|[Rr]col|maj|Lt|[Ff]ig|[Ff]igs|[Vv]iz|[Vv]ols|[Aa]pprox|[Ii]ncl|Pres|[Dd]ept|min|max|[Gg]ovt|lb|ft|c\.?\s*f|vs)\.\s</beforebreak>
<afterbreak>[^\p{Lu}]|I</afterbreak>
</rule>
</languagerule>
LanguageTool has an SRX file that covers those languages: https://github.com/languagetool-org/languagetool/blob/master/languagetool-core/src/main/resources/org/languagetool/resource/segment.srx (disclaimer: I'm the author of LanguageTool).
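If it helps to see how rules like these are applied, here is a minimal Python sketch, not LanguageTool's actual implementation. It assumes the usual SRX reading: at each position in the text, the first rule whose `beforebreak` matches the text ending there and whose `afterbreak` matches the text starting there decides whether to break. The fragment is a simplified, hypothetical one: the `break="yes"` rule is my own addition, the abbreviation list is shortened, and `\p{N}` / `\p{Lu}` are replaced by `[0-9]` / `[A-Z]` because Python's built-in `re` module does not support `\p{...}` classes. Real SRX files also wrap the rules in an `<srx>` root element with an XML namespace, which this sketch ignores. The names `load_rules` and `split_sentences` are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment (ASCII classes instead of \p{N} / \p{Lu}).
SRX_FRAGMENT = r"""
<languagerule languagerulename="English">
  <rule break="no">
    <beforebreak>\b[nN]o\.\s</beforebreak>
    <afterbreak>[0-9]</afterbreak>
  </rule>
  <rule break="no">
    <beforebreak>\b(pp|[Ff]igs?|vs)\.\s</beforebreak>
    <afterbreak>[^A-Z]|I</afterbreak>
  </rule>
  <rule break="yes">
    <beforebreak>[.?!]\s</beforebreak>
    <afterbreak></afterbreak>
  </rule>
</languagerule>
"""


def load_rules(xml_text):
    """Return a list of (is_break, before_regex, after_regex) in file order."""
    root = ET.fromstring(xml_text)
    rules = []
    for rule in root.iter("rule"):
        before = rule.findtext("beforebreak") or ""
        after = rule.findtext("afterbreak") or ""
        rules.append((
            rule.get("break") == "yes",
            # beforebreak must match text that ends at the candidate position
            re.compile("(?:" + before + r")\Z"),
            # afterbreak must match text that starts at the candidate position
            re.compile("(?:" + after + ")"),
        ))
    return rules


def split_sentences(text, rules):
    """Split text into segments; the first matching rule at a position wins."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before_re, after_re in rules:
            if before_re.search(text[:pos]) and after_re.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # later rules are ignored once one rule has matched
    segments.append(text[start:])
    return segments


rules = load_rules(SRX_FRAGMENT)
for segment in split_sentences("See Fig. 3 and item no. 7 first. Then read the rest.", rules):
    print(repr(segment))
```

Slicing `text[:pos]` at every position makes this quadratic, which is fine for a demonstration but not for large inputs. To run the original patterns unchanged, the third-party `regex` package (which supports `\p{...}`) could be swapped in for `re`.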