I don't speak Russian, so I'm having trouble validating whether the months are spelled correctly, etc. To be honest, I'm not fully sure that my input is in Russian (Russian is the language detected by Google Translate).
I have some code in Kotlin which makes a best-effort attempt to parse dates specified in various formats and languages. I'm struggling with parsing Russian dates, however. Here's the relevant part of my code:
    sequenceOf(
        "ru-RU", // Russian
        "sr", // Serbian
    ).forEach {
        val format = DateTimeFormatter.ofPattern("d MMM. yyyy")
            .withLocale(Locale.forLanguageTag(it))
        try {
            return listOf(LocalDate.parse(dateString, format))
        } catch (e: Exception) {
            // Ignore and move on to the next locale
        }
    }
This code correctly parses "27 апр. 2018" and "24 мая. 2013", but fails on "28 фев. 2019".
What's special about "28 фев. 2019" and/or how can I parse this value correctly?
If you provide answers in Java, I can translate it to Kotlin fairly easily.
EDIT: Here's an SSCCE in Kotlin:
    import java.time.LocalDate
    import java.time.format.DateTimeFormatter
    import java.util.Locale

    println("System.getProperty - " + System.getProperty("java.version"))
    println("Runtime.version - " + Runtime.version())
    val dateString = "28 фев. 2019"
    sequenceOf(
        "ru-RU", // Russian
        "sr", // Serbian
    ).forEach {
        val format = DateTimeFormatter.ofPattern("d MMM. yyyy")
            .withLocale(Locale.forLanguageTag(it))
        try {
            println("Parse successful - " + LocalDate.parse(dateString, format))
        } catch (e: Exception) {
            println("Parse failed - " + e)
        }
    }
Output on my system:
System.getProperty - 17.0.4.1
Runtime.version - 17.0.4.1+7-b469.62
Parse failed - java.time.format.DateTimeParseException: Text '28 фев. 2019' could not be parsed at index 3
Parse failed - java.time.format.DateTimeParseException: Text '28 фев. 2019' could not be parsed at index 3
Since you are parsing user input, I believe the only option is to normalize that input before parsing it; appealing to standards is not an option here.
In Russian, we use the genitive form of month names in dates (M(M)+ vs L(L)+ in Java's DateTimeFormatter patterns) and, normally, short forms are produced using the rules below (please do not confuse these with programming standards, conventions, habits, tricks, UI/UX guides, etc.):
мая. vs мая: the first form looks ridiculous because мая is already the full genitive form of May. Another case is июн. vs июня: both have the same length, but июня is the full genitive form of June.
фев. vs февр.
й, ь or ъ
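To see how the M and L pattern letters mentioned above behave on a particular JDK, a minimal Java sketch along these lines may help. The class and method names (RussianMonthForms, monthText) are made up for illustration. Which abbreviations MMM produces depends on the JDK version and its locale data, so no output is shown here; the point is only to compare what your runtime expects with what your input contains.

```java
import java.time.LocalDate;
import java.time.Month;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class RussianMonthForms {
    private static final Locale RUSSIAN = Locale.forLanguageTag("ru-RU");

    // Renders the month of an arbitrary date using a single-field pattern
    // such as "MMM", "MMMM" or "LLLL" in the Russian locale.
    static String monthText(Month month, String pattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern, RUSSIAN);
        return formatter.format(LocalDate.of(2019, month, 1));
    }

    public static void main(String[] args) {
        // M = format (genitive) form, L = stand-alone form.
        for (Month month : Month.values()) {
            System.out.println(month
                    + ": MMM=" + monthText(month, "MMM")
                    + ", MMMM=" + monthText(month, "MMMM")
                    + ", LLLL=" + monthText(month, "LLLL"));
        }
    }
}
```

If the MMM column does not contain the exact abbreviation found in the input, a formatter built from that pattern and locale will probably not parse it.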
Based on that, and taking into account possible user mistakes, typos, common sense and programming habits, you may encounter the following "short genitive forms" of month names in the wild:
янв, янв.
фев, февр, фев., февр.
мар, марта, мар., март.
апр, апр.
мая, мая.
июн, июня, июн.
июл, июля, июл.
авг, авг.
сен, сент, сен., сент.
окт, окт.
ноя, нояб, ноя., нояб.
дек, дек.
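One possible way to normalize such input is sketched below in Java. It is not the only approach: it relies on the observation that every form listed above starts with a distinct three-letter stem, so it maps that stem to a month and ignores the rest of the word and any trailing dot. The names RussianDateNormalizer, MONTH_BY_STEM and parse are hypothetical, the "day month year" layout is taken from the question, the extra stem май (nominative May) is my own addition, and the source file is assumed to be compiled as UTF-8.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.Month;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RussianDateNormalizer {
    // First three letters of each month name; all forms listed above
    // ("фев", "февр.", "марта", "нояб." ...) reduce to one of these stems.
    private static final Map<String, Month> MONTH_BY_STEM = Map.ofEntries(
            Map.entry("янв", Month.JANUARY),
            Map.entry("фев", Month.FEBRUARY),
            Map.entry("мар", Month.MARCH),
            Map.entry("апр", Month.APRIL),
            Map.entry("мая", Month.MAY),
            Map.entry("май", Month.MAY), // assumption: also accept the nominative form
            Map.entry("июн", Month.JUNE),
            Map.entry("июл", Month.JULY),
            Map.entry("авг", Month.AUGUST),
            Map.entry("сен", Month.SEPTEMBER),
            Map.entry("окт", Month.OCTOBER),
            Map.entry("ноя", Month.NOVEMBER),
            Map.entry("дек", Month.DECEMBER));

    // Day, month word (3+ letters) with an optional trailing dot, four-digit year.
    private static final Pattern DATE =
            Pattern.compile("\\s*(\\d{1,2})\\s+(\\p{L}{3,})\\.?\\s+(\\d{4})\\s*");

    // Returns the parsed date, or empty if the text does not fit the layout,
    // the month stem is unknown, or the day is invalid for that month.
    static Optional<LocalDate> parse(String text) {
        Matcher matcher = DATE.matcher(text);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        String stem = matcher.group(2).toLowerCase(Locale.ROOT).substring(0, 3);
        Month month = MONTH_BY_STEM.get(stem);
        if (month == null) {
            return Optional.empty();
        }
        try {
            int day = Integer.parseInt(matcher.group(1));
            int year = Integer.parseInt(matcher.group(3));
            return Optional.of(LocalDate.of(year, month, day));
        } catch (DateTimeException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (String input : new String[] {"27 апр. 2018", "24 мая. 2013", "28 фев. 2019"}) {
            System.out.println(input + " -> " + parse(input));
        }
    }
}
```

Because the month lookup does not go through DateTimeFormatter, the result does not depend on which abbreviations the JDK's locale data happens to contain. In the question's code, this could run as one more attempt alongside the locale-based formatters.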