For an Android Studio project written in Java, I have a List of times of day, each stored as an integer number of minutes since midnight (hour * 60 + minute), like this:
```java
List<Integer> times = new ArrayList<>();
int hour = 16;
int minute = 25;
int time = hour * 60 + minute;
times.add(time);
```
I need the mean and the standard deviation of times in order to obtain a list of non-outlier times. However, the ordinary mean and standard deviation don't seem to work. Here is what I'm doing right now:
```java
private List<String> getNonOutlierTimes() {
    int mean = convertToTime((times.stream().mapToInt(Integer::intValue).sum()) / times.size());
    int sd = (int) calculateStandardDeviation(mean);
    int maxTime = (int) (mean + 1.5 * sd);
    int minTime = (int) (mean - 1.5 * sd);
    List<Integer> nonOutliers = new ArrayList<>();
    for (int i = 0; i < times.size(); i++) {
        if ((times.get(i) <= maxTime) && (times.get(i) >= minTime)) {
            nonOutliers.add(times.get(i));
        }
    }
    List<String> nonOutliersStr = new ArrayList<>();
    for (Integer nonOutlier : nonOutliers) {
        nonOutliersStr.add(convertIntTimesToStr(nonOutlier));
    }
    return nonOutliersStr;
}

private int convertToTime(int a) {
    if ((a < 24*60) && (a >= 0)) {
        return a;
    } else if (a < 0) {
        return 24*60 + a;
    } else {
        return a % (24*60);
    }
}

private double calculateStandardDeviation(int mean) {
    int sum = 0;
    for (int j = 0; j < times.size(); j++) {
        int time = convertToTime(times.get(j));
        sum = sum + ((time - mean) * (time - mean));
    }
    double squaredDiffMean = (double) (sum) / (times.size());
    return (Math.sqrt(squaredDiffMean));
}

private String convertIntTimesToStr(int time) {
    String hour = (time / 60) + "";
    int minute = time % 60;
    String minuteStr = minute < 10 ? "0" + minute : "" + minute;
    return hour + ":" + minuteStr;
}
```
Although all the calculations follow valid statistical formulas, the resulting mean and sd are meaningless here. For example, when the times list contains the following:
225 (03:45 am), 90 (01:30 am), 0 (12:00 am), 1420 (11:40 pm), 730 (12:10 pm)
I need a non-outliers list containing:
1420 (11:40 pm), 0 (12:00 am), 90 (01:30 am), 225 (03:45 am)
whereas the actual output is:
0 (12:00 am), 90 (01:30 am), 225 (03:45 am), 730 (12:10 pm)
In other words, I need the mean to lie where most of the times are. To be more specific, consider a list of times containing the integers 1380 (23:00, or 11:00 pm), 1400 (23:20, or 11:20 pm) and 60 (01:00 am). The arithmetic mean of these times is 946 with integer division (15:46, or 03:46 pm), whereas I need the mean to lie between 23:00 and 01:00.
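To see the problem concretely, here is a minimal sketch that reproduces the plain arithmetic mean used in getNonOutlierTimes() (integer division) together with the same H:mm formatting; NaiveMeanDemo and its methods are illustrative names, not part of the original code:

```java
import java.util.List;

// Illustrative helper reproducing the naive (non-circular) mean from the question
final class NaiveMeanDemo {
    // Plain arithmetic mean with integer division, as in getNonOutlierTimes()
    static int naiveMean(List<Integer> times) {
        return times.stream().mapToInt(Integer::intValue).sum() / times.size();
    }

    // Formats minutes since midnight as H:mm, like convertIntTimesToStr()
    static String format(int time) {
        return (time / 60) + ":" + String.format("%02d", time % 60);
    }
}
```

NaiveMeanDemo.naiveMean(Arrays.asList(1380, 1400, 60)) returns 946, which format() turns into "15:46": a time that is far away from all three inputs.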
I have already found a solution for a list of two times. However, my times.size() is always greater than 2, and I would also like to calculate the standard deviation. I'd appreciate any help with this.
Thanks in advance.
You are not working with ordinary real numbers, but with numbers modulo 1440. Division by a natural number is not well defined in this context; more precisely, the equation $n x = a$ has $n$ solutions for each $a$. E.g. $3x = 300$ has the solutions $300/3$, $1740/3$ and $3180/3$ (that is, 100, 580 and 1060), because 300, 1740 and 3180 are different representations of the same element 300.
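A small sketch that enumerates these solutions, assuming times are treated as real numbers on a circle of length modulus (so the solutions need not be integers); ModularDivision and solutions() are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: all n solutions of n * x = a (mod modulus), for 0 <= a < modulus
final class ModularDivision {
    static List<Double> solutions(int n, double a, double modulus) {
        List<Double> result = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            // a + k * modulus are different representations of the same element a;
            // dividing each by n gives a distinct value in [0, modulus)
            result.add((a + k * modulus) / n);
        }
        return result;
    }
}
```

ModularDivision.solutions(3, 300, 1440) yields 100.0, 580.0 and 1060.0, and multiplying any of them by 3 gives 300 modulo 1440.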
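Therefore the arithmetic mean is not meaningful for times of the day. However, the distance between two times of the day is well-defined: the distance between 21:00 and 23:00 is 2 hours, and so is the distance between 23:00 and 1:00. Formally, with $T = 1440$ minutes, $d(x, y) = \min(|x - y| \bmod T,\; T - |x - y| \bmod T)$. Hence we can use another definition of "mean": the time that minimizes the sum of squared distances to the data,

$$\bar{x} = \arg\min_{y} \sum_{i=1}^{n} d(y, x_i)^2.$$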
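Fortunately, one can prove that this new mean is one of the $n$ solutions of $n x = \sum_i x_i$: at the minimum, the signed distances to the data sum to zero, so $n \bar{x}$ equals a sum of representatives of the $x_i$, which is congruent to $\sum_i x_i$ modulo 1440. These solutions differ in their sum of squared distances to the data, and we choose the one for which it is minimal.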
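A minimal sketch of this idea, working directly on the question's minutes-since-midnight integers rather than the seconds used below. CircularMean and its methods are hypothetical names, and the sketch assumes every input is already in [0, 1440). It simply evaluates all n candidates (the minimizer of a sum of squared distances is also known as the Fréchet mean):

```java
import java.util.List;

// Hypothetical helper: circular mean of times given in minutes since midnight
final class CircularMean {
    static final int DAY_MINUTES = 24 * 60;

    // Shortest distance between two times of day, in minutes (0..720)
    static double circularDistance(double a, double b) {
        double d = Math.abs(a - b) % DAY_MINUTES;
        return Math.min(d, DAY_MINUTES - d);
    }

    // Sum of squared circular distances from candidate to every time
    static double sumOfSquaredDistances(List<Integer> times, double candidate) {
        double total = 0.0;
        for (int t : times) {
            double d = circularDistance(candidate, t);
            total += d * d;
        }
        return total;
    }

    // Tries the n solutions of n * x = sum (mod 1440) and keeps the best one
    static double mean(List<Integer> times) {
        if (times.isEmpty()) {
            throw new IllegalArgumentException("times must not be empty");
        }
        int n = times.size();
        long sum = 0;
        for (int t : times) {
            sum += t;
        }
        double best = Double.NaN;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int k = 0; k < n; k++) {
            // k-th solution, reduced to [0, 1440)
            double candidate = ((sum + (long) k * DAY_MINUTES) / (double) n) % DAY_MINUTES;
            double score = sumOfSquaredDistances(times, candidate);
            if (score < bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }
}
```

For the question's first example (225, 90, 0, 1420, 730) this picks 205, i.e. 03:25, and for 1380, 1400, 60 it picks about 1426.7 (23:46:40), between 23:00 and 01:00 as desired. Evaluating every candidate costs O(n²); the sorted, incremental scheme below does the same search after an O(n log n) sort.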
Assume we have a list of LocalTime values:
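```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Fields of the enclosing class
private static final long DAY = TimeUnit.DAYS.toSeconds(1L);
private static final double HALF_DAY = DAY / 2;
private static final List<LocalTime> times = Arrays.asList(
        LocalTime.of(3, 45),
        LocalTime.of(1, 30),
        LocalTime.of(0, 0),
        LocalTime.of(23, 40),
        LocalTime.of(12, 10));
```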
We first compute the sum and the sum of squares in the "usual" determination, in which every time is represented by a number between 0 and 86400 (I work in seconds):
```java
public static void printMeanVariance(final List<LocalTime> times) {
    final List<Double> dTimes = times.stream().mapToDouble(LocalTime::toSecondOfDay).boxed().collect(Collectors.toList());
    dTimes.sort(Double::compareTo);
    // A valid 'mean' must satisfy max - HALF_DAY <= mean <= min + HALF_DAY
    double max = dTimes.get(dTimes.size() - 1);
    int count = 0;
    double sum = 0.0, sumOfSquares = 0.0;
    for (final Double time : dTimes) {
        count++;
        sum += time;
        sumOfSquares += time * time;
    }
    // to be continued...
```
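If this is the "mean", it must satisfy two conditions: $\text{mean} \ge \max - \text{DAY}/2$ and $\text{mean} \le \min + \text{DAY}/2$, where min and max are the minimal and maximal values in the current determination. In other words, every value must lie within half a day of the mean, so that its difference from the mean is its actual distance on the clock. We check these conditions for all determinations, each time adding 86400 to the current minimal value: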
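The loop below updates sum and sumOfSquares incrementally instead of recomputing them. Writing $t$ for the minimal value being shifted, $D$ for DAY, $S$ for sum, $Q$ for sumOfSquares and $n$ for count, the two identities it relies on are:

$$S' = S + D, \qquad Q' = Q - t^2 + (t + D)^2 = Q + D\,(2t + D),$$

$$\sum_{i=1}^{n} \left(x_i - \frac{S}{n}\right)^2 = Q - \frac{S^2}{n}.$$

The first line corresponds to sumOfSquares += DAY * (2 * time + DAY); the second is how tmpSumOfDistancesSquared is obtained from sum and sumOfSquares.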
```java
    // continuation
    double average = -1;
    double sumOfDistancesSquared = Double.MAX_VALUE;
    for (final Double time : dTimes) {
        // Check whether the mean of the current determination is admissible
        final double tmpAverage = sum / count;
        final double tmpSumOfDistancesSquared = sumOfSquares - sum * sum / count;
        if (max - HALF_DAY <= tmpAverage && tmpAverage <= time + HALF_DAY
                && tmpSumOfDistancesSquared < sumOfDistancesSquared) {
            average = tmpAverage;
            sumOfDistancesSquared = tmpSumOfDistancesSquared;
        }
        // Move the current minimum one day forward: it becomes the new maximum
        sum += DAY;
        max = time + DAY;
        sumOfSquares += DAY * (2 * time + DAY);
    }
    // average has the "real" mean, possibly shifted by whole days
    double sd = Math.sqrt(sumOfDistancesSquared / (count - 1));
    // Reduce modulo one day: LocalTime.ofSecondOfDay rejects values >= 86400
    System.out.println("Mean = " + LocalTime.ofSecondOfDay((long) average % DAY) +
            ", deviation = " + Duration.ofSeconds((long) sd));
}
```
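Back to the question's getNonOutlierTimes(): once a circular mean and deviation are available (for example from the code above, converted to minutes), the outlier test also has to use the circular distance, otherwise 23:40 and 00:00 still look far apart. A minimal sketch; CircularOutlierFilter and its factor parameter are hypothetical, and the mean and SD are taken as inputs rather than computed here:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: outlier filter that measures distance on the 24-hour clock
final class CircularOutlierFilter {
    static final int DAY_MINUTES = 24 * 60;

    // Shortest distance between two times of day, in minutes (0..720)
    static double circularDistance(double a, double b) {
        double d = Math.abs(a - b) % DAY_MINUTES;
        return Math.min(d, DAY_MINUTES - d);
    }

    // Keeps, in input order, the times within factor * sd of the circular mean
    static List<Integer> nonOutliers(List<Integer> times, double mean, double sd, double factor) {
        List<Integer> kept = new ArrayList<>();
        for (int t : times) {
            if (circularDistance(t, mean) <= factor * sd) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

For the question's data the circular mean is 205 (03:25) and the sample SD (dividing by n - 1, as printMeanVariance does) is about 309 minutes, so with factor 1.5 this keeps 225, 90, 0 and 1420 and drops 730, which is the list the question asks for.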