I have a fairly large data frame called FTSE. Here is its structure:
str(FTSE)
'data.frame': 21167 obs. of 5 variables:
$ Name : Factor w/ 2 levels "FTSE MIB","FTSE MIB NET TOT ": 1 1 1 1 1 1 1 1 1 1 ...
$ DateLastTrade: Factor w/ 18 levels "12/10/13","12/11/13",..: 9 9 9 9 9 9 9 9 9 9 ...
$ LastPrice : num 19091 19008 19002 19018 19018 ...
$ Open : num 19091 19091 19091 19091 19091 ...
$ LastClose : num 19021 19021 19021 19021 19021 ...
When I summarize it, I get:
summary(FTSE)
               Name        DateLastTrade     LastPrice          Open         LastClose
 FTSE MIB         :10289   12/3/13 : 1370   Min.   :17750   Min.   :17811   Min.   :17805
 FTSE MIB NET TOT :10878   12/4/13 : 1370   1st Qu.:18124   1st Qu.:18055   1st Qu.:18124
                           12/6/13 : 1370   Median :18321   Median :18310   Median :18313
                           12/2/13 : 1369   Mean   :18366   Mean   :18375   Mean   :18352
                           12/5/13 : 1369   3rd Qu.:18595   3rd Qu.:18752   3rd Qu.:18697
                           12/23/13: 1353   Max.   :19091   Max.   :19091   Max.   :19021
                           (Other) :12966
Look at the "LastPrice" column. If I summarize LastPrice directly (it is the variable I actually need in my analysis), I get this, which differs from the output above:
summary(FTSE$LastPrice)
Min. 1st Qu. Median Mean 3rd Qu. Max.
17750 18120 18320 18370 18600 19090
I'm fairly new to R and I can't figure out why the values are different. Is it a rounding issue? I've read a lot of answers about this, but I can't find a way to make the results consistent. I'm really stuck on this problem.
Thanks to anyone who can help me or even try to understand my problem. Regards
EDIT for shujaa:
max(FTSE$LastPrice)
[1] 19091.3
FTSE[which.max(FTSE$LastPrice), ]
Name DateLastTrade LastPrice Open LastClose
1 FTSE MIB 12/2/13 19091.3 19091.3 19021.48
It's a rounding problem. All the output from summary(FTSE$LastPrice) has only 4 significant digits. If you look at the Usage section of ?summary, you see that the default for digits (as a named argument), combined with the default for digits as an option, gets you to 4.
# summary(object, ..., digits = max(3, getOption("digits")-3))
> getOption("digits")
[1] 7
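As a quick check of that arithmetic, here is a minimal sketch that applies signif() with 4 digits to the LastPrice values printed by summary(FTSE) in the question. The variable names are made up for illustration, and the 7 is hard-coded on the assumption that getOption("digits") is at its usual default.

```r
# Assumed: getOption("digits") is at its usual default of 7.
digits_option <- 7
default_digits <- max(3, digits_option - 3)   # 4

# LastPrice statistics as printed by summary(FTSE) in the question.
from_data_frame <- c(17750, 18124, 18321, 18366, 18595, 19091)

# Rounding to 4 significant digits gives the values shown by
# summary(FTSE$LastPrice): 17750 18120 18320 18370 18600 19090
rounded <- signif(from_data_frame, default_digits)
print(rounded)
```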
So try:
summary(FTSE$LastPrice, digits=7)
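The same call can be tried on a small made-up vector. Only the maximum, 19091.3, comes from the question; the other values and the name prices are hypothetical. With digits = 7 the maximum reported by summary() should agree with max().

```r
# Hypothetical prices; only 19091.3 is taken from the question.
prices <- c(17750.2, 18124.4, 18321.7, 19091.3)

s <- summary(prices, digits = 7)

# Compare the maximum stored in the summary with the exact maximum.
max_matches <- isTRUE(all.equal(s[["Max."]], max(prices)))
print(max_matches)
```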
An unanswered question remains, however: why does the summary.data.frame function not do the same degree of rounding, since the default argument to digits is the same for the .default and the .data.frame methods? Looking at the code, you see that summary.data.frame actually first does summary.default on its columns with a fixed value of digits=12L, and later uses the digits argument to format. It seems to me that the help page is somewhat obscure in this area, in its description of the arguments:
digits: integer, used for number formatting with signif() (for summary.default) or
format() (for summary.data.frame).
It completely ignores the fact that the fixed signif() precision used for data.frame columns (12 digits) is quite different from the default of 4 that applies to a plain vector.
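To illustrate the difference, here is a rough sketch of the two paths as described above, calling signif() and format() directly on the maximum from the question. It is a simplification for illustration, not the actual code of either method, and the variable names are made up.

```r
x <- 19091.3   # max(FTSE$LastPrice) from the question

# Path described for a plain vector: signif() with the default of 4 digits.
via_default <- signif(x, 4)

# Path described for data frame columns: signif() with 12 digits,
# then format() with the digits argument (4 by default).
via_data_frame <- format(signif(x, 12), digits = 4)

print(via_default)      # 19090
print(via_data_frame)   # "19091"
```

format() treats digits as a minimum number of significant digits and never drops digits from the integer part, which is why the data frame summary still shows 19091.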