I've downloaded a dataset ("OR_GLM-L2-LCFA_G16_s20202530000000_e20202530000200_c20202530000226.nc") from the noaa-goes16 bucket hosted on AWS (Amazon Web Services). The dataset contains a variable titled "flash_time_offset_of_first_event", giving each flash's time as a number of seconds relative to 2020-09-09 00:00:00, for a file covering 00:00:00 to 00:00:20...
I am reading the netCDF file using:
using NCDatasets
flash_times = []
fname1 = "OR_GLM-L2-LCFA_G16_s20202530000000_e20202530000200_c20202530000226.nc"
fpath1 = string("C:\\mydir\\", fname1)
NCDataset(fpath1) do ds
    # collect the values of the (1-D) variable as strings
    for x in ds["flash_time_offset_of_first_event"][:]
        push!(flash_times, string(x))
    end
end
which produces:
sort(flash_times)
481-element Array{Any,1}:
"2020-09-08T23:59:42.594"
"2020-09-08T23:59:42.672"
"2020-09-08T23:59:42.688"
⋮
"2020-09-09T00:00:07.324"
"2020-09-09T00:00:07.366"
"2020-09-09T00:00:07.42"
The problem is that these times do not match the times shown when the values are plotted in Panoply. In Panoply, the plotted values range from ~-0.7 s (2020-09-08T23:59:59.3) to ~19.4 s (2020-09-09T00:00:19.4).
I'm extracting the earliest and latest times in my array of time strings using:
@info("",Earliest_time=sort(flash_times)[1],Latest_time=sort(flash_times)[end])
which produces:
┌ Info:
│ Earliest_time = "2020-09-08T23:59:42.594"
└ Latest_time = "2020-09-09T00:00:07.42"
My question: How can I correctly extract these times, or correct the times I have extracted? The NetCDF file also contains scale_factor and add_offset attributes, but I have not been able to apply them successfully so far. I've also tried extracting the netCDF data using the NetCDF package, which returns an array of integers that I have likewise tried (unsuccessfully) to convert using scale_factor and add_offset.
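For reference, the conversion I attempted looks roughly like this (a sketch with an illustrative raw value, not an actual value from the file):

```julia
# Standard CF packing convention: unpacked = raw * scale_factor + add_offset,
# applied directly to the signed Int16 values (roughly what I tried).
raw = Int16(-22000)            # illustrative raw value, not from the file
scale_factor = 0.0003814756    # attribute value from the file
add_offset = -5.0              # attribute value from the file
val = raw * scale_factor + add_offset   # large negative offsets like this look wrong
```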
NCDatasets automatically applies the scale_factor and add_offset attributes already, but there is another attribute here, _Unsigned, which NCDatasets doesn't know about yet, while the libraries used by Panoply and xarray do. I created this issue for it: https://github.com/Alexander-Barth/NCDatasets.jl/issues/133.
So in short, the data is stored as Int16 and has negative values in the second half. Since these negative values are supposed to be interpreted as unsigned (positive) values, this affects the dates that come out in the end. For now, we can dig down and apply the steps ourselves to get the correct values from the raw data:
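To see why the sign interpretation matters, here is a minimal illustration (values chosen by me) of how the same bit pattern yields very different numbers:

```julia
x = Int16(-1)
u = reinterpret(UInt16, x)   # 0xffff, i.e. 65535: same bits, different value

# With this file's attributes (scale_factor ≈ 0.0003814756, add_offset = -5.0),
# the signed reading gives -5.0 + (-1) * 0.0003814756 ≈ -5.0004 s,
# while the unsigned reading gives -5.0 + 65535 * 0.0003814756 ≈ 20.0 s.
```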
using NCDatasets
using Downloads
using Dates
url = "http://ftp.cptec.inpe.br/goes/goes16/glm/2020/09/09/OR_GLM-L2-LCFA_G16_s20202530000000_e20202530000200_c20202530000226.nc"
path = Downloads.download(url)
ds = NCDataset(path)
var = ds["flash_time_offset_of_first_event"]
# flash_time_offset_of_first_event (481)
# Datatype: Int16
# Dimensions: number_of_flashes
# Attributes:
# long_name = GLM L2+ Lightning Detection: time of occurrence of first constituent event in flash
# standard_name = time
# _Unsigned = true
# scale_factor = 0.0003814756
# add_offset = -5.0
# units = seconds since 2020-09-09 00:00:00.000
# axis = T
# these dates are incorrect due to the _Unsigned issue described above
extrema(var)
# DateTime("2020-09-08T23:59:42.594")
# DateTime("2020-09-09T00:00:07.420")
raw = var.var[:] # Vector{Int16}
# interpret as unsigned, and apply scale and offset
val = reinterpret(unsigned(eltype(raw)), raw) .* var.attrib["scale_factor"] .+ var.attrib["add_offset"]
# get dates by adding these as seconds to the epoch
epoch = DateTime("2020-09-09T00:00:00")
times = epoch .+ Millisecond.(round.(Int, val .* 1000))
extrema(times) # these are as expected
# DateTime("2020-09-08T23:59:59.251")
# DateTime("2020-09-09T00:00:19.408")
Of course this is not ideal; it would be easier for users if this were handled in NCDatasets itself.
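Until then, the manual steps can be wrapped in small helpers. A sketch (the names and signatures below are mine, not NCDatasets API), which also parses the epoch out of the CF units string rather than hardcoding it:

```julia
using Dates

# Reinterpret packed signed integers as unsigned, then apply CF scale/offset.
unpack_unsigned(raw::AbstractVector{<:Signed}, scale, offset) =
    reinterpret(unsigned(eltype(raw)), raw) .* scale .+ offset

# Parse the epoch out of a CF "seconds since ..." units string.
function cf_epoch(units::AbstractString)
    stamp = strip(split(units, "since")[2])
    DateTime(stamp, dateformat"yyyy-mm-dd HH:MM:SS.s")
end

# Combine: raw packed offsets in seconds -> DateTimes.
to_datetimes(raw, scale, offset, units) =
    cf_epoch(units) .+ Millisecond.(round.(Int, unpack_unsigned(raw, scale, offset) .* 1000))
```

With the variable from above this would be called as `to_datetimes(var.var[:], var.attrib["scale_factor"], var.attrib["add_offset"], var.attrib["units"])`.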