I'm working with NetCDF and FITS files. Tika extracts the header text from my NetCDF files, but for FITS files I only get basic file metadata. Does header text extraction not work for FITS files?
I followed this guide for FITS: https://wiki.apache.org/tika/TikaGDAL and I only see the basic file metadata, not the actual header text.
This is what I'm using for NetCDF files (I also used tika --gui to see the header text):
curl -X PUT --data-binary @age4_timeseries.nc http://localhost:9998/tika --header "Content-type: text/-t"
curl -T age4_timeseries.nc http://localhost:9998/tika --header "Accept: text/plain"
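To check whether the header is missing from the extracted text or just never parsed, it can help to ask tika-server for both the body text (/tika endpoint) and the metadata (/meta endpoint) of the same file. A minimal sketch, assuming the server runs on localhost:9998 as above; the function name tika_extract and the TIKA_URL override are hypothetical.

```shell
# Hypothetical helper: PUT a file to tika-server and print the response.
#   $1 = file to parse
#   $2 = endpoint: "tika" (extracted text, default) or "meta" (metadata)
# TIKA_URL is an assumed override; it defaults to the local server used above.
tika_extract() {
  file=$1
  endpoint=${2:-tika}
  case $endpoint in
    tika) accept="text/plain" ;;        # plain-text body
    meta) accept="application/json" ;;  # metadata as JSON
    *) echo "unknown endpoint: $endpoint" >&2; return 2 ;;
  esac
  # curl -T uploads the file with an HTTP PUT
  curl -s -T "$file" --header "Accept: $accept" \
    "${TIKA_URL:-http://localhost:9998}/$endpoint"
}

# Usage:
#   tika_extract mirc0000.fits        # body text
#   tika_extract mirc0000.fits meta   # metadata only
```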
I've looked through the Tika Jira and found a reference from 2012: https://issues.apache.org/jira/browse/TIKA-874
But this does not appear to have been added to Tika.
I received this from Tika:
Content-Length: 40968000
Content-Type: application/fits
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.gdal.GDALParser
X-TIKA:digest:MD5: cce03f62a68c09ec562f9e8e05b54b40
X-TIKA:digest:SHA256: b3f0c61409cbd7f2c9aeb8bdfa0798d529383db699c1055b8a12a68267b948dd
resourceName: mirc0000.fits
But I was hoping to receive the header content, like this:
SIMPLE = T / file does conform to FITS standard
BITPIX = 16 / number of bits per data pixel
NAXIS = 3 / number of data axes
NAXIS1 = 1280 / length of data axis 1
NAXIS2 = 16 / length of data axis 2
NAXIS3 = 1000 / length of data axis 3
EXTEND = T / FITS dataset may contain extensions
COMMENT FITS (Flexible Image Transport System) format is defined in 'Astronomy
COMMENT and Astrophysics', volume 376, page 359; bibcode: 2001A&A...376..359H
BZERO = 32768 / offset data range to that of unsigned short
BSCALE = 1 / default scaling factor
DATE = '2006-09-01T04:01:02' / File creation date (YYYY-MM-DDThh:mm:ss UTC)
TELESCOP= 'CHARA array 330m max baseline, 6dishes' / Telescope
INSTURME= 'MIRC spectro/combiner' / The data acquisition instrument
ORIGIN = 'Mount Wilson Institute' / Origin of the Observation
SITELAT = '34.13 ' / Latitude (Geodetic, VLBI, to be verified)
SITELONG= '118.03 ' / Longitude (Geodetic, VLBI, to be verified)
SITEELEV= '1742.00 ' / Altitude above MSL, to be verified
HISTORY = 'Multi-Dish FITS data' / File modification history
OBJECT = 'HD_174639' / Target name
DATE-OBS= '09/01/2006' / UT date (YYYY-MM-DD)
UTC-OBS = '04:00:10' / Universal Time hh:mm:ss
LST-OBS = '18:48:41' / Local Sidereal Time hh:mm:ss
CHARA-TM= '04:00:11' / CHARA time hh:mm:ss
LOST-TKS= ' 0' / CHARA lost Ticks in RT Clock t
LOST-SEC= ' 0' / CHARA lost seconds in rt clock s
S1-TARGE= 41.342992001 / Delay line S1 target metrology
S2-TARGE= 38.610911409 / Delay line S2 target metrology
E1-TARGE= 0. / Delay line E1 target metrology
E2-TARGE= 44. / Delay line E2 target metrology
W1-TARGE= 0. / Delay line W1 target metrology
W2-TARGE= 0. / Delay line W2 target metrology
WAVELEN = 1.65 / Central wavelength
BANDWID = 0.3 / Bandwidth of spectrum
EXPOSURE= 5.483692 / Effective integration time in ms
ROWOFFS = 5 / Sub-image Y offset prom pixel 0
COLOFFS = 38 / Sub-image X offset prom pixel 0
NREADS = 8 / Number of multiple reads for pixel
FRMPRST = 1000 / Number of frames per reset
VOFFSET = 4. / PICNIC offset voltage
VD = 5. / PICNIC drain bias
ICTL = 3.3 / PICNIC warm OA offset voltage
END
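Independently of Tika, a FITS header is plain ASCII: a sequence of 80-character "cards" with no newlines, ending with an END card. A minimal sketch that prints the primary header one card per line, assuming a standards-conforming file; the function name fits_header is hypothetical, and extension headers are not printed.

```shell
# Hypothetical helper: print the primary FITS header of $1, one card per line.
# Assumes a standard FITS file: the header is ASCII with no embedded newlines,
# so splitting the byte stream every 80 bytes yields exactly one card per line.
fits_header() {
  # fold -b -w 80: break the stream every 80 bytes
  # awk: print each card, then stop at the END card
  #      (keyword field = columns 1-8, "END" padded with spaces)
  fold -b -w 80 "$1" | awk '{ print } substr($0, 1, 8) == "END     " { exit }'
}

# Usage:
#   fits_header mirc0000.fits
```

Stopping at the END card means the binary data that follows the header is never printed.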
Got it working! The key point: the CFITSIO library must be installed before building GDAL, otherwise GDAL's FITS driver is not built. CFITSIO library info: https://heasarc.gsfc.nasa.gov/docs/software/fitsio/fitsio.html
Download GDAL from here: http://download.osgeo.org/gdal/CURRENT/
Unpack the archive (gunzip, then tar xvf), then from the GDAL source directory run:
./configure --with-cfitsio
make
make install
Run Tika as usual. Now it works like a champ!
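Before rerunning Tika, it may be worth confirming that the rebuilt GDAL actually includes the FITS driver. As far as I know, Tika's GDALParser shells out to gdalinfo, so if `gdalinfo --formats` does not list FITS, Tika will fall back to basic metadata. A minimal sketch; the function name has_fits_driver is hypothetical, and it assumes the driver's short name appears at the start of a line in the listing.

```shell
# Hypothetical helper: succeeds if the `gdalinfo --formats` listing on stdin
# contains the FITS driver (only present when GDAL was built with CFITSIO).
has_fits_driver() {
  # match "FITS" as the first word of a line, allowing leading indentation
  grep -q -E '^[[:space:]]*FITS[[:space:]]'
}

# Usage (requires gdalinfo on PATH):
#   gdalinfo --formats | has_fits_driver && echo "FITS driver available"
```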