I have decided to read files straight from their online repository rather than downloading them first. Since this is my first attempt at this, it is also my first interaction with aws.s3.
First, just to make sure I could run something simple, I checked whether the bucket exists with bucket_exists, specifying both the bucket and the region. The bucket does exist.
However, the file I want to inspect is an .h5 file. To work with it, I installed the rhdf5 package via BiocManager. Then, to inspect that one file, I did the following:
s3read_using(
FUN = rhdf5::H5Fopen,
bucket = "s3://arpa-e-perform/ERCOT/",
region = "us-west-2",
object = "s3://arpa-e-perform/ERCOT/2018/Solar/Actuals/BA_level/BA_solar_actuals_2018.h5")
Unfortunately, it didn't work. Here is the output and the error message I got:
List of 6
$ Code : chr "PermanentRedirect"
$ Message : chr "The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future "| truncated
$ Endpoint : chr "arpa-e-perform.s3.amazonaws.com"
$ Bucket : chr "arpa-e-perform"
$ RequestId: chr "BGEZ97HJH10KAPRE"
$ HostId : chr "pxKXcYNLchSYTwEaPLDoFRo11qkWontw+kWAtb8ZqTTEYwTptAkSgl8dbJoI8a2URXIxDCOE7/g="
- attr(*, "headers")=List of 7
..$ x-amz-bucket-region: chr "us-west-2"
..$ x-amz-request-id : chr "BGEZ97HJH10KAPRE"
..$ x-amz-id-2 : chr "pxKXcYNLchSYTwEaPLDoFRo11qkWontw+kWAtb8ZqTTEYwTptAkSgl8dbJoI8a2URXIxDCOE7/g="
..$ content-type : chr "application/xml"
..$ transfer-encoding : chr "chunked"
..$ date : chr "Mon, 06 Jun 2022 17:35:35 GMT"
..$ server : chr "AmazonS3"
..- attr(*, "class")= chr [1:2] "insensitive" "list"
- attr(*, "class")= chr "aws_error"
NULL
Error in parse_aws_s3_response(r, Sig, verbose = verbose) :
Moved Permanently (HTTP 301).
Today has been my first interaction with aws.s3 and I'm still going through the manual and forums, so any help would be appreciated. Thank you.
I think the problem here is that you're not accessing the file at the correct location. The error message says "The bucket you are attempting to access must be addressed using the specified endpoint" and then gives the endpoint as "arpa-e-perform.s3.amazonaws.com", which looks much more like a regular HTTP URL.
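The endpoint in the error is the bucket's virtual-hosted-style address: the bucket name moves into the host name and the object key becomes the URL path. A minimal base R sketch of that mapping follows. split_s3_uri() and s3_uri_to_https() are hypothetical helpers, not part of aws.s3 or rhdf5, and the sketch assumes the global s3.amazonaws.com host reported in the error. It also splits out the bare bucket name and object key, which is the form aws.s3 functions take in their bucket and object arguments (rather than a full s3:// URI in bucket).

```r
# Hypothetical helper (not from aws.s3 or rhdf5): split "s3://bucket/key"
# into its bucket name and object key.
split_s3_uri <- function(uri) {
  stopifnot(is.character(uri), length(uri) == 1L, startsWith(uri, "s3://"))
  path <- sub("^s3://", "", uri)
  slash <- regexpr("/", path, fixed = TRUE)
  if (slash < 0) {
    # No key part: the URI names only a bucket
    return(list(bucket = path, key = ""))
  }
  list(bucket = substr(path, 1L, slash - 1L),
       key = substr(path, slash + 1L, nchar(path)))
}

# Hypothetical helper: build the virtual-hosted-style HTTPS URL, assuming
# the global "s3.amazonaws.com" endpoint named in the error message.
# Note: the key is not percent-encoded, which is fine for this key.
s3_uri_to_https <- function(uri, host = "s3.amazonaws.com") {
  parts <- split_s3_uri(uri)
  sprintf("https://%s.%s/%s", parts$bucket, host, parts$key)
}

uri <- "s3://arpa-e-perform/ERCOT/2018/Solar/Actuals/BA_level/BA_solar_actuals_2018.h5"
split_s3_uri(uri)$bucket  # "arpa-e-perform"
s3_uri_to_https(uri)
```

For the file in the question, s3_uri_to_https() produces the same https://arpa-e-perform.s3.amazonaws.com/... URL that can be handed to rhdf5.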
Here's an example of reading the meta dataset from the file using rhdf5.
library(rhdf5)
## Create file access property list for reading from S3
## Credentials are NULL as this is a public bucket
fapl <- H5Pcreate("H5P_FILE_ACCESS")
H5Pset_fapl_ros3(fapl, s3credentials = NULL)
## Open file and the meta dataset
fid <- H5Fopen(name = "https://arpa-e-perform.s3.amazonaws.com/ERCOT/2018/Solar/Actuals/BA_level/BA_solar_actuals_2018.h5", flags = "H5F_ACC_RDONLY", fapl = fapl)
did <- H5Dopen(fid, name = "/meta")
## read the dataset
meta <- H5Dread(did)
## tidy up
H5Dclose(did)
H5Pclose(fapl)
H5Fclose(fid)
## Here's the output
head(meta)
#> site_ids AC_capacity_MW module_type dc_ac_ratio azimuth latitude
#> 1 BA BA BA BA BA BA
#> 2 Adamstown Solar 250 0 1.25 180 33.25
#> 3 Agate Solar 60 0 1.3 180 32.45
#> 4 Angelina Solar 150 0 1.4 180 31.37
#> 5 Angelo Solar 195 2 1.25 180 31.41
#> 6 Angus Solar 113 0 1.25 180 31.69
#> longitude elevation timezone country state county urban
#> 1 BA BA BA BA BA BA BA
#> 2 -97.26 220.16 -6 bUnited States bTexas bDenton bNone
#> 3 -97.18 217.84 -6 bUnited States bTexas bJohnson bNone
#> 4 -94.86 85 -6 bUnited States bTexas bAngelina bNone
#> 5 -100.58 623.72 -6 bUnited States bTexas bTom Green bNone
#> 6 -97.26 140.72 -6 bUnited States bTexas bMcLennan bNone
#> population landcover gid reV_tech proposed Zone ISO
#> 1 BA BA BA BA BA BA BA
#> 2 438 140 690482 bpv Proposed NORTH ERCOT
#> 3 2105 140 692563 bpv Proposed NORTH CENTRAL ERCOT
#> 4 183 50 744853 bpv Proposed EAST ERCOT
#> 5 32 30 600558 bpv Proposed WEST ERCOT
#> 6 715 140 690817 bpv Proposed NORTH CENTRAL ERCOT
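If you would rather stay with aws.s3, here is a sketch of the same read. It assumes that s3read_using() downloads the object to a temporary file and then calls FUN on that path, and that the bucket allows anonymous reads. Compared with the call in the question, bucket is the bare bucket name, object is the key without the s3:// prefix, and the region goes through opts, because arguments in ... are forwarded to FUN rather than to the download. FUN reads the dataset instead of returning an open file handle, since the temporary copy may be deleted once the call returns. read_meta_via_aws_s3() is a hypothetical wrapper, not part of aws.s3.

```r
# Hypothetical wrapper (not part of aws.s3): download one object with
# aws.s3::s3read_using() and read a single HDF5 dataset from the local copy.
read_meta_via_aws_s3 <- function(bucket, key, region = "us-west-2",
                                 dataset = "/meta") {
  aws.s3::s3read_using(
    FUN = rhdf5::h5read,          # called as h5read(<temp file>, name = dataset)
    name = dataset,               # forwarded through ... to FUN
    object = key,                 # object key, no "s3://" prefix
    bucket = bucket,              # bare bucket name
    opts = list(region = region)  # passed to the download, not to FUN
  )
}

# Usage (needs network access and the aws.s3 and rhdf5 packages):
# meta <- read_meta_via_aws_s3(
#   bucket = "arpa-e-perform",
#   key = "ERCOT/2018/Solar/Actuals/BA_level/BA_solar_actuals_2018.h5"
# )
# head(meta)
```

Unlike the ros3 approach above, this downloads the whole file before reading, so it is only attractive for small files.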