Tags: export, google-cloud-platform, bigtable, google-cloud-bigtable

Error exporting data from Google Cloud Bigtable


While going through the Google docs, I'm getting the stack trace below on the final export command (executed from the master instance with the appropriate env variables set).

${HADOOP_HOME}/bin/hadoop jar ${HADOOP_BIGTABLE_JAR} export-table -libjars ${HADOOP_BIGTABLE_JAR} <table-name> <gs://bucket>

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-install/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-install/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2016-02-08 23:39:39,068 INFO  [main] mapreduce.Export: versions=1, starttime=0, endtime=9223372036854775807, keepDeletedCells=false
2016-02-08 23:39:39,213 INFO  [main] gcs.GoogleHadoopFileSystemBase: GHFS version: 1.4.4-hadoop2
java.lang.IllegalAccessError: tried to access field sun.security.ssl.Handshaker.localSupportedSignAlgs from class sun.security.ssl.ClientHandshaker
    at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:278)
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:913)
    at sun.security.ssl.Handshaker.process_record(Handshaker.java:849)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1035)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1344)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1371)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1355)
    at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:153)
    at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:93)
    at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:972)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
    at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getBucket(GoogleCloudStorageImpl.java:1599)
    at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1554)
    at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage.getItemInfo(CacheSupplementedGoogleCloudStorage.java:547)
    at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfo(GoogleCloudStorageFileSystem.java:1042)
    at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.exists(GoogleCloudStorageFileSystem.java:383)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configureBuckets(GoogleHadoopFileSystemBase.java:1650)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.configureBuckets(GoogleHadoopFileSystem.java:71)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1598)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:783)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:746)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:352)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.hbase.util.DynamicClassLoader.<init>(DynamicClassLoader.java:104)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.<clinit>(ProtobufUtil.java:241)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:509)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:207)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:168)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:291)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:92)
    at org.apache.hadoop.hbase.mapreduce.IdentityTableMapper.initJob(IdentityTableMapper.java:51)
    at org.apache.hadoop.hbase.mapreduce.Export.createSubmittableJob(Export.java:75)
    at org.apache.hadoop.hbase.mapreduce.Export.main(Export.java:187)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:153)
    at com.google.cloud.bigtable.mapreduce.Driver.main(Driver.java:35)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Here's my environment variable setup, in case it's helpful:

export HBASE_HOME=/home/hadoop/hbase-install
export HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath`
export HADOOP_HOME=/home/hadoop/hadoop-install

export HADOOP_CLIENT_OPTS="-Xbootclasspath/p:${HBASE_HOME}/lib/bigtable/alpn-boot-7.1.3.v20150130.jar"
export HADOOP_BIGTABLE_JAR=${HBASE_HOME}/lib/bigtable/bigtable-hbase-mapreduce-0.2.2-shaded.jar
export HADOOP_HBASE_JAR=${HBASE_HOME}/lib/hbase-server-1.1.2.jar

Also, when I run hbase shell and then list, it just hangs and never returns the list of tables. This is what happens:

~$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-install/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-install/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2016-02-09 00:02:01,334 INFO  [main] grpc.BigtableSession: Opening connection for projectId mystical-height-89421, zoneId us-central1-b, clusterId twitter-data, on data host bigtable.googleapis.com, table admin host bigtabletableadmin.googleapis.com.
2016-02-09 00:02:01,358 INFO  [BigtableSession-startup-0] grpc.BigtableSession: gRPC is using the JDK provider (alpn-boot jar)
2016-02-09 00:02:01,648 INFO  [bigtable-connection-shared-executor-pool1-t2] io.RefreshingOAuth2CredentialsInterceptor: Refreshing the OAuth token
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2, rcc2b70cf03e3378800661ec5cab11eb43fafe0fc, Wed Aug 26 20:11:27 PDT 2015

hbase(main):001:0> list
TABLE

I've tried:

  • Double-checking that ALPN and the environment variables are set appropriately (roughly as sketched below)
  • Double-checking hbase-site.xml and hbase-env.sh to make sure nothing looks wrong
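
Concretely, the checks looked roughly like this (a sketch; the paths are the ones from the exports above):

# Confirm the env vars point at real files
ls -l ${HBASE_HOME}/lib/bigtable/alpn-boot-7.1.3.v20150130.jar
ls -l ${HADOOP_BIGTABLE_JAR} ${HADOOP_HBASE_JAR}
echo ${HADOOP_CLIENT_OPTS}

# Confirm the HBase classpath actually made it into HADOOP_CLASSPATH
echo ${HADOOP_CLASSPATH} | tr ':' '\n' | grep -c hbase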

I even tried connecting to my cluster from ANOTHER gcloud instance (which I was previously able to do by following these directions), but I can't get that to work now either... (it also hangs):

user@gcloud-instance:hbase-1.1.2$ bin/hbase shell
2016-02-09 00:07:03,506 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-02-09 00:07:03,913 INFO  [main] grpc.BigtableSession: Opening connection for projectId <project>, zoneId us-central1-b, clusterId <cluster>, on data host bigtable.googleapis.com, table admin host bigtabletableadmin.googleapis.com.
2016-02-09 00:07:04,039 INFO  [BigtableSession-startup-0] grpc.BigtableSession: gRPC is using the JDK provider (alpn-boot jar)
2016-02-09 00:07:05,138 INFO  [Credentials-Refresh-0] io.RefreshingOAuth2CredentialsInterceptor: Refreshing the OAuth token
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2, rcc2b70cf03e3378800661ec5cab11eb43fafe0fc, Wed Aug 26 20:11:27 PDT 2015

hbase(main):001:0> list
TABLE
Feb 09, 2016 12:07:08 AM com.google.bigtable.repackaged.io.grpc.internal.TransportSet$1 run
INFO: Created transport com.google.bigtable.repackaged.io.grpc.netty.NettyClientTransport@7b480442(bigtabletableadmin.googleapis.com/64.233.183.219:443) for bigtabletableadmin.googleapis.com/64.233.183.219:443

Any ideas about what I'm doing wrong? It looks like an access issue - how do I fix it?

Thanks!


Solution

    1. Spin up a Dataproc cluster with Bigtable enabled by following these instructions.

    2. SSH to the master with ./cluster.sh ssh.

    3. Run hbase shell to verify that all is in order.

    4. Run hadoop jar ${HADOOP_BIGTABLE_JAR} export-table -libjars ${HADOOP_BIGTABLE_JAR} <table-name> gs://<bucket>/some-folder.

    5. Run gsutil ls gs://<bucket>/some-folder/** and check whether _SUCCESS exists. If so, the remaining files are your data.

    6. exit from the cluster master.

    7. Run ./cluster.sh delete to get rid of the cluster if you no longer need it. (Steps 2-7 are consolidated in the sketch after this list.)
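
    If it helps, steps 2-7 look roughly like this as a single session (a sketch: cluster.sh comes from the instructions referenced in step 1, HADOOP_BIGTABLE_JAR is assumed to already be set on the master as in step 4, and <table-name> and <bucket> are placeholders for your own values):

    ./cluster.sh ssh                          # step 2: open a shell on the master

    # on the master:
    hbase shell                               # step 3: verify, then quit the shell
    hadoop jar ${HADOOP_BIGTABLE_JAR} export-table \
        -libjars ${HADOOP_BIGTABLE_JAR} <table-name> gs://<bucket>/some-folder
    gsutil ls gs://<bucket>/some-folder/**    # step 5: look for _SUCCESS
    exit                                      # step 6: back to your own machine

    ./cluster.sh delete                       # step 7: tear down when done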

    You ran into a problem with the weekly Java runtime update, which has since been corrected.
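
    For context: the alpn-boot jar is prepended to the JVM boot classpath (the -Xbootclasspath/p flag in HADOOP_CLIENT_OPTS above), where it overrides the JRE's own sun.security.ssl classes, so it must match the exact JRE build it runs on. An IllegalAccessError inside sun.security.ssl, like the one in your stack trace, is the typical symptom of a mismatch after the JRE has been updated underneath it. A rough way to check that the two still line up (the version mapping itself is published in the Jetty ALPN documentation):

    # Which JRE build is actually running? (java -version prints to stderr)
    java -version 2>&1 | head -n 1

    # Which alpn-boot jar is being prepended to the boot classpath?
    ls ${HBASE_HOME}/lib/bigtable/alpn-boot-*.jar

    # If the JRE was updated (e.g. by a weekly image refresh), fetch the
    # alpn-boot version that matches the new JRE and update HADOOP_CLIENT_OPTS.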