WMT14 dataset loading code worked last week, but now it doesn't


I'm trying to download the WMT14 dataset to replicate the results of the "Attention Is All You Need" paper. This code worked last week, but now when I run it I get the following warning repeatedly until it finally fails with an error. Any ideas why?

Warning: WARNING:urllib3.connectionpool:Retrying (Retry(total=9, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1007)'))': /wmt13/training-parallel-europarl-v7.tgz

Error:
SSLCertVerificationError                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, body, headers, retries, timeout, chunked, response_conn, preload_content, decode_content, enforce_content_length)
    467         try:
--> 468             self._validate_conn(conn)
    469         except (SocketTimeout, BaseSSLError) as e:

54 frames
SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1007)

During handling of the above exception, another exception occurred:

SSLError                                  Traceback (most recent call last)
SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1007)

The above exception was the direct cause of the following exception:

MaxRetryError                             Traceback (most recent call last)
MaxRetryError: HTTPSConnectionPool(host='www.statmt.org', port=443): Max retries exceeded with url: /wmt13/training-parallel-europarl-v7.tgz (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1007)')))

During handling of the above exception, another exception occurred:

SSLError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    515             if isinstance(e.reason, _SSLError):
    516                 # This branch is for urllib3 v1.22 and later.
--> 517                 raise SSLError(e, request=request)
    518 
    519             raise ConnectionError(e, request=request)

SSLError: HTTPSConnectionPool(host='www.statmt.org', port=443): Max retries exceeded with url: /wmt13/training-parallel-europarl-v7.tgz (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1007)')))
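The traceback above is really one chain of exceptions: the `SSLCertVerificationError` raised during the TLS handshake is wrapped by urllib3 (`SSLError`, then `MaxRetryError`) and finally by requests (`SSLError`). Python links these through each exception's `__cause__` (set by `raise ... from ...`) and `__context__` (set when an exception is raised while handling another) attributes. As a minimal sketch, assuming you catch the exception around the `tfds.load` call yourself, the hypothetical helpers below (not part of TFDS or requests) walk that chain to the innermost exception, so the actual cause surfaces as one line instead of 54 frames.

```python
import ssl


def root_cause(exc: BaseException) -> BaseException:
    """Follow __cause__ (explicit 'raise ... from ...') or, failing that,
    __context__ (implicit chaining) down to the innermost exception."""
    seen = set()  # guards against a (rare) cyclic chain
    while id(exc) not in seen:
        seen.add(id(exc))
        nxt = exc.__cause__ or exc.__context__
        if nxt is None:
            break
        exc = nxt
    return exc


def describe_ssl_failure(exc: BaseException) -> str:
    """Return a one-line summary of the innermost exception in the chain.

    For a certificate failure, prefer the verify_message attribute that
    CPython sets when it raises SSLCertVerificationError (for example
    "certificate has expired"); fall back to str() if it is missing.
    """
    cause = root_cause(exc)
    if isinstance(cause, ssl.SSLCertVerificationError):
        message = getattr(cause, "verify_message", None) or str(cause)
        return f"certificate verification failed: {message}"
    return f"{type(cause).__name__}: {cause}"
```

Printing `describe_ssl_failure(e)` inside an `except Exception as e:` block around the `tfds.load(...)` call should then report a single line ending in "certificate has expired" for the failure above, which makes it clear the problem is the server's certificate rather than the notebook.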

Code:

import tensorflow_datasets as tfds

# Load the French-English dataset
examples, metadata = tfds.load('wmt14_translate/fr-en', with_info=True, as_supervised=True)
train_examples, val_examples = examples['train'], examples['validation']

# Define the tokenizer for English using the subword tokenizers from TFDS
# (with as_supervised=True, the 'fr-en' config yields (fr, en) pairs)
tokenizer_en = tfds.deprecated.text.SubwordTextEncoder.build_from_corpus(
    (en.numpy() for fr, en in train_examples), target_vocab_size=2**13)

# Define the tokenizer for French using the subword tokenizers from TFDS
tokenizer_fr = tfds.deprecated.text.SubwordTextEncoder.build_from_corpus(
    (fr.numpy() for fr, en in train_examples), target_vocab_size=2**13)

I've tried restarting the runtime, creating new notebooks, and a few other things, but nothing has worked so far. Part of the problem seems to be that it's looking for wmt13 instead of wmt14, but I don't think I caused that.


Solution

  • Yeah, it happened to me as well. The statmt.org website's SSL certificate has expired. I reloaded my Colab at midnight and now cannot resume my experiment...
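Because the expired certificate is on the server, restarting the runtime or opening a new notebook cannot make verification pass. A minimal stdlib sketch, assuming Python 3.7+ and network access from the notebook, for confirming this before calling `tfds.load`: the hypothetical `check_tls` helper performs a verified TLS handshake with a host and returns either the reason verification failed or how many days the certificate has left. It uses Python's default trust store, which may differ from the `certifi` bundle that requests uses, but an expired server certificate fails under either.

```python
import socket
import ssl
import time


def days_left(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and a certificate's notAfter
    string in the format getpeercert() returns, e.g. "Jan  5 09:34:43 2018 GMT"."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def check_tls(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Attempt a verified TLS handshake and return a short status string
    instead of raising, so it can be run in a notebook cell."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as e:
        # Must come before OSError: SSLCertVerificationError subclasses it.
        reason = getattr(e, "verify_message", None) or e
        return f"certificate error: {reason}"
    except OSError as e:  # DNS failure, refused connection, timeout, other SSL errors
        return f"connection error: {e}"
    remaining = days_left(cert["notAfter"], time.time())
    return f"ok: certificate valid for {remaining:.0f} more days"
```

Running `print(check_tls("www.statmt.org"))` while the certificate is expired should report `certificate error: certificate has expired`; once the site renews its certificate, the same call should report the remaining validity instead, at which point `tfds.load` can be retried.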