Tags: python, google-translate, google-translation-api

Bad request when trying to create glossary


I want to create a unidirectional glossary for a translation project of mine, using the example commands from Google's how-to guide: https://cloud.google.com/translate/docs/advanced/glossary#unidirectional_glossary

There is no Python example for creating a unidirectional glossary, only one for equivalent term sets glossaries, and I don't know what to change in the code.

I created a storage bucket and uploaded my glossary file.

Then I tried to execute this command in PowerShell:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://translation.googleapis.com/v3/projects/[HIDDEN]/locations/us-east1/glossaries" | Select-Object -Expand Content

This is the contents of the request.json file, based on their example:

{
  "name":"projects/[HIDDEN]/locations/us-east1/glossaries/kittglossary",
  "languagePair": {
    "sourceLanguageCode": "en",
    "targetLanguageCode": "hu"
    },
  "inputConfig": {
    "gcsSource": {
      "inputUri": "gs://kittgloss/glossary.csv"
    }
  }
}
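To make the relationship between the URL and the JSON body explicit, the same request can be built in Python. This is a minimal sketch, assuming the v3 REST field names used in request.json above; `build_unidirectional_glossary_request` is a hypothetical helper and `my-project` is a placeholder for the hidden project ID. The glossary `name` must sit under the same project and location as the URL's parent, and a unidirectional glossary uses `languagePair` rather than `languageCodesSet`. The location is set to us-central1 here because the answer further down reports that us-east1 did not work.

```python
import json


def build_unidirectional_glossary_request(project_id, location, glossary_id,
                                          source_lang, target_lang, input_uri):
    """Return (url, body) for a v3 glossaries.create REST call (hypothetical helper).

    The same parent string is used for the URL and for the glossary name,
    so project and location cannot drift apart.
    """
    parent = f"projects/{project_id}/locations/{location}"
    url = f"https://translation.googleapis.com/v3/{parent}/glossaries"
    body = {
        "name": f"{parent}/glossaries/{glossary_id}",
        # Unidirectional glossary: one source and one target language.
        "languagePair": {
            "sourceLanguageCode": source_lang,
            "targetLanguageCode": target_lang,
        },
        "inputConfig": {"gcsSource": {"inputUri": input_uri}},
    }
    return url, body


if __name__ == "__main__":
    # "my-project" is a placeholder, not the real (hidden) project ID.
    url, body = build_unidirectional_glossary_request(
        "my-project", "us-central1", "kittglossary", "en", "hu",
        "gs://kittgloss/glossary.csv")
    print(url)
    print(json.dumps(body, indent=2))
```

Writing `body` out with `json.dump(..., ensure_ascii=False)` to a UTF-8 file gives a request.json equivalent to the one above.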

And I get this error returned:

Invoke-WebRequest : The remote server returned an error: (400) Bad Request.
At line:4 char:1
+ Invoke-WebRequest `
+ ~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-WebRequest], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand
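Invoke-WebRequest only reports "(400) Bad Request" here and does not show the JSON error body that Google APIs send back, which is usually where the offending field or location is named. A minimal standard-library sketch that keeps that body is shown below; `post_json` is a hypothetical helper, and the bearer token is assumed to come from `gcloud auth application-default print-access-token`, as in the PowerShell command.

```python
import json
import urllib.error
import urllib.request


def post_json(url, body, token, timeout=60):
    """POST `body` as JSON and return (status_code, response_text).

    On 4xx/5xx responses the error body is returned instead of being
    swallowed, so the API's explanation of the 400 can be printed.
    """
    data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # HTTPError also behaves like a response: read() yields the error body.
        return err.code, err.read().decode("utf-8")
```

Usage: `status, text = post_json(url, body, token)` followed by `print(status, text)`; the printed text should contain the API's `error.message` for the failing request.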

I do have the GOOGLE_APPLICATION_CREDENTIALS environment variable set, and implicit authentication has worked before when I tried a test translation.
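A quick local check that the variable really points to a readable JSON key file can rule out one source of confusion. This is a minimal sketch; `describe_adc_key_file` is a hypothetical helper, and it only inspects the file, it does not validate the credentials against Google.

```python
import json
import os


def describe_adc_key_file(env=None):
    """Return the "type" field of the key file named by GOOGLE_APPLICATION_CREDENTIALS.

    Raises a descriptive error if the variable is unset or points to a
    missing file; json.load raises if the file is not valid JSON.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"credentials file not found: {path}")
    with open(path, encoding="utf-8") as f:
        info = json.load(f)
    # Service account key files carry "type": "service_account".
    return info.get("type", "<missing type>")
```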


After trying Google's example Python code for creating a glossary:

from google.cloud import translate_v3 as translate

pid = "[HIDDEN]",
iuri = "gs://kittgloss/glossary.csv",
gid = "kittglossary",

def create_glossary(
        project_id,
        input_uri,
        glossary_id,
        timeout,
):
    """
    Create an equivalent term sets glossary. Glossary can be words or
    short phrases (usually fewer than five words).
    https://cloud.google.com/translate/docs/advanced/glossary#format-glossary
    """
    client = translate.TranslationServiceClient()

    # Supported language codes: https://cloud.google.com/translate/docs/languages
    source_lang_code = "en"
    target_lang_code = "hu"
    location = "us-east1"  # The location of the glossary

    name = client.glossary_path(project_id, location, glossary_id)
    language_codes_set = translate.types.Glossary.LanguageCodesSet(
        language_codes=[source_lang_code, target_lang_code]
    )

    gcs_source = translate.types.GcsSource(input_uri=input_uri)

    input_config = translate.types.GlossaryInputConfig(gcs_source=gcs_source)

    glossary = translate.types.Glossary(
        name=name, language_codes_set=language_codes_set, input_config=input_config
    )

    parent = client.location_path(project_id, location)
    # glossary is a custom dictionary Translation API uses
    # to translate the domain-specific terminology.
    operation = client.create_glossary(parent=parent, glossary=glossary)

    result = operation.result(timeout)
    print("Created: {}".format(result.name))
    print("Input Uri: {}".format(result.input_config.gcs_source.input_uri))


create_glossary(pid,iuri,gid,timeout=180)

I get the following error, complaining that the input URI is a tuple instead of a str:

Traceback (most recent call last):
  File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\python_message.py", line 702, in field_setter
    new_value = type_checker.CheckValue(new_value)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\type_checkers.py", line 215, in CheckValue
    raise TypeError(message)
TypeError: ('gs://kittgloss/glossary.csv',) has type <class 'tuple'>, but expected one of: (<class 'bytes'>, <class 'str'>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\python_message.py", line 558, in init
    setattr(self, field_name, field_value)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\python_message.py", line 704, in field_setter
    raise TypeError(
TypeError: Cannot set google.cloud.translation.v3.GcsSource.input_uri to ('gs://kittgloss/glossary.csv',): ('gs://kittgloss/glossary.csv',) has type <class 'tuple'>, but expected one of: (<class 'bytes'>, <class 'str'>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\py\glossary.py", line 48, in <module>
    create_glossary(pid,iuri,gid,timeout=180)
  File "C:\py\glossary.py", line 30, in create_glossary
    gcs_source = translate.types.GcsSource(input_uri=input_uri)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\proto\message.py", line 421, in __init__
    self.__dict__["_pb"] = self._meta.pb(**params)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\python_message.py", line 560, in init
    _ReraiseTypeErrorWithFieldName(message_descriptor.name, field_name)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\python_message.py", line 477, in _ReraiseTypeErrorWithFieldName
    six.reraise(type(exc), exc, sys.exc_info()[2])
  File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\six.py", line 702, in reraise
    raise value.with_traceback(tb)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\python_message.py", line 558, in init
    setattr(self, field_name, field_value)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\python_message.py", line 704, in field_setter
    raise TypeError(
TypeError: Cannot set google.cloud.translation.v3.GcsSource.input_uri to ('gs://kittgloss/glossary.csv',): ('gs://kittgloss/glossary.csv',) has type <class 'tuple'>, but expected one of: (<class 'bytes'>, <class 'str'>) for field GcsSource.input_uri
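The TypeError comes from the trailing commas on the `pid = ...`, `iuri = ...` and `gid = ...` lines: in Python, `x = "a",` assigns the one-element tuple `("a",)`, not the string, so `GcsSource(input_uri=input_uri)` receives a tuple. The snippet below demonstrates this and adds a hypothetical `require_str` guard that fails earlier with a clearer message. Removing the three trailing commas fixes this particular error, although the glossary location may still need changing, as the answer notes. Separately, if the installed library is google-cloud-translate 3.x (the `proto/message.py` frame in the traceback hints at a proto-plus based release), `client.location_path` may no longer be available; building the parent as a literal `f"projects/{project_id}/locations/{location}"` string, as the answer does, avoids depending on it.

```python
# A trailing comma turns the right-hand side into a one-element tuple.
iuri_buggy = "gs://kittgloss/glossary.csv",
iuri_fixed = "gs://kittgloss/glossary.csv"

print(type(iuri_buggy).__name__)  # tuple
print(type(iuri_fixed).__name__)  # str


def require_str(name, value):
    """Raise a clear TypeError early instead of deep inside protobuf (hypothetical helper)."""
    if not isinstance(value, str):
        raise TypeError(f"{name} must be str, got {type(value).__name__}: {value!r}")
    return value
```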

The glossary file is very simple and the first few lines look like so:

rear bumpers,hátsó lökhárító
front bumper spoiler,első lökhárító spoiler
front bumpers,első lökhárító
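Before uploading, a local sanity check of the file can catch malformed rows and encoding problems with accented Hungarian terms. This is a minimal sketch, assuming the unidirectional CSV layout of exactly two columns per row (source term first, target term second) with no header row; `check_unidirectional_csv` and `load_glossary` are hypothetical helpers, and passing this check does not guarantee the API will accept the file.

```python
import csv
import io


def check_unidirectional_csv(text):
    """Return (source, target) pairs, raising ValueError on malformed rows."""
    pairs = []
    for lineno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2 or not all(cell.strip() for cell in row):
            raise ValueError(f"line {lineno}: expected 2 non-empty columns, got {row!r}")
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


def load_glossary(path):
    # Decode as UTF-8 so terms such as "hátsó lökhárító" are read correctly.
    with open(path, encoding="utf-8", newline="") as f:
        return check_unidirectional_csv(f.read())
```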

I'd appreciate any help.


Solution

  • I solved the problem by converting my glossary to a simple EN-to-HU equivalent term sets glossary, adding a header row to my CSV file like this:

    First few lines

    en,hu,pos
    rear bumpers,hátsó lökhárító,noun
    front bumper spoiler,első lökhárító spoiler,noun
    front bumpers,első lökhárító,noun
    

    Then I used a slightly modified version of the example Python code to create the glossary. One issue I ran into is that glossaries apparently can only be created in us-central1 and global. I know my code isn't pretty, since it hardcodes strings, but it worked:

    from google.cloud import translate_v3beta1 as translate
    def create_glossary():
    
        client = translate.TranslationServiceClient()
        ## Set your project name
        project_id = 'flawless-acre-284812'
        ## Set your wished glossary-id
        glossary_id = 'kittglossaryv2'
        ## Set your location
        location = 'us-central1'  # The location of the glossary
    
        name = client.glossary_path(
            project_id,
            location,
            glossary_id)
    
        language_codes_set = translate.types.Glossary.LanguageCodesSet(
            language_codes=['en', 'hu'])
        ## SET YOUR BUCKET URI
        gcs_source = translate.types.GcsSource(
            input_uri='gs://kittgloss/etglossaryv2.csv')
    
        input_config = translate.types.GlossaryInputConfig(
            gcs_source=gcs_source)
    
        glossary = translate.types.Glossary(
            name=name,
            language_codes_set=language_codes_set,
            input_config=input_config)
    
        parent = 'projects/flawless-acre-284812/locations/us-central1'
    
        operation = client.create_glossary(parent=parent, glossary=glossary)
    
        result = operation.result(timeout=90)
        print('Created: {}'.format(result.name))
        print('Input Uri: {}'.format(result.input_config.gcs_source.input_uri))
        
    create_glossary()
    

    Hope this helps someone.
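The answer's two steps, converting the file and creating the glossary, can be sketched as plain-data helpers that also cover the unidirectional case the question originally asked about. This is a sketch under assumptions: `to_equivalent_term_set_csv` and `build_create_glossary_request` are hypothetical helpers, the field names follow the v3 `Glossary` message (`language_codes_set` for equivalent term sets, `language_pair` for unidirectional glossaries), and it is assumed that the installed client accepts a request dict.

```python
import csv
import io


def to_equivalent_term_set_csv(pairs, source="en", target="hu", pos="noun"):
    """Turn (source, target) pairs into an equivalent term sets CSV.

    The header row names the language codes, plus the optional "pos"
    column, mirroring the en,hu,pos header in the answer.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([source, target, "pos"])
    for src, tgt in pairs:
        writer.writerow([src, tgt, pos])
    return out.getvalue()


def build_create_glossary_request(project_id, location, glossary_id, input_uri,
                                  language_codes=None, language_pair=None):
    """Build a create_glossary request as a plain dict.

    Pass language_codes=[...] for an equivalent term sets glossary or
    language_pair=(source, target) for a unidirectional one, not both.
    """
    if (language_codes is None) == (language_pair is None):
        raise ValueError("pass exactly one of language_codes or language_pair")
    parent = f"projects/{project_id}/locations/{location}"
    glossary = {
        "name": f"{parent}/glossaries/{glossary_id}",
        "input_config": {"gcs_source": {"input_uri": input_uri}},
    }
    if language_codes is not None:
        glossary["language_codes_set"] = {"language_codes": list(language_codes)}
    else:
        src, tgt = language_pair
        glossary["language_pair"] = {
            "source_language_code": src,
            "target_language_code": tgt,
        }
    return {"parent": parent, "glossary": glossary}
```

With google-cloud-translate 3.x installed, the dict could then be passed as `translate_v3.TranslationServiceClient().create_glossary(request=req).result(timeout=180)`; keeping the request as plain data makes the location and the glossary type easy to inspect before any API call is made.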