Tags: python-3.x, string, utf-8, google-cloud-functions, google-cloud-datastore

UTF-8 strings in Google Cloud Datastore using Python 3.x?


I am using Pub/Sub, Cloud Functions, and Datastore together. The user sends data in JSON format through a Pub/Sub topic, and this JSON payload is received by a Cloud Function. Some of the data is processed and then stored in Datastore.

The problem is that the JSON payload string received by the Cloud Function sometimes contains non-ASCII characters, e.g.

{'Data': 'ßTest'} #Already converted into UTF-8 by the user

So, when I do:

data = pubsub_message['Data']
print(data)        # OUTPUT :=> ßTest
print(type(data))  # OUTPUT :=> <class 'str'>
data.decode('utf-8')  # raises AttributeError: 'str' object has no attribute 'decode'

The decode call raises an exception saying that str has no decode method, which makes sense because in Python 3 a str is already decoded Unicode text.
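For context, the relationship between the two types in Python 3 is: str is decoded Unicode text, bytes is its encoded form, and only bytes has decode. A minimal standalone illustration (not tied to Pub/Sub):

text = 'ßTest'                        # str: Unicode text, already decoded
raw = text.encode('utf-8')            # bytes: b'\xc3\x9fTest'
assert raw.decode('utf-8') == text    # decode() exists on bytes, not on str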

What I am doing now is encoding it as UTF-8:

d = data.encode('utf-8')

This gives me d as type bytes, which I then store in Datastore. When I check in Datastore, the value shows up as a weird-looking string with property type Blob.

My question is: can I store the value as-is in Datastore without encoding it to UTF-8 bytes, or is storing it UTF-8-encoded as a Blob in Datastore OK?


Solution

  • As the Datastore Best practices documentation says:

    Always use UTF-8 characters for properties of type string. A non-UTF-8 character in a property of type string could interfere with queries. If you need to save data with non-UTF-8 characters, use a byte string.

    This means that you have to store your data either as a UTF-8 string or as a byte string.

    For a Blob property, you likewise store the data as bytes (a short sketch of both options follows below).
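    A minimal sketch of both options, assuming the google-cloud-datastore client library and a hypothetical kind name 'Message':

    from google.cloud import datastore

    client = datastore.Client()

    # A Python 3 str is Unicode text; stored as-is it becomes a string
    # property, which the client serializes as UTF-8 and which stays queryable.
    entity = datastore.Entity(key=client.key('Message', 'as-string'))
    entity['Data'] = 'ßTest'
    client.put(entity)

    # Encoding to bytes first makes the value a Blob property instead.
    blob_entity = datastore.Entity(key=client.key('Message', 'as-blob'))
    blob_entity['Data'] = 'ßTest'.encode('utf-8')
    client.put(blob_entity)

    # Reading the Blob back returns bytes; decode to recover the original text.
    fetched = client.get(blob_entity.key)
    text = fetched['Data'].decode('utf-8')   # 'ßTest'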