Search code examples
mysqlsqlpostgresqlvarchar

How does SQL determine a character's length in a varchar?


After reading the documentation, I understood that there is a one-byte or two-byte length prefix to a varying character so as to determine its length. I understand too that, for a varchar, each character might have a different length in bytes depending on the character itself.

So my question is:

How does the DBMS determine each character's length after it's stored?

Meaning: After a string is stored, let's say it's 4 characters long, and let's suppose that the first character is 1 byte long, the second 2 bytes, the 3rd 3 bytes and the 4th is 4.. How does the DB know how long is each character when retrieving the string so as to read it correctly?

I hope the question is clear, sorry for any English mistakes I made. Thanks


Solution

  • The way UTF-8 works as a variable-length encoding is that the 1-byte characters can only use 7 bits of that byte.

    If the high bit is 0, then the byte is a 1-byte character (which happens to be encoded in the same way as the 128 ASCII characters).

    If the high bit is 1, then it's a multi-byte character.

    enter image description here

    Picture from https://en.wikipedia.org/wiki/UTF-8