I have a qualifier (holding a long value) in each row of an HBase table.
I want to fetch the HBase rows whose value lies between two long numbers. For that I am using the following filters:
long startEpochInDays = 384;
long endEpochInDays = 396;
string startDayFilter = "SingleColumnValueFilter('" + cf + "','" + qualifier + "', >= ,'binary:" + Encoding.UTF8.GetString(HBaseGenericHelper.GetBigEndianByteArray(startEpochInDays)) + "',true,true)";
string endDayFilter = "SingleColumnValueFilter('" + cf + "','" + qualifier + "', < ,'binary:" + Encoding.UTF8.GetString(HBaseGenericHelper.GetBigEndianByteArray(endEpochInDays)) + "',true,true)";
string finalFilter = startDayFilter + " AND " + endDayFilter;
These filters work fine for numbers up to 383, but fail for anything larger.
While debugging, I found that converting the long number to a byte array returns something like \0\0\0\0\0\0\1\128.
When the last byte in the array is 127 or less, UTF-8 works fine, but once it is 128 or greater, UTF-8 starts returning "?" for that last byte.
If I use the following code to convert the byte array to a string instead:
Encoding encoding = new UTF8Encoding(true,true);
string number = encoding.GetString(HBaseGenericHelper.GetBigEndianByteArray(startEpochInDays));
then UTF-8 throws an exception while converting the byte array to a string for the filter (whenever the last byte is 128 or more).
Exception - Unable to translate bytes [8B] at index 6 from specified code page to Unicode.
Inner Exception -
at System.Text.DecoderExceptionFallbackBuffer.Throw(Byte[] bytesUnknown, Int32 index)
at System.Text.DecoderExceptionFallbackBuffer.Fallback(Byte[] bytesUnknown, Int32 index)
at System.Text.DecoderFallbackBuffer.InternalFallback(Byte[] bytes, Byte* pBytes)
at System.Text.UTF8Encoding.GetCharCount(Byte* bytes, Int32 count, DecoderNLS baseDecoder)
at System.String.CreateStringFromEncoding(Byte* bytes, Int32 byteLength, Encoding encoding)
at System.Text.UTF8Encoding.GetString(Byte[] bytes, Int32 index, Int32 count)
at System.Text.Encoding.GetString(Byte[] bytes)
Thanks in Advance.
UTF-8 is not an appropriate way of encoding arbitrary bytes as a string. Rather: it encodes arbitrary strings as bytes (and vice versa, as long as the bytes are in the correct format). There is no reason to think that HBaseGenericHelper.GetBigEndianByteArray(startEpochInDays) returns UTF-8 data, so encoding.GetString is entirely inappropriate and is actually using the Encoding backwards. This is the first topic I discussed here - so don't panic: you're in good company - people make this mistake all the time.
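As a rough illustration of why this goes wrong, here is a minimal sketch. It is not taken from the original code: the byte values are simply 384 written out as eight big-endian bytes, and the names Utf8RoundTripDemo and RoundTrip are made up for the example. Decoding such bytes as UTF-8 and then encoding the resulting string again does not give back the original bytes, because 0x80 on its own is not a valid UTF-8 sequence.

```csharp
using System;
using System.Text;

public static class Utf8RoundTripDemo
{
    // Hypothetical helper: decodes the bytes as UTF-8, then encodes
    // the resulting string back to bytes.
    public static byte[] RoundTrip(byte[] bytes)
    {
        string decoded = Encoding.UTF8.GetString(bytes);
        return Encoding.UTF8.GetBytes(decoded);
    }

    public static void Main()
    {
        // 384 as eight big-endian bytes (assumed to match what the
        // question's helper returns): 00 00 00 00 00 00 01 80
        byte[] original = { 0, 0, 0, 0, 0, 0, 0x01, 0x80 };

        byte[] roundTripped = RoundTrip(original);

        // The second line differs from the first: the invalid 0x80 byte
        // was replaced during decoding, so the data is no longer intact.
        Console.WriteLine(BitConverter.ToString(original));
        Console.WriteLine(BitConverter.ToString(roundTripped));
    }
}
```

Any byte below 128 happens to survive this round trip, which is why the smaller numbers appeared to work.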
What you should be using is something like base-16 (hexadecimal) or base-64.
To get hex: BitConverter.ToString(byte[]). To get base-64: Convert.ToBase64String(byte[]).
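For example, a small sketch of both conversions, again using 384 as eight big-endian bytes as an assumed stand-in for the helper's output (the class and method names are made up for illustration). Whether the filter string accepts either form is a separate question that this does not answer.

```csharp
using System;

public static class BinaryToTextDemo
{
    // Hypothetical wrappers around the two framework calls named above.
    public static string ToHex(byte[] bytes) => BitConverter.ToString(bytes);

    public static string ToBase64(byte[] bytes) => Convert.ToBase64String(bytes);

    public static void Main()
    {
        // 384 as eight big-endian bytes (assumed input).
        byte[] bytes = { 0, 0, 0, 0, 0, 0, 0x01, 0x80 };

        Console.WriteLine(ToHex(bytes));    // 00-00-00-00-00-00-01-80
        Console.WriteLine(ToBase64(bytes)); // AAAAAAAAAYA=

        // Unlike the UTF-8 attempt, base-64 is reversible for any bytes.
        byte[] back = Convert.FromBase64String(ToBase64(bytes));
        Console.WriteLine(BitConverter.ToString(back)); // 00-00-00-00-00-00-01-80
    }
}
```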
If you need the data to be in a particular format that isn't base-64 or base-16, then you'll have to be specific about what format you want. But: it isn't "UTF-8 used backwards".