I have a uniqueidentifier column in a SQL database:
```sql
CREATE TABLE [dbo].[Response]
(
    ...
    [Id] uniqueidentifier NOT NULL
    ...
)
```
It maps to a System.Guid in C#:
```csharp
using System;

public class Response
{
    public Guid Id { get; set; }
}
```
I use SSAS (SQL Server Analysis Services), where the column is mapped as a string:
```json
{
  "name": "Id",
  "dataType": "string",
  "sourceColumn": "Id"
}
```
I am running into the memory limit of my current pricing tier and am exploring ways to reduce the memory this column consumes.
I could translate the GUIDs into a smaller Int32 during the ETL into the warehouse / data lake from which SSAS is refreshed. However, I use the Id to filter data via an API, so this would require the API to be aware of the translation as well.
Before undertaking a HUGE refactor, I wanted to know whether there is a better data type that would reduce the memory footprint.
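A minimal sketch of the ETL-side translation described above, assuming the GUID-to-Int32 mapping is persisted as a lookup table that both the load job and the API read. GuidKeyMap and its methods are hypothetical names for illustration; in a real warehouse this would more likely be an IDENTITY column on a dimension table.

```python
import uuid
from typing import Dict, List, Optional

INT32_MAX = 2**31 - 1  # upper bound of a signed Int32 key


class GuidKeyMap:
    """Hypothetical helper that assigns a compact, sequential Int32 key to each GUID.

    The mapping would have to be persisted (e.g. as a lookup table in the
    warehouse) so the API can translate an incoming GUID before filtering.
    """

    def __init__(self) -> None:
        self._to_int: Dict[uuid.UUID, int] = {}
        self._to_guid: List[uuid.UUID] = []

    def key_for(self, guid: uuid.UUID) -> int:
        """ETL side: return the existing key for this GUID or allocate the next one."""
        key = self._to_int.get(guid)
        if key is None:
            key = len(self._to_guid) + 1  # 1-based, like an IDENTITY(1,1) column
            if key > INT32_MAX:
                raise OverflowError("ran out of Int32 keys")
            self._to_int[guid] = key
            self._to_guid.append(guid)
        return key

    def lookup(self, guid: uuid.UUID) -> Optional[int]:
        """API side: translate a GUID filter into the Int32 key (None if unknown)."""
        return self._to_int.get(guid)

    def guid_for(self, key: int) -> uuid.UUID:
        """Reverse translation, e.g. to return the original Id in API responses."""
        if not 1 <= key <= len(self._to_guid):
            raise KeyError(key)
        return self._to_guid[key - 1]


if __name__ == "__main__":
    ids = [uuid.UUID(int=n) for n in (10, 20, 10, 30)]  # note the repeated GUID
    keys = GuidKeyMap()
    print([keys.key_for(g) for g in ids])  # [1, 2, 1, 3]
```

The cost the question anticipates is visible here: every API filter now needs a lookup() call against the persisted mapping before it can query the model.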
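For a sense of scale, here are the raw per-value sizes of the representations in play. This is plain Python arithmetic, not a measurement of SSAS: it assumes the canonical hyphenated 36-character text form, ignores any engine-specific encoding, compression or per-entry overhead, and says nothing about how many distinct values have to be stored.

```python
import struct
import uuid

guid = uuid.UUID("12345678-1234-5678-1234-567812345678")

sizes = {
    # Canonical 36-character text form (assumed to be what the "string" column holds).
    "string (ASCII)": len(str(guid).encode("ascii")),
    # Same text as UTF-16, the in-memory encoding of a .NET System.String.
    "string (UTF-16)": len(str(guid).encode("utf-16-le")),
    # Raw binary form; SQL Server's uniqueidentifier is 16 bytes.
    "binary GUID": len(guid.bytes),
    # A signed 32-bit surrogate key.
    "Int32": struct.calcsize("<i"),
    # A signed 64-bit key, if 32 bits were not enough.
    "Int64": struct.calcsize("<q"),
}

for name, size in sizes.items():
    print(f"{name:>16}: {size:>2} bytes")
```

Per value, the text form is 36 bytes (72 as UTF-16) against 4 for an Int32, but in a columnar model the number of distinct values usually matters at least as much as the width of each one.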
All is NOT Guid, if you know what I mean.
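Tabular model pros would instantly dismiss the idea of keeping random strings or unique IDs in your model. You simply shouldn't do that, because such columns cannot be optimized: when every row holds a distinct value, the engine's dictionary and run-length compression have nothing to exploit.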
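To see why, here is a toy model of dictionary encoding, the general technique columnar engines (including the tabular engine) use for string columns: each distinct value is stored once, and each row stores a small integer id pointing into that dictionary. It is a simplification, not the engine's actual storage format; it ignores run-length encoding, segments and hash-table overhead. dictionary_encode and bits_per_row_id are illustrative names.

```python
import math
import uuid
from typing import Dict, List, Sequence, Tuple


def dictionary_encode(values: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct value once, plus one integer id per row."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    ids: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        ids.append(index[v])
    return dictionary, ids


def bits_per_row_id(dictionary_size: int) -> int:
    """Bits needed to address any dictionary entry (at least 1)."""
    if dictionary_size <= 1:
        return 1
    return math.ceil(math.log2(dictionary_size))


if __name__ == "__main__":
    rows = 10_000
    # A GUID-like column: every row holds a distinct value.
    guid_column = [str(uuid.UUID(int=i)) for i in range(rows)]
    # A low-cardinality column with only three distinct values.
    status_column = ["open", "closed", "pending"] * (rows // 3) + ["open"]

    for name, column in (("Id", guid_column), ("Status", status_column)):
        dictionary, ids = dictionary_encode(column)
        dict_chars = sum(len(v) for v in dictionary)
        print(f"{name}: {len(dictionary)} distinct values, "
              f"{dict_chars} dictionary chars, "
              f"{bits_per_row_id(len(dictionary))} bits per row id")
```

For the GUID column the dictionary is as large as the data itself (10,000 entries, 360,000 characters) and each row id needs 14 bits; the status column needs only 3 entries and 2 bits per row.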
Having said that, I'm aware that tabular is often used as an in-memory database. How is this GUID column used? Do you look up one record or a range of records? If you look up a single row, a simple lookup table might help. For example, if there are fewer than 1,000 responses per minute, you can store the date plus the record's index within that minute. The date is usually stored anyway, so you effectively remove the GUID and replace it with an int column that has fewer than 1,000 unique values, plus a lookup table. The memory difference between 1 billion GUIDs and 1 billion ints between 1 and 1,000 (or, even better, 4 ints between 0 and 9) will be huge. The main disadvantages of this solution are the additional query needed to translate a GUID into date + index, the added complexity, and the lack of flexibility.
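A minimal sketch of the date + index scheme, assuming each response has a creation timestamp and, as in the answer, fewer than 1,000 responses per minute. The names assign_minute_index and split_digits are hypothetical; the index here is 0-based, and the lookup dict stands in for the lookup table the API would query to translate a GUID into (minute, index).

```python
import uuid
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

MAX_PER_MINUTE = 1000  # assumption from the answer: < 1,000 responses per minute

MinuteKey = Tuple[datetime, int]  # (timestamp truncated to the minute, index within it)


def assign_minute_index(
    rows: Iterable[Tuple[uuid.UUID, datetime]],
) -> Tuple[List[MinuteKey], Dict[uuid.UUID, MinuteKey]]:
    """Replace each GUID with (minute, index) and build a GUID -> key lookup table."""
    counters: Dict[datetime, int] = defaultdict(int)
    fact_rows: List[MinuteKey] = []
    lookup: Dict[uuid.UUID, MinuteKey] = {}
    # Order by timestamp so indexes follow arrival order within each minute.
    for guid, created_at in sorted(rows, key=lambda r: r[1]):
        minute = created_at.replace(second=0, microsecond=0)
        index = counters[minute]
        if index >= MAX_PER_MINUTE:
            raise ValueError(f"more than {MAX_PER_MINUTE} rows in minute {minute}")
        counters[minute] = index + 1
        fact_rows.append((minute, index))
        lookup[guid] = (minute, index)
    return fact_rows, lookup


def split_digits(index: int, width: int = 4) -> Tuple[int, ...]:
    """Split an index into `width` digit columns, each with at most 10 distinct values."""
    if not 0 <= index < 10**width:
        raise ValueError(f"index {index} does not fit in {width} digits")
    return tuple(int(d) for d in f"{index:0{width}d}")


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 12, 0)
    rows = [
        (uuid.UUID(int=1), base.replace(second=5)),
        (uuid.UUID(int=2), base.replace(second=40)),
        (uuid.UUID(int=3), base.replace(minute=1, second=2)),
    ]
    fact_rows, lookup = assign_minute_index(rows)
    print(fact_rows)
    # The API translates a GUID filter into (minute, index) before querying the model.
    key = lookup[uuid.UUID(int=2)]
    print(key, split_digits(key[1]))
```

split_digits defaults to the answer's four digit columns, which covers indexes up to 9,999; with fewer than 1,000 responses per minute, width=3 would already suffice.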