I am building a sales prediction model from a dataset with the columns "Year", "Month", "Economy Indicator", "Customer_Id", "Product_Id", "Quantity", "Sales", and "Margin".
The cleaned dataset contains about 1.5 million rows across those 8 columns, representing sales per month per customer per product for the past 6 years. My end goal is to predict sales for each month of the entire upcoming year, and more precisely to predict at the product-per-customer level, which is a very detailed level.
However, since Customer_Id and Product_Id are text values such as "A77BC", and there are over 100,000 unique product IDs and 6,000 unique customer IDs, one-hot encoding them makes the dimensionality too high for my device to handle (for example, my laptop has 16 GB of RAM, but encoding Customer_Id alone already requires 24 GB). I believe there must be a better way of handling this situation, but I am very new to machine learning.
From a purely computer science perspective, you might want to look into sparse matrices. A naive dense one-hot encoding will indeed explode your memory, since it needs roughly 4 bytes * num_rows * num_values (assuming 32-bit floats). If you instead store it in a sparse format, you only keep the index of the "1" in each row and none of the extra zeros, so when num_values is large this saves you roughly (num_values - 1)/num_values of the memory.
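As a concrete illustration, here is a minimal sketch of building such a sparse one-hot matrix with pandas and SciPy. The helper name `sparse_one_hot` and the DataFrame `df` are placeholders I'm assuming for your data; the column names match the ones in your question:

```python
import numpy as np
import pandas as pd
from scipy import sparse

def sparse_one_hot(series: pd.Series) -> sparse.csr_matrix:
    """One-hot encode a string column without materialising the dense zeros."""
    codes = pd.Categorical(series).codes        # integer code per unique value
    n_rows = len(codes)
    n_cols = int(codes.max()) + 1
    # Each row has exactly one "1": the data is all ones, the column index is the code.
    return sparse.csr_matrix(
        (np.ones(n_rows, dtype=np.float32), (np.arange(n_rows), codes)),
        shape=(n_rows, n_cols),
    )

# Hypothetical usage, assuming `df` is your cleaned 1.5M-row DataFrame:
# customer_oh = sparse_one_hot(df["Customer_Id"])   # ~6,000 columns, stored sparsely
# product_oh  = sparse_one_hot(df["Product_Id"])    # ~100,000 columns, stored sparsely
# X = sparse.hstack([customer_oh, product_oh], format="csr")
```

Note that scikit-learn's `OneHotEncoder` already returns a SciPy sparse matrix by default, and many estimators (e.g. linear models) accept sparse input directly, so you may not even need to build this by hand.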