Could you please tell me if there is a suitable quantization method for the following case (preferably implemented in Python)?
There is an input range where the majority of values lie within ±2 standard deviations of the mean, while a few huge outliers are present, e.g. [1, 2, 3, 4, 5, 1000]. Quantizing this to an output range of, say, 0-255 would lose precision because of the huge outlier 1000 (1, 2, 3, 4, and 5 would all become 0).
However, it is important to keep precision for the values that lie within a few standard deviations of the mean.
Throwing away the outliers or replacing them with NaN is not acceptable; they should be kept in some form. Roughly, using the example above, the quantized output should be something like [1, 2, 3, 4, 5, 255].
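
To illustrate the kind of behaviour I'm after, here is a rough sketch that clips to median ± k·MAD before scaling (median/MAD rather than mean/std, so the outlier cannot inflate the clipping bounds itself). The function name, the choice of k, and the MAD-based bounds are just placeholders, not a requirement:

```python
import numpy as np

def robust_quantize(x, out_max=255, k=3.0):
    """Clip to median +/- k * scaled MAD, then rescale to [0, out_max].

    Inliers keep their relative spacing; anything beyond the clipping
    bounds saturates at 0 or out_max instead of crushing the inliers.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    # 1.4826 makes the MAD a consistent estimator of the std for normal data
    mad = 1.4826 * np.median(np.abs(x - med))
    lo, hi = med - k * mad, med + k * mad
    if hi == lo:  # degenerate case: (almost) all values identical
        return np.zeros_like(x, dtype=int)
    clipped = np.clip(x, lo, hi)
    return np.round((clipped - lo) / (hi - lo) * out_max).astype(int)

print(robust_quantize([1, 2, 3, 4, 5, 1000]))
# -> [ 80  99 118 137 156 255]: inliers stay spread out, the outlier saturates at 255
```

This is only meant to show the desired shape of the output; a better-founded or standard method would be very welcome.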
Thank you very much for any input.
I can think of 2 answers to your question.
However, regardless of which of the two you choose, it is probably best to compare the outcomes with and without this outlier anyway. You really want to avoid your conclusions being driven by this single observation.
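
A quick way to see how much that single point drives the result (sketch only; `naive_quantize` and the cut-off of 100 are just placeholders for whatever scheme and outlier rule you actually use):

```python
import numpy as np

def naive_quantize(x, out_max=255):
    # plain min-max scaling, just to see how much the outlier drives the result
    x = np.asarray(x, dtype=float)
    return np.round((x - x.min()) / (x.max() - x.min()) * out_max).astype(int)

data = np.array([1, 2, 3, 4, 5, 1000])
no_outlier = data[data < 100]          # hypothetical cut-off, just for this check

print(naive_quantize(data))        # [  0   0   1   1   1 255] -- inliers collapse
print(naive_quantize(no_outlier))  # [  0  64 128 191 255]     -- spacing preserved
```

If the mapping of the inliers changes this drastically depending on whether the outlier is present, that is a strong hint your downstream conclusions need to be checked against both versions.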