Does the GGUF format perform model quantization even though the model is already quantized with LoRA?
Hello! I'm new to LLMs. I've fine-tuned the CodeLlama model on Kaggle using LoRA, then merged it and pushed it to Hugging Face. I want to know: if the model is already quantized with LoRA, why do we need to re-quantize it with GGUF?
Model quantization and LoRA are different concepts.
LoRA is a parameter-efficient fine-tuning (PEFT) technique that reduces the number of trainable parameters: it freezes the base model's weights and trains small low-rank adapter matrices on top of them. The precision of the weights is not changed at all.
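Here is a minimal sketch of that idea, assuming the Hugging Face `transformers` and `peft` libraries (the model ID and hyperparameters are just illustrative):

```python
# LoRA freezes the base weights and adds small trainable low-rank
# adapters -- it does not change the precision of any weight.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

lora_config = LoraConfig(
    r=8,                                   # rank of the adapter matrices
    lora_alpha=16,                         # scaling factor for the adapter output
    target_modules=["q_proj", "v_proj"],   # attach adapters to attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# Only a tiny fraction of parameters are trainable; the base weights
# stay in their original precision (e.g. FP16/FP32).
model.print_trainable_parameters()
```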
Model quantization, by contrast, reduces the size of a model by converting its weights from a higher-precision representation (like FP32) to a lower-precision one (like bfloat16 or INT8).
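A toy sketch of what quantization itself does (this is just plain NumPy to show the principle, not the GGUF format, which uses its own block-wise schemes):

```python
# Map FP32 weights to INT8 plus one scale factor: storage shrinks 4x,
# at the cost of a small rounding error.
import numpy as np

weights_fp32 = np.random.randn(4096).astype(np.float32)

# Symmetric per-tensor quantization: one FP32 scale for the whole tensor.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Dequantize to approximate the original values at inference time.
weights_restored = weights_int8.astype(np.float32) * scale

print(f"FP32 size: {weights_fp32.nbytes} bytes")   # 16384 bytes
print(f"INT8 size: {weights_int8.nbytes} bytes")   # 4096 bytes
print(f"max error: {np.abs(weights_fp32 - weights_restored).max():.4f}")
```

So your LoRA fine-tune is not quantized at all. Converting it to GGUF and choosing a quantization type (e.g. Q4_K_M) is what actually shrinks the weights so the model can run efficiently in llama.cpp.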