My issue is with vLLM running with Neuron on an AWS Inferentia instance.

I successfully installed `vllm-0.4.0.post1+neuron213`.

But when I run an LLM with vLLM, it fails with `ModuleNotFoundError: No module named 'vllm._C'`.
I realized that `setup.py` in vLLM contains the following logic:

```python
if not _is_neuron():
    ext_modules.append(CMakeExtension(name="vllm._C"))
```

and

```python
cmdclass={"build_ext": cmake_build_ext} if not _is_neuron() else {},
```
So `vllm._C` won't be built if the target device is Neuron (an AWS Inferentia instance, inf2). This results in `ModuleNotFoundError: No module named 'vllm._C'`.
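A quick way to confirm this (just a sketch) is to look for the compiled extension next to the installed package, without importing `vllm` itself, since importing it is exactly what fails:

```python
# Sketch: check whether the compiled extension was installed,
# without executing vllm's __init__.py.
import importlib.util
from pathlib import Path

pkg_dir = Path(importlib.util.find_spec("vllm").origin).parent
print(sorted(p.name for p in pkg_dir.glob("_C*")))  # prints [] on the Neuron build
```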
How to fix it?
I tried this example:
```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    max_num_seqs=8,
    max_model_len=128,
    block_size=128,
    device="neuron",
    tensor_parallel_size=2,
)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
I just solved it this way.
The problem is the import `from vllm._C import ops`, while there is no `vllm._C` module. The `ops` we need exists in `your_environment_name/lib/python3.10/site-packages/vllm/model_executor/layers/`.
So what we have to do is change `from vllm._C import ops` to `from vllm.model_executor.layers import ops` in every file of the package that contains it.
This solves the problem :)
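In case it helps, here is a minimal sketch of one way to apply that replacement across the installed package in one pass (it edits files in place, so back up your site-packages copy first; untested outside my setup):

```python
# Sketch: rewrite the broken import across the installed vLLM package.
import importlib.util
from pathlib import Path

# Locate the installed package without executing it
# (importing vllm is exactly what raises the error).
pkg_root = Path(importlib.util.find_spec("vllm").origin).parent

old = "from vllm._C import ops"
new = "from vllm.model_executor.layers import ops"

for py_file in pkg_root.rglob("*.py"):
    text = py_file.read_text()
    if old in text:
        py_file.write_text(text.replace(old, new))
        print(f"patched {py_file}")
```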