My issue is with vLLM running with Neuron on an AWS Inferentia instance.

I successfully installed `vllm-0.4.0.post1+neuron213`.

But when I run an LLM with vLLM, it fails with `ModuleNotFoundError: No module named 'vllm._C'`.
I realized that `setup.py` in vLLM contains the following logic:

```python
if not _is_neuron():
    ext_modules.append(CMakeExtension(name="vllm._C"))
```

and

```python
cmdclass={"build_ext": cmake_build_ext} if not _is_neuron() else {},
```
So `vllm._C` won't be built if the target device is Neuron (an AWS Inferentia instance, inf2). This results in `ModuleNotFoundError: No module named 'vllm._C'`.
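A quick way to confirm this (just a sketch) is to look for the compiled extension next to the installed package, without importing `vllm` itself, since importing it is exactly what fails:

```python
# Sketch: check whether the compiled extension was installed,
# without executing vllm's __init__.py.
import importlib.util
from pathlib import Path

pkg_dir = Path(importlib.util.find_spec("vllm").origin).parent
print(sorted(p.name for p in pkg_dir.glob("_C*")))  # prints [] on the Neuron build
```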
How to fix it?
I tried this example:
```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    max_num_seqs=8,
    max_model_len=128,
    block_size=128,
    device="neuron",
    tensor_parallel_size=2,
)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
I just solved it this way.
The problem is the import `from vllm._C import ops`, while there is no `vllm._C` module. The `ops` we need exists in `your_environment_name/lib/python3.10/site-packages/vllm/model_executor/layers/`.
So what we have to do is change `from vllm._C import ops` to `from vllm.model_executor.layers import ops` in every file of the package that contains it.
This solves the problem :)
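In case it helps, here is a minimal sketch of one way to apply that replacement across the installed package in one pass (it edits files in place, so back up your site-packages copy first; untested outside my setup):

```python
# Sketch: rewrite the broken import across the installed vLLM package.
import importlib.util
from pathlib import Path

# Locate the installed package without executing it
# (importing vllm is exactly what raises the error).
pkg_root = Path(importlib.util.find_spec("vllm").origin).parent

old = "from vllm._C import ops"
new = "from vllm.model_executor.layers import ops"

for py_file in pkg_root.rglob("*.py"):
    text = py_file.read_text()
    if old in text:
        py_file.write_text(text.replace(old, new))
        print(f"patched {py_file}")
```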