
Using Llama for your applications


I have written a Python script that uses Llama and it is working well. However, I have some doubts about how Ollama works. In my case, I am running ollama run llava in one terminal, and I can also see that Ollama is listening on localhost port 11434. However, when I stop ollama run, the server on localhost keeps running.

I would like someone to clarify the following:

  1. What is the difference between ollama run <model> and ollama serve? My hunch is that ollama run pulls a model and then runs a client against the server; is that correct?

In my application I simply send a POST request to the localhost /api/generate endpoint with the name of the desired model, so I suspect ollama run does something similar; a rough sketch of that request is shown after this list.

  2. If the server is already running on localhost port 11434, do I need to run ollama serve at all? I do not remember ever running it, but somehow the server seems to be running.

  3. If I download new models, can they be used right away without restarting anything?
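
For reference, this is roughly what my script does; it is only a sketch, and the model name, prompt, and timeout here are placeholders rather than my real values:

    import requests

    # Minimal call to Ollama's generate endpoint on the local server (port 11434).
    # "stream": False asks for a single JSON object instead of a token stream.
    payload = {
        "model": "llava",
        "prompt": "Describe this setup in one sentence.",
        "stream": False,
    }

    resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
    resp.raise_for_status()
    print(resp.json()["response"])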


Solution

  • When you installed Ollama, the installer also set up and configured the ollama service. That is separate from running "ollama run", which is a command-line client that relies on the service.

    If you're using systemd, check whether you have this file:

    /etc/systemd/system/ollama.service
    

    You'll also find that the ollama run command won't work unless the service is actually running. If you do:

    # sudo systemctl stop ollama
    # ollama run llama3
    Error: could not connect to ollama app, is it running?
    

    So the installer set it up as a service to run every time your system boots. The ollama command just makes queries against the running service over port 11434.

    Note that you wouldn't want to use Python to shell out to "ollama run"; you'd bypass that step and talk to the API on port 11434 directly.

    And yes, any models you've already downloaded can be referenced in API calls without having to reload the service. You might also want to play around with the Open WebUI Docker instance to see how some of that might work from your web browser.
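
    For example, here is a minimal sketch using the third-party requests library (model names and output will differ on your machine) that talks to the same service the CLI uses and lists the models it already has; anything shown can be used in a generate call straight away:

    import requests

    BASE = "http://localhost:11434"

    # The CLI and any script talk to the same HTTP service; the root endpoint
    # answers with a short status string when the service is up.
    print(requests.get(BASE, timeout=5).text)

    # List the models the service already has locally (GET /api/tags).
    # Newly pulled models show up here without restarting anything.
    for model in requests.get(f"{BASE}/api/tags", timeout=10).json().get("models", []):
        print(model["name"])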