Running Your Own ChatGPT with Ollama

  1. Setup Ollama
  2. Gradio
    1. Listing Models
    2. Using Chat History
  3. Visual Studio Code with Extension


Ever wanted to explore the capabilities of AI-powered chatbots like ChatGPT and Copilot without relying on third-party services? With Ollama and some setup, you can run your own instance of ChatGPT and customize it to fit your specific needs. Whether you’re a developer looking to integrate conversational AI into your app or project, a researcher seeking to analyze the performance of AI models, or simply an enthusiast wanting to experiment with this cutting-edge technology - having control over your own chatbot can open up exciting possibilities for innovation and discovery.

Whatever, I want to run my own ChatGPT and Copilot.

Setup Ollama

First, you have to set up Ollama. So far I haven't heard any complaints about Ollama being difficult to set up on any platform, including Windows WSL. Once Ollama is ready, pull a model like Llama 3:

$ # pulling a ~5 GB model can take some time...
$ ollama pull llama3:instruct

$ # list the models you have
$ ollama list
NAME                    ID              SIZE    MODIFIED
llama3:instruct         71a106a91016    4.7 GB  11 days ago

Test the model from the command line (the exact wording of the reply may differ):

$ ollama run llama3:instruct
>>> Send a message (/? for help)
>>> are you ready?
I'm always ready to play a game, answer questions, or just chat. What's on your mind? Let me know what you'd like
to do and I'll do my best to help.

Check if ollama’s port is listening:

$ netstat -a -n | grep 11434
tcp        0      0 127.0.0.1:11434         0.0.0.0:*               LISTEN
$ # looks good, port 11434 is ready

$ # run the command below if ollama is not listening
ollama serve
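
If you prefer to verify from code, a minimal sketch like the one below (assuming the requests package is installed) hits the same /api/tags endpoint that is used later for listing models:

import requests

# quick sanity check: Ollama answers on its default port 11434
response = requests.get("http://localhost:11434/api/tags")
print(response.status_code)  # expect 200
print([m["model"] for m in response.json()["models"]])  # models you have pulled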

Gradio

Although you can set a system prompt, tweak parameters, embed content, and do some pre/post-processing from the command line, you probably want an easier way to interact through a web interface. That is where Gradio comes in. Assuming you have Python ready:

$ # create python virtual environment.
$ # gradio-env is a folder name and you can change it to anything
$ python -m venv gradio-env

$ # jump into the virtual environment (Linux/macOS)
$ source gradio-env/bin/activate
$ # if you are using Windows
$ gradio-env\Scripts\activate.bat

$ # install gradio
$ pip install gradio

Next, create a Python script, for example app.py. Update model if you are using a different one:

app.py
import requests
import json
import gradio as gr

model = "llama3:instruct"
url = "http://localhost:11434/api/"


def generate_response(prompt, history):
    data = {"model": model, "stream": False, "prompt": prompt}
    response = requests.post(
        url + "generate",
        headers={"Content-Type": "application/json", "Connection": "close"},
        data=json.dumps(data),
    )
    if response.status_code == 200:
        return json.loads(response.text)["response"]
    else:
        print("Error: generate response:", response.status_code, response.text)


demo = gr.ChatInterface(fn=generate_response)

if __name__ == "__main__":
    demo.launch()

Run the command below and you will get the interface at the URL shown:

$ gradio app.py
Watching:
...

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.

Open http://127.0.0.1:7860

(Screenshot: testing the chat interface)
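
The system prompt and parameters mentioned earlier can go into the same request. The sketch below is a drop-in variant of generate_response in app.py; the system and options fields (temperature here) are what Ollama's generate API is expected to accept, so treat it as a sketch and verify against your Ollama version:

app.py
# variant of generate_response with a system prompt and sampling options
# (the "system" and "options" fields are assumptions; verify against your Ollama version)
def generate_response(prompt, history):
    data = {
        "model": model,
        "stream": False,
        "prompt": prompt,
        "system": "You are a concise assistant. Answer in bullet points.",
        "options": {"temperature": 0.2},
    }
    response = requests.post(
        url + "generate",
        headers={"Content-Type": "application/json", "Connection": "close"},
        data=json.dumps(data),
    )
    if response.status_code == 200:
        return json.loads(response.text)["response"]
    else:
        print("Error: generate response:", response.status_code, response.text)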

Listing Models

Assume you have pulled more than one model and want to pick one from a dropdown list. First, get the list from the API:

app.py
def list_models():
    response = requests.get(
        url + "tags",
        headers={"Content-Type": "application/json", "Connection": "close"},
    )
    if response.status_code == 200:
        models = json.loads(response.text)["models"]
        return [d["model"] for d in models]
    else:
        print("Error:", response.status_code, response.text)


models = list_models()

Then, you can add a dropdown to the ChatInterface:

app.py
with gr.Blocks() as demo:
    dropdown = gr.Dropdown(label="Model", choices=models)
    # select the first item as the default model
    dropdown.value = models[0]
    gr.ChatInterface(fn=generate_response, additional_inputs=dropdown)

Now the generate_response function takes an extra parameter from the additional_inputs dropdown. Update the function:

app.py
def generate_response(prompt, history, model):
    data = {"model": model, "stream": False, "prompt": prompt}
    response = requests.post(
        url + "generate",
        headers={"Content-Type": "application/json", "Connection": "close"},
        data=json.dumps(data),
    )
    if response.status_code == 200:
        return json.loads(response.text)["response"]
    else:
        print("Error: generate response:", response.status_code, response.text)

You can see the dropdown after expanding Additional Inputs:

(Screenshot: the Model dropdown under Additional Inputs)

Using Chat History

The previous implementation used Ollama's generate API, which accepts a single prompt parameter, so any chat history has to be packed into the prompt itself. It is better to use Ollama's chat API, which accepts a messages parameter. Each message specifies a role, enabling the model to interpret the conversation and respond accordingly. The code below shows how to put the history into that format; the last message is the latest user input.

app.py
def generate_response(prompt, history, model):
    messages = []
    for u, a in history:
        messages.append({"role": "user", "content": u})
        messages.append({"role": "assistant", "content": a})
    messages.append({"role": "user", "content": prompt})
    data = {"model": model, "stream": False, "messages": messages}
    response = requests.post(
        url + "chat",
        headers={"Content-Type": "application/json", "Connection": "close"},
        data=json.dumps(data),
    )
    if response.status_code == 200:
        bot_message = json.loads(response.text)["message"]["content"]
        return bot_message
    else:
        print("Error: generate response:", response.status_code, response.text)
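
If you want replies to appear while they are being generated, the chat API can also stream. The sketch below is a hedged variant: it assumes that with "stream": True Ollama returns one JSON object per line, and that gr.ChatInterface accepts a generator function that yields the growing reply.

app.py
# hedged streaming variant of generate_response; see assumptions above
def generate_response(prompt, history, model):
    messages = []
    for u, a in history:
        messages.append({"role": "user", "content": u})
        messages.append({"role": "assistant", "content": a})
    messages.append({"role": "user", "content": prompt})
    data = {"model": model, "stream": True, "messages": messages}
    response = requests.post(
        url + "chat",
        headers={"Content-Type": "application/json", "Connection": "close"},
        data=json.dumps(data),
        stream=True,
    )
    partial = ""
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # each streamed chunk carries a piece of the assistant message
        partial += chunk.get("message", {}).get("content", "")
        yield partial  # Gradio renders the reply as it grows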

Full source code: https://github.com/neoalienson/llama-in-chains

Visual Studio Code with Extension

Extensions such as Continue are very easy to set up.

You can set up Ollama from the extension's UI, or update the extension's config.json directly:

"models": [
  {
    "model": "llama3:instruct",
    "title": "llama3:instruct",
    "completionOptions": {},
    "apiBase": "http://localhost:11434",
    "provider": "ollama"
  }
],

You can assign model roles for your local copilot:

"modelRoles": {
  "default": "llama3:instruct",
  "summarize": "llama3:instruct"
},

Have fun!
