I’ve created a new module that uses llama.cpp to run large language models locally.
Here’s an example:
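(The original post doesn't show the module's actual API, so this is a minimal sketch of what usage might look like; the header `local_llm.h`, the `local_llm::` names, the option field, and the model path are all hypothetical placeholders, not the real interface.)

```cpp
// Hypothetical usage sketch -- the module's real API may differ.
#include <iostream>
#include <string>

#include "local_llm.h"  // hypothetical header exposed by the module

int main() {
    // Load a GGUF model from disk; option and field names are assumptions.
    local_llm::Options opts;
    opts.n_gpu_layers = 32;  // layers offloaded to the GPU; 0 = CPU only
    local_llm::Model model("models/llama-3-8b-q4.gguf", opts);

    // Generate a completion for a prompt and print it.
    std::string reply = model.generate(
        "Explain what llama.cpp does in one sentence.");
    std::cout << reply << "\n";
    return 0;
}
```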
The module has build options for CPU or GPU. The CPU build is barely usable, and only with the smallest models. I bought a new graphics card, and the GPU build gave a huge improvement.
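The exact build options depend on the module's own build system, which the post doesn't spell out. Assuming it forwards llama.cpp's CMake configuration, the two builds might look something like this (`GGML_CUDA` is llama.cpp's CUDA flag; whether the module passes it through is an assumption):

```
# CPU-only build (default)
cmake -B build
cmake --build build --config Release

# GPU build via CUDA -- requires the CUDA toolkit installed
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```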