I’m currently shopping around for an alternative to Ollama, partly because I want something a bit faster and partly because I could not get Ollama to use a different context and output length, which seems to be a known and long-ignored issue. Somehow, everything I’ve tried so far has been missing one or more critical features, such as:

  • “Hot” model replacement, i.e. loading and unloading models on demand
  • Function calling
  • Support for most models
  • OpenAI API compatibility (to work well with Open WebUI)

I’d be happy about any recommendations!

  • RandomlyRight@sh.itjust.worksOP · 5 hours ago

    I’ve read about this method in the GitHub issues, but to me it seemed impractical to have to maintain separate models just to change the context size, and that was the point at which I started looking for alternatives.
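
    For reference, the method in question is creating a derived model from a Modelfile that pins the context settings. A minimal sketch (model name and parameter values are only examples, assuming a llama3 model has already been pulled):

        # Modelfile
        FROM llama3
        PARAMETER num_ctx 8192        # context window size in tokens
        PARAMETER num_predict 1024    # maximum number of tokens to generate

        # build and run the derived model
        ollama create llama3-8k -f Modelfile
        ollama run llama3-8k

    The result is a second model entry (llama3-8k) whose only difference from the base model is the context configuration.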

    • theunknownmuncher@lemmy.world · 5 hours ago

      If it bothers you, you can overwrite the existing model by reusing the same name instead of creating one under a new name. Either way, the underlying LLM model file is not duplicated.
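
      In practice that just means reusing the tag, something like this (names illustrative, reusing the Modelfile sketch above):

          # re-running create with the same name replaces the existing tag in place
          ollama create llama3 -f Modelfile

      As far as I know, Ollama stores the weight blobs content-addressed under ~/.ollama/models, so a derived model only adds a small manifest on top of the existing GGUF blob.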