• Eager Eagle@lemmy.world · 49 points · 4 days ago

    I bet he just wants a card to self-host models and not give companies his data, but the amount of VRAM is indeed ridiculous.

    • Jeena@piefed.jeena.net · 25 points · 4 days ago

      Exactly, I’m in the same situation now, and the 8 GB on those cheaper cards isn’t even enough to run a 13B model. I’m trying to find out whether I can run a 13B one on a 3060 with 12 GB.
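
      As a rough sanity check, this is the kind of back-of-envelope math involved (a sketch, not from the thread: it assumes ~4.5 bits per weight for a Q4_K_M-style quantization plus ~1.5 GB for KV cache and runtime overhead, both ballpark figures):

      ```python
      # Back-of-envelope VRAM estimate for a quantized model. Real usage varies
      # with the quant format, context length, and runtime overhead.
      def estimate_vram_gb(params_billion, bits_per_weight=4.5, overhead_gb=1.5):
          weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
          return weights_gb + overhead_gb

      for size_b in (7, 13, 14):
          print(f"{size_b}B model: ~{estimate_vram_gb(size_b):.1f} GB")
      # 7B  -> ~5.4 GB (fits in 8 GB)
      # 13B -> ~8.8 GB (tight on 8 GB, comfortable on 12 GB)
      # 14B -> ~9.4 GB (in line with the ~9 GB figure quoted further down)
      ```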

        • levzzz@lemmy.world · 4 points · 4 days ago

          You need a pretty large context window to fit all the reasoning tokens; Ollama defaults to 2048, and a bigger window uses more memory.
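
          For reference, the window can be raised per request through Ollama’s HTTP API via the num_ctx option (a minimal sketch assuming a default local install; the model tag and the 8192 value are just placeholders, and a larger num_ctx grows the KV cache and therefore VRAM use):

          ```python
          import requests

          # Request a larger context window for a single generation.
          # num_ctx enlarges the KV cache, so VRAM usage grows with it.
          resp = requests.post(
              "http://localhost:11434/api/generate",
              json={
                  "model": "deepseek-r1:14b",  # placeholder model tag
                  "prompt": "Why does context length affect memory use?",
                  "stream": False,
                  "options": {"num_ctx": 8192},
              },
          )
          print(resp.json()["response"])
          ```

          The same setting can be baked into a model with a Modelfile line like PARAMETER num_ctx 8192.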

        • Viri4thus@feddit.org · 2 points · 4 days ago

          I also have a 3060. Can you detail which framework (SGLang, Ollama, etc.) you are using and how you got that speed? I’m having trouble reaching that level of performance. Thx

          • The Hobbyist@lemmy.zip · 4 points · 4 days ago (edited)

            Ollama, latest version. I have it set up with Open-WebUI (though that shouldn’t matter). The 14B model is around 9 GB, which easily fits in the 12 GB.

            I’m repeating the 28 t/s from memory, but even if I’m wrong it’s easily above 20.

            Specifically, I’m running this model: https://ollama.com/library/deepseek-r1:14b-qwen-distill-q4_K_M

            Edit: I confirmed I do get 27.9 t/s, using default Ollama settings.
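
            For anyone who wants to reproduce the measurement: Ollama reports eval_count (tokens generated) and eval_duration (nanoseconds) in its API response, which give the decode speed directly. A small sketch assuming a default local install; `ollama run <model> --verbose` prints the same stats on the CLI:

            ```python
            import requests

            # Measure decode throughput from Ollama's timing fields.
            resp = requests.post(
                "http://localhost:11434/api/generate",
                json={
                    "model": "deepseek-r1:14b-qwen-distill-q4_K_M",
                    "prompt": "Explain the KV cache in two sentences.",
                    "stream": False,
                },
            ).json()

            tokens_per_s = resp["eval_count"] / (resp["eval_duration"] / 1e9)
            print(f"decode speed: {tokens_per_s:.1f} tokens/s")
            ```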

            • Viri4thus@feddit.org · 2 points · 4 days ago

              Ty. I’ll try Ollama with the Q4_K_M quantization. I wouldn’t expect to see a difference between Ollama and SGLang.

            • Jeena@piefed.jeena.net · 2 points · 4 days ago

              Thanks for the additional information; that helped me decide to get the 3060 12 GB instead of the 4060 8 GB. They cost almost the same, but for my use cases the 3060 12 GB seems to be the better fit even though it is a generation older: the memory bus is wider and it has more VRAM. Both video editing and the smaller LLMs should work well enough.
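
              The wider bus matters because single-user LLM decoding is mostly memory-bandwidth-bound: every generated token streams the quantized weights through the GPU. A rough ceiling, using approximate spec-sheet bandwidths (~360 GB/s for the 3060 12 GB, ~272 GB/s for the 4060; ballpark figures, not from this thread):

              ```python
              # Back-of-envelope decode ceiling: tokens/s <= bandwidth / bytes read per token.
              # A dense model reads roughly its full quantized weight size per token.
              MODEL_GB = 9.0  # ~9 GB for the 14B Q4_K_M model mentioned above

              for name, bw_gb_s in [("RTX 3060 12GB", 360), ("RTX 4060 8GB", 272)]:
                  print(f"{name}: <= {bw_gb_s / MODEL_GB:.0f} tokens/s (theoretical)")
              # 3060: ~40 t/s ceiling, consistent with the ~28 t/s reported above;
              # the 4060's 8 GB couldn't hold the 9 GB model entirely anyway.
              ```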

      • manicdave@feddit.uk · 4 points · 4 days ago

        I’m running deepseek-r1:14b on a 12GB rx6700. It just about fits in memory and is pretty fast.
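
        A quick way to check that the whole model really is resident in VRAM (rather than partly offloaded to system RAM, which slows generation down a lot) is Ollama’s running-models endpoint; a small sketch assuming a default local install, where /api/ps reports size and size_vram per loaded model (the CLI equivalent is `ollama ps`):

        ```python
        import requests

        # List models currently loaded by Ollama and how much of each is in VRAM.
        # If size_vram is smaller than size, part of the model spilled to system RAM.
        ps = requests.get("http://localhost:11434/api/ps").json()
        for m in ps.get("models", []):
            vram_gb = m["size_vram"] / 1e9
            total_gb = m["size"] / 1e9
            print(f"{m['name']}: {vram_gb:.1f} GB of {total_gb:.1f} GB in VRAM")
        ```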