Do you run your own LLMs?

  1. 1 month ago
    Anonymous

    think i used these tutorials

    After you follow those, bother /lmg/ for the best model that can run on your hardware, and download that instead of Pygmalion.

    • 1 month ago
      Anonymous

      shit, sorry. misread and thought you were asking for a spoon-feed. ignore this.

      • 1 month ago
        Anonymous

        I'll play the moron this time and accept the spoon feeding. Thank you, anon.

  2. 1 month ago
    Anonymous

    no

  3. 1 month ago
    Anonymous

    Yes, I use ollama with dolphin-phi and dolphin-mistral and tried mixtral but my laptop is too slow.
    I used this UI but the one you have looks better:
    https://github.com/HelgeSverre/ollama-gui
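
    For anyone who wants to skip the web UIs entirely: ollama also serves a plain HTTP API on localhost. A minimal sketch, assuming the default port 11434 and that you've already pulled dolphin-mistral (the model name is just an example):

      # talk to a locally running ollama server over its HTTP API
      # assumes ollama is running on the default port and the model has been pulled
      import requests

      resp = requests.post(
          "http://localhost:11434/api/generate",
          json={
              "model": "dolphin-mistral",   # any model you've pulled locally
              "prompt": "Explain quantization in two sentences.",
              "stream": False,              # one JSON blob instead of a stream
          },
          timeout=300,
      )
      resp.raise_for_status()
      print(resp.json()["response"])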

    • 1 month ago
      Anonymous

      this was the first one i found because i run tmux with mouse=on and that sucks for copying stuff, so web bloat it is

  4. 1 month ago
    Anonymous

    I mainly run smaller analysis models locally. I only have a 10GB VRAM 3090.
    I haven't looked in a bit, are there any smaller coding models I can run?

    • 1 month ago
      Anonymous

      I'm testing gemma and boy... it could be better..

  5. 1 month ago
    Anonymous

    Yep, but I've been thinking about getting a 128GB MacBook Pro to run some models; my RTX 4090 only has 24GB of VRAM.

  6. 1 month ago
    Anonymous

    what are the benefits?

  7. 1 month ago
    Anonymous

    Yeah. I run Mixtral Instruct 3.75bpw and it's fricking incredible. No joke, I've spent hours of my life cumming like a madman to my deepest fetishes and fantasies.

    • 1 month ago
      Anonymous

      >cumming to text trash
      seek help

      • 1 month ago
        Anonymous

        I get all the "help" I need.

        • 1 month ago
          Anonymous

          what is that anon? How do i get it?

          • 1 month ago
            Anonymous

            Alright bucko I'll give you the complete rundown.
            >Own a 24gb VRAM GPU
            >Download and install ooba from git
            >Download and install Sillytavern from git
            >Go to huggingface.co, find Mixtral-8x7B-Instruct-v0.1-3.75bpw. Click on the "Copy model name to clipboard" icon.
            >Open up ooba. Go to the "Model" tab. Paste the model name in the "Download" section on the right and click download.
            >Wait for Mixtral to be downloaded.
            >In the "Session Tab", select "api" and apply flags/restart.
            >Select Mixtral in the Model tab to load. Loader should be ExLLamav2_HF. Max_Seq_len should be no more than 8-9k.
            >Load the model. Wait for a successful load.
            >Open Sillytavern, go to API connections and make sure Sillytavern connects to ooba.
            >Google "Mixtral settings" and make a text completion preset for Mixtral.
            >Set Instruct Mode to "Enabled" in the "Advanced Formatting tab"
            >Use Alpaca formatting in the same tab.
            >Go to chub.ai to find character cards to import into SillyTavern. I recommend you write your own cards.
            >COOM.

            If you don't have a 24gb GPU, you can run smaller models. 13b Llamav2 models seem to do well and should run on 12gb of VRAM. Exl2 quants run really fast, but require a lot of VRAM. If you barely have any VRAM at all, you can try using the llama.cpp loader, which will use your CPU and offload to your GPU. It's slower, but you may be able to run better quants this way. Good luck.
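
            Side note for the script-minded: once ooba is running with the api flag, it exposes an OpenAI-compatible endpoint (port 5000 by default on recent versions; it's the same thing SillyTavern connects to). A minimal sketch, with the caveat that the port and route can differ between ooba versions:

              # query whatever model is currently loaded in ooba, assuming the api flag
              # is enabled and the OpenAI-compatible endpoint is on the default port 5000
              import requests

              payload = {
                  "messages": [{"role": "user", "content": "Write a limerick about VRAM."}],
                  "max_tokens": 200,
                  "temperature": 0.7,
              }
              r = requests.post("http://127.0.0.1:5000/v1/chat/completions", json=payload, timeout=300)
              r.raise_for_status()
              print(r.json()["choices"][0]["message"]["content"])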

            • 1 month ago
              Anonymous

              thank you anon! But I only have a 16GB RTX A4000, will it still be usable? 🙁

              • 1 month ago
                Anonymous

                >But I only have a 16GB RTX A4000
                I'm not certain, but I'm pretty sure you'll be fine. You won't be able to run Mixtral exl2 3.75bpw, but you may be able to run something like Mixtral Q3/Q4 using Llama.cpp.

                There are other models you can run as well. A lot of Llamav2 models are really nice. You should be able to run any 13b. 13b is a bit dumber, but the finetunes are actually pretty good. I'm out of date on them, but I preferred Mythomax 13b and Mlewd ReMM L2 Chat 20b.

                >Why that model in particular?
                It is exceptionally smart, moreso than Llamav2. It follows my prompts better and picks up on little things that other models don't. I do find that the prose is a bit worse than some of the llamav2 finetunes, but it makes up for it in its intelligence. I do however find that it still has some trouble with spatial awareness.
                >or is it like the best for degeneracy?
                It's great if you write your own cards, but loses out to some of the raunchier llamav2 finetunes. That being said, I prefer the intelligence over sex wordslop that doesn't make sense.
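
                Rough rule of thumb if you're trying to guess what fits: a quantized model's weights take about params × bits-per-weight ÷ 8 bytes, plus some headroom for context and runtime overhead. A back-of-envelope sketch (the overhead constant is a loose assumption, not a rule):

                  # back-of-envelope check: will a quantized model roughly fit in VRAM?
                  def fits_in_vram(params_billion, bits_per_weight, vram_gb, overhead_gb=1.5):
                      weights_gb = params_billion * bits_per_weight / 8   # 7B at 4 bits ~ 3.5 GB
                      return weights_gb + overhead_gb <= vram_gb

                  print(fits_in_vram(13, 4.0, 16))     # 13b Q4 on a 16GB A4000: True
                  print(fits_in_vram(46.7, 3.75, 16))  # Mixtral-8x7B at 3.75bpw on 16GB: False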

              • 1 month ago
                Anonymous

                Can it work with 16GB vram ? I've got a 4080S

              • 1 month ago
                Anonymous

                >Can it work with 16GB vram ?
                Mixtral exl2 3.75bpw no. You could probably run Mixtral Q3/Q4 using llama.cpp.

                >Are these all just roleplay bots?
                They are multi-purpose. Some models are better at some things than others though. There are coding specific models if you're interested in that too.
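
                To make the llama.cpp route concrete, here's a minimal sketch using the llama-cpp-python bindings; the GGUF filename is a placeholder for whichever Q3/Q4 Mixtral quant you grab from huggingface, and n_gpu_layers is whatever your VRAM allows:

                  # load a GGUF quant with llama.cpp and offload part of it to the GPU
                  from llama_cpp import Llama

                  llm = Llama(
                      model_path="path/to/mixtral-8x7b-instruct.Q4_K_M.gguf",  # placeholder
                      n_gpu_layers=20,   # raise until VRAM is full, lower if you OOM
                      n_ctx=8192,        # context length; more context costs more memory
                  )
                  out = llm("### Instruction:\nSay hi.\n\n### Response:\n", max_tokens=128)
                  print(out["choices"][0]["text"])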

              • 1 month ago
                Anonymous

                Damn I should give this a go then, any links to models to explore? Would love my own kind of coding bot

              • 1 month ago
                Anonymous

                >any links to models to explore?
                Take a look at this leaderboard.
                https://huggingface.co/spaces/mike-ravkine/can-ai-code-results

                That will give you a good idea on what models to pick. I haven't used it personally, but I've heard decent things about codebooga 34b.

              • 1 month ago
                Anonymous

                >Mixtral exl2 3.75bpw no. You could probably run Mixtral Q3/Q4 using llama.cpp.

                Link? I tried searching Mixtral Q4 on hugging face, got a bunch of results and I'm not sure what to get.

                What's llama.cpp? I got the ooba webui and sillytavern

              • 1 month ago
                Anonymous

                Jump on /lmg/ and get educated brother.

              • 1 month ago
                Anonymous

                i dont know where that is 🙁

              • 1 month ago
                Anonymous

                show full text it looks hilarious

              • 1 month ago
                Anonymous

                man i wish i had his logs, i THINK they would still be archived on that site, but i don't even remember what site that is either.
                frick summer last year feels like forever ago now.

              • 1 month ago
                Anonymous

                >sex wordslop that doesn't make sense
                Sounds like you need more ministrations. Thank you, anon, for everything.

              • 1 month ago
                Anonymous

                Any cards you liked on Chub? I wrote a few of my own cards, and just slowly refined them. Chub is just sort of infested with trash though.

              • 1 month ago
                Anonymous

                >Any cards you liked on Chub?
                Eh, most of them are bad. For some reason people don't realize you can just write the card normally without any stupid formatting too. I will typically skim through the cards, pick the ones I like, and then just rewrite them for my specific purposes.

            • 1 month ago
              Anonymous

              Why that model in particular? Is it just your favorite, or is it like the best for degeneracy? I have a 4090 but been too lazy to experiment to find the best one.

        • 1 month ago
          Anonymous

          >cooming to this
          you are no better than a streetshitter holy frick

          • 1 month ago
            Anonymous

            streetshit this
            and read this instead you clown

            [...]
            Read some fricking book instead you clown.

            • 1 month ago
              Anonymous

              anon...

              • 1 month ago
                Anonymous

                why that reaction image? Are you the anon who posted this initially? Tell me you have more motherfricker, this shit's hilarious.

      • 1 month ago
        Anonymous

        It's the thinking man's pornography

      • 1 month ago
        Anonymous

        You look lonely. I can fix that~.

      • 1 month ago
        Anonymous

        this dude probably can't even imagine an apple

      • 1 month ago
        Anonymous

        seek this

        • 1 month ago
          Anonymous

          https://i.imgur.com/Hx7aJeM.png

          and this

          Read some fricking book instead you clown.

      • 1 month ago
        Anonymous

        and this

  8. 1 month ago
    Anonymous

    I used to, but then I realized jailbroken GPT or Claude is simply better.

  9. 1 month ago
    Anonymous

    >reported cuck
    What is that?

  10. 1 month ago
    Anonymous

    I don't have the hardware to make it worthwhile, heavily quantized models are pretty bad. Maybe if I trained it on some data chatgpt doesn't have it'd be interesting. Or if I was a coomer.

  11. 1 month ago
    Anonymous

    I use ollama with dolphin2.2 and kitten prompt.

  12. 1 month ago
    Anonymous

    which theme is that OP?

  13. 1 month ago
    Anonymous

    Are these all just roleplay bots? I'd like something useful like chatgpt that can tell me facts or coding help, etc...

  14. 1 month ago
    Anonymous

    Could I even run anything released in the past 2 years with my 6GB VRAM 2060?

  15. 1 month ago
    Anonymous

    I want to host a frontend for the public and learn vLLM along the way.

    Ollama/OpenWebUI looks awesome but apparently it's not meant for production use.

    Are the Next.js AI templates my best bet for making my own publicly usable chatbot? (Also wanna run this on RunPod)

    https://vercel.com/templates/next.js/nextjs-ai-chatbot

    I just want a frontend I can use in production
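
    If the goal is to learn vLLM anyway, its Python side is pretty small; a minimal sketch below, where the model name is just an example. For a public frontend you'd normally run vLLM's OpenAI-compatible server (python -m vllm.entrypoints.openai.api_server --model <model>) on RunPod and point the Next.js template at it like any OpenAI endpoint:

      # offline vLLM sketch: load a model and batch-generate
      from vllm import LLM, SamplingParams

      llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")   # example model, pick your own
      params = SamplingParams(temperature=0.7, max_tokens=256)

      outputs = llm.generate(["Summarize what vLLM does in one sentence."], params)
      print(outputs[0].outputs[0].text)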

  16. 1 month ago
    Anonymous

    I'm not a misinformation terrorist so no, I don't use illegal technology.

  17. 1 month ago
    Anonymous

    Gemma is pozzed, use mixtral you fat b***h

  18. 1 month ago
    Anonymous

    What's the point, they're all "safe" and don't do as they're told. I have a high-RAM MacBook Pro and I always ask the model to call me the N word. They all refuse to.

    May as well use OpenAI apis

  19. 1 month ago
    Anonymous

    I wonder when AI devs will discover that RAM isn't the only form of non-volatile memory available.

    • 1 month ago
      Anonymous

      >RAM
      >non volatile

  20. 1 month ago
    Anonymous

    The LLM thread is intimidating, can this be the thread for morons?

  21. 1 month ago
    Anonymous

    I will once I get to run it on my phone.

  22. 1 month ago
    Anonymous

    No, Jensen only allowed me 8GB of vram.

  23. 1 month ago
    Anonymous

    I only have 6gigs of VRAM

  24. 1 month ago
    Anonymous

    Gigabytes of VRAM and RAM to talk with a shitty bot. You guys are insane.

  25. 1 month ago
    Anonymous

    >Listening to the digital Satan

  26. 1 month ago
    Anonymous

    yall homies are wild

  27. 1 month ago
    Anonymous

    No. I've got a cheap GPU with only 8GB vram since the games I play aren't graphically demanding

  28. 1 month ago
    Anonymous

    Problem with local ones is that they get stale really fast. I can guess what the prompt will result in because it has the same response for similar keywords. Hoping Claude and GPT 4.5 can change that.

  29. 1 month ago
    Anonymous

    No. I would love to have some useful application for one and run it on my own, but alas I have no such thing.

    • 1 month ago
      Anonymous

      how about advanced samegay detection now that we can't see IPs?

      • 1 month ago
        Anonymous

        Honestly I don't browse this place enough to really give a frick.

  30. 1 month ago
    Anonymous

    You're calling an API with that.
    That's not local, that requires internet anyway.

    • 1 month ago
      Anonymous

      no, it is
      for some fricking stupid reason literally every single normalgay-oriented frontend to local models is a localhosted website written in python
      even when the core library isn't pytorch and it's written in something else they slap python and a website on it
      literally the most baffling thing
      sometimes even when it's still just a CLI they stick python on it for fun

      i hate python devs, especially the way it's used in AI
      gotta make sure to obfuscate some kind of ABI incompatibility with python in any ML projects i publish

  31. 1 month ago
    Anonymous

    >Gemma
    vramlet detected

  32. 1 month ago
    Anonymous

    I installed koboldcpp and use the built-in webui, not sure why I would need the other bloat

  33. 1 month ago
    Anonymous

    No, I’d much rather let a company host that shit on their own hardware.
