AI training

I have a folder of over 5000 unique pepes. Is that enough to make a model?

  1. 1 year ago
    Anonymous

    No

    • 1 year ago
      Anonymous

      checked

      • 1 year ago
        Anonymous

        Scruffy-anon has gotten dubs like five times today.

        • 1 year ago
          Anonymous

          Scruffy-anon is just lucky.

        • 1 year ago
          Anonymous

          >An Anglo
          Too mentally unwell to train

    • 1 year ago
      The Mushroom

      scruf'd

  2. 1 year ago
    Anonymous

    you're off by a few orders of magnitude, mate

    • 1 year ago
      Anonymous

      I wonder why this isn't enough.

      Surely any human seeing these 5000 pepes would be intelligent enough to learn how to make another pepe. What knowledge does this human have that the AI model doesn't?

      • 1 year ago
        Anonymous

        >What knowledge does this human have that the AI model doesn't?
        they know what a pepe doesnt look like

        • 1 year ago
          Anonymous

          correct answer. we learn more from making mistakes than from getting things right.

        • 1 year ago
          Anonymous

          >correct answer. we learn more from making mistakes than from getting things right.
          incorrect, you can learn what a pepe is/isn't like

          >okay, and it would classify my picture of a green house as a pepe
          If the model is bad, or there is too little data, then maybe, but this is true for any model

          • 1 year ago
            Anonymous

            try making a spam text classifier. it will never learn the difference between spam and ham if you only give it spam
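
            rough sketch of what I mean (untested, assumes scikit-learn; the example texts are made up):

            # naive Bayes spam filter: it only works because it sees BOTH classes
            # drop the ham and there is nothing to contrast against
            from sklearn.feature_extraction.text import CountVectorizer
            from sklearn.naive_bayes import MultinomialNB

            spam = ["win a free iphone now", "claim your prize today"]
            ham = ["meeting moved to 3pm", "see you at lunch"]

            vec = CountVectorizer()
            X = vec.fit_transform(spam + ham)
            clf = MultinomialNB().fit(X, [1, 1, 0, 0])                 # 1 = spam, 0 = ham
            print(clf.predict(vec.transform(["free prize iphone"])))   # -> [1]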

            • 1 year ago
              Anonymous

              Does every generative AI also need to be a classification AI? Why can't it just mix and match spam bits to make more spam?

              • 1 year ago
                Anonymous

                because you can arrange spam in a way that is considered to be normal text. if you don't give the model any distinguishing data, then the bias it learns will be that green = pepe

              • 1 year ago
                Anonymous

                I definitely see what you're saying, but I can't help but think it ought to learn more even if it exclusively looks at pepes. In addition to learning they should be green, it should also capture that they are usually round, usually have blue shirts, usually have two white bits representing eyes, usually have some thick ruddy lips, etc. And it seems like anything new it generated, if it adhered to those learnings, would be good enough for a human to see it as pepe-like (even if it was a green house with two white windows or something).

                I guess it's just hard to overcome that intuition.

              • 1 year ago
                Anonymous

                if you want to create a "stable diffusion for pepes" then i recommend you just finetune the existing SD model. it will give you what you want, instead of trying to make a brand new model off a few thousand images. if you do that, it will just create blobs

              • 1 year ago
                Anonymous

                do you mean textual inversion, i.e. just a few inputs?

              • 1 year ago
                Anonymous

                yes, or whatever dreambooth does

              • 1 year ago
                Anonymous

                i find it crazy that that works. you could script the textual inversion for a new person or character, feed it into prompts you know generate lewds, and then post them on twitter automatically to get twitter (you)s

            • 1 year ago
              Anonymous

              https://www.google.com/search?q=anomaly+detection+spam+filter

  3. 1 year ago
    Anonymous

    No.

  4. 1 year ago
    Anonymous

    >mixing apu and pepe in the same folder
    disgusting

    • 1 year ago
      Anonymous

      What's the smaller pepe called, the one that's squished? I don't like it one bit.

      • 1 year ago
        Anonymous

        That's Apu

      • 1 year ago
        Anonymous

        his name is apu and he likes you

      • 1 year ago
        Anonymous

        peepo

        https://i.imgur.com/DcoAsrs.jpg

        >That's Apu
        >his name is apu and he likes you
        GTFO

        • 1 year ago
          Anonymous

          >peepo
          the reddit frog

    • 1 year ago
      Anonymous

      should train an anglo for binary classification between apus and pepes

  5. 1 year ago
    Anonymous

    just crop and duplicate them all randomly
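
    rough sketch (untested, assumes torchvision + PIL; the filename is made up) of what random crop/flip augmentation looks like:

    import torchvision.transforms as T
    from PIL import Image

    augment = T.Compose([
        T.RandomResizedCrop(64, scale=(0.6, 1.0)),   # random crop, resized back to 64x64
        T.RandomHorizontalFlip(),
        T.ColorJitter(brightness=0.2, contrast=0.2),
    ])

    img = Image.open("pepe_0001.png").convert("RGB")   # hypothetical filename
    variants = [augment(img) for _ in range(10)]       # ten "new" pepes from one

    bear in mind it is still the same 5000 pepes' worth of information, it just makes memorizing them harder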

  6. 1 year ago
    Anonymous

    Training any decent AI happens on the scale of hundreds of thousands to millions.

  7. 1 year ago
    Anonymous

    yes, it's enough to make an overfit model

  8. 1 year ago
    Anonymous

    they have to be labeled for it to work; otherwise, you need to have a folder of "non pepes" to do unsupervised learning. anyways, i think you should look at how to finetune Stable Diffusion because you would need a lot more images to train a model from scratch

    • 1 year ago
      Anonymous

      I understand that I would have to do labelling. Like would I write a bunch of different characteristics about each Pepe?

      • 1 year ago
        Anonymous

        think about how boorus are run, you would basically have to do that

      • 1 year ago
        Anonymous

        What exactly do you want to do?
        Just detect if an image is a pepe or not?

        Image in -> model -> out comes ???

    • 1 year ago
      Anonymous

      why would you need labels?
      you could just build a diffusion model and train it on the unlabelled pepes. after enough epochs it can generate new pepes in the realm of the dataset from random noise.
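
      very rough sketch (untested, assumes HuggingFace diffusers + PyTorch; the `pepes` DataLoader is hypothetical) of an unconditional DDPM training step, no labels anywhere:

      import torch
      import torch.nn.functional as F
      from diffusers import UNet2DModel, DDPMScheduler

      model = UNet2DModel(sample_size=64, in_channels=3, out_channels=3)
      scheduler = DDPMScheduler(num_train_timesteps=1000)
      opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

      for clean in pepes:   # hypothetical DataLoader yielding (B, 3, 64, 64) tensors scaled to [-1, 1]
          noise = torch.randn_like(clean)
          t = torch.randint(0, 1000, (clean.shape[0],))
          noisy = scheduler.add_noise(clean, noise, t)
          loss = F.mse_loss(model(noisy, t).sample, noise)   # learn to predict the added noise
          loss.backward(); opt.step(); opt.zero_grad()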

    • 1 year ago
      Anonymous

      There's unsupervised learning. He didn't say he wants to classify them or anything.

      • 1 year ago
        Anonymous

        then what the frick are you going to do with the model? you can't tell it anything because it's a dumb fricking computer

        • 1 year ago
          Anonymous

          Generative models. You produce new Pepes by tweaking latent variables.

    • 1 year ago
      Anonymous

      >you need to have a folder of "non pepes"
      you can just fit a probability distribution to your data and consider outliers as non-pepes
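
      rough sketch (untested, assumes scikit-learn + numpy; load_and_flatten and pepe_paths are hypothetical):

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.mixture import GaussianMixture

      X = np.stack([load_and_flatten(p) for p in pepe_paths])   # e.g. 32x32 grayscale -> 1024 floats
      pca = PCA(n_components=50).fit(X)
      feats = pca.transform(X)

      gmm = GaussianMixture(n_components=8).fit(feats)
      threshold = np.percentile(gmm.score_samples(feats), 1)    # bottom 1% of training likelihoods

      def looks_like_a_pepe(flat_img):
          return gmm.score_samples(pca.transform(flat_img.reshape(1, -1)))[0] >= threshold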

      • 1 year ago
        Anonymous

        but you would need a dataset of "non pepes" anyways if you want to make one of those

        • 1 year ago
          Anonymous

          nope, you only need to learn the Pepe probability distribution

          • 1 year ago
            Anonymous

            okay, and it would classify my picture of a green house as a pepe

        • 1 year ago
          Anonymous

          >every image in this training set which is not in the pepe training set is not a pepe
          problem solved?

          • 1 year ago
            Anonymous

            there are many more pepes out there that are not in the training set

            • 1 year ago
              Anonymous

              >make algorithm to classify if an image is a pepe or not
              >(if training set does not exist for this, use textual inversion with SD to create a training set of pepe and non-pepe images as only a few pepe images are required and SD can create anything)
              >run algorithm on LAION or similar
              >you now have everything you ever need

    • 1 year ago
      Anonymous

      Not for an autoencoder since it learns how to, well, encode the input distribution. And diffusion models are autoencoders for reasons I'm too smooth to understand.
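
      minimal sketch of that (untested, assumes PyTorch), trained on pepes only, no labels and no negative class:

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class PepeAE(nn.Module):
          def __init__(self):
              super().__init__()
              self.enc = nn.Sequential(
                  nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),    # 64x64 -> 32x32
                  nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),   # 32x32 -> 16x16
              )
              self.dec = nn.Sequential(
                  nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
                  nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid(),
              )

          def forward(self, x):
              return self.dec(self.enc(x))

      model = PepeAE()
      x = torch.rand(8, 3, 64, 64)        # stand-in for a batch of pepes
      loss = F.mse_loss(model(x), x)      # reconstruction error is the whole training signal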

  9. 1 year ago
    Anonymous

    I've made working text classification models with this amount.
    Not sure if image classification requires more.
    I guess it would since there is much more data in an image.

  10. 1 year ago
    Anonymous

    also pls post pepes so I can add more to my model

  11. 1 year ago
    Anonymous

    Yeah, these images are simple enough to be learned from a small sample.

  12. 1 year ago
    Anonymous

    I've been experimenting with dreambooth, and am getting a lot better at finetuning. I'll give it a go on pepe later tonight.

    • 1 year ago
      Anonymous

      care to explain how it works?

      • 1 year ago
        Anonymous

        StableDiffusion has been trained on billions of text to image pairs. Through that, it has picked up concepts such as ukiyo-e artwork, and artwork by Hokusai.

        It has no clue what my dog looks like, but a technique exists that allows me to further train (or, finetune) it on some text to image pairs of my own, which gets it to recognize the concept. With that, I can now apply that ukiyo-e style to my dog (work in progress, I'm getting better lol).

        Really compute heavy, I'm on a 3090 and that is the bare minimum for doing this. Pretty fun. Attached is a half baked self portrait of me as a 19th century president

        https://github.com/gammagec/Dreambooth-SD-optimized
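
        rough sketch of using the result afterwards (untested, assumes HuggingFace diffusers; the checkpoint dir and the "sks" token are whatever you trained with):

        import torch
        from diffusers import StableDiffusionPipeline

        pipe = StableDiffusionPipeline.from_pretrained(
            "./my-dog-dreambooth",           # hypothetical output dir of the finetune
            torch_dtype=torch.float16,
        ).to("cuda")

        image = pipe("a portrait of sks dog, ukiyo-e woodblock print").images[0]
        image.save("dog_ukiyoe.png")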

        • 1 year ago
          Anonymous

          >pic
          How many years on HRT?

  13. 1 year ago
    Anonymous

    I admit that I'm curious. If we hypothetically downloaded every single last piece of futa porn on the Internet and gave that to an AI to use as its learning database, what do people think would happen?

  14. 1 year ago
    Anonymous

    Can you upload them on mega? Been looking for a pepe dump.

    • 1 year ago
      Anonymous

      Seconding this
      Give access to the frogs please

    • 1 year ago
      Anonymous

      >Seconding this
      >Give access to the frogs please
      This. Or just please run a script to collect all their md5, thanks.

  15. 1 year ago
    Anonymous

    should I create a website that allows for people to submit their pepes to me so I could create a bigger dataset?

    • 1 year ago
      Anonymous

      get enough pepes to make a model, write a webcrawler and have it only scrape new frogs.
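
      rough sketch (untested; the catalog URL and classify_pepe() are completely made up, only requests is real):

      import requests

      def classify_pepe(image_bytes):
          # hypothetical: your trained pepe/apu classifier goes here
          raise NotImplementedError

      catalog = requests.get("https://example.com/catalog.json").json()   # hypothetical archive endpoint
      for thread in catalog:
          for img_url in thread.get("images", []):
              data = requests.get(img_url).content
              if classify_pepe(data):                  # keep only new frogs
                  with open(img_url.split("/")[-1], "wb") as f:
                      f.write(data)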

  16. 1 year ago
    Anonymous

    my model did pretty well with 3-4k basedjaks
    though I only made a detector rather than a generator

    I have been polishing the dataset to make an improved model but it has been on the backburner the past few months
    maybe I should get around to it again

    • 1 year ago
      Anonymous

      very cool anon

    • 1 year ago
      Anonymous

      kek did you seriously annotate 4,000 images with bounding boxes by hand??

      • 1 year ago
        Anonymous

        yeah!!!
        I used a graphics tablet and the 4-edges strategy rather than the 4-corners one
        it went pretty fast once I got in the zone

        • 1 year ago
          Anonymous

          publish it on kaggle!

          • 1 year ago
            Anonymous

            probably the next version when I remove ridiculous images from the training set and label the new images I have added

            rare basedjaks are rare for a reason (nobody really posts them)
            and they make it harder to learn what normal basedjaks look like
            so including them in the first version(s) of the model was a small mistake

            it has been too many months so it would take some sitting down to pick up the pieces again

            >my strategy is to label a few of them, then just have the model infer on future pictures and then accept or deny however the model labeled it. then you can just press 1 button instead of drawing more boxes

            that is something I will look into before labeling my next batch

            • 1 year ago
              Anonymous

              noooo include everything
              having a few weird images makes the dataset more robust

              • 1 year ago
                Anonymous

                >having a few weird images makes the dataset more robust
                depends on the goal of the model
                if it is to detect every single basedjak then sure
                but if it is to detect the ones that are spammed the most then the difficulty of learning them would likely introduce false positives and drag the accuracy of the intended detections down

                the model would have to work off 125x125 thumbnails, and basedjaks like these
                >
                are:
                >barely going to be perceptible at all in a thumbnail
                >I am not sure I would classify some of them as basedjaks
                >too fragmented/cropped and there are already enough almost identical ones in the training set

                >4-edges strategy rather than the 4-corners one
                >how does this work?

                sorry, it is 2 corners
                you can define a square by clicking the top left and bottom right corners
                OR
                you can click the left, top, right and bottom edges
                turns out the latter is much easier for humans to do than aligning two edges at once when clicking a corner

                https://i.imgur.com/E1q7YAM.png

                to be fair I think that one is fairly low accuracy
                and it is an early version of the model

        • 1 year ago
          Anonymous

          my strategy is to label a few of them, then just have the model infer on future pictures and then accept or deny however the model labeled it. then you can just press 1 button instead of drawing more boxes
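
          rough sketch (untested, assumes torchvision + PIL; the checkpoint name and label format are made up):

          import torch
          from PIL import Image
          from torchvision.transforms.functional import to_tensor
          from torchvision.models.detection import fasterrcnn_resnet50_fpn

          model = fasterrcnn_resnet50_fpn(num_classes=2)        # background + jak
          model.load_state_dict(torch.load("jak_detector.pt"))  # hypothetical checkpoint
          model.eval()

          for path in new_images:                               # hypothetical list of new file paths
              img = to_tensor(Image.open(path).convert("RGB"))
              with torch.no_grad():
                  pred = model([img])[0]
              for box, score in zip(pred["boxes"], pred["scores"]):
                  if score < 0.5:
                      continue                                  # skip low-confidence proposals
                  if input(f"{path} {box.tolist()} keep? [y/n] ") == "y":
                      with open(path + ".txt", "a") as f:       # crude one-box-per-line label file
                          print(*box.tolist(), file=f)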

        • 1 year ago
          Anonymous

          >4-edges strategy rather than the 4-corners one
          how does this work?

      • 1 year ago
        Anonymous

        https://i.imgur.com/OTsuCYN.png

        >yeah!!!
        >I used a graphics tablet and the 4-edges strategy rather than the 4-corners one
        >it went pretty fast once I got in the zone

        fricking hell
        every time the dedication of soijackers surprises me

        • 1 year ago
          Anonymous

          considering it was meant to be a basedjak-filter
          I would not call myself a jakker

    • 1 year ago
      Anonymous

      now write a bot that automatically insults every basedjak spammer

    • 1 year ago
      Anonymous

      I remember your thread on /qa/ !

      • 1 year ago
        Anonymous

        I have been LAZY
        though I have been saving basedjaks and basedjak-lookalikes ever since then

        • 1 year ago
          Anonymous

          is the booru.onions useful to you?

          • 1 year ago
            Anonymous

            no idea
            need to sort out my duplicates first since I have quite a few of those
            I was setting up a duplicate finder when I took a hiatus and never got around to it

            • 1 year ago
              Anonymous

              would this help with duplicates?
              https://github.com/qarmin/czkawka

              • 1 year ago
                Anonymous

                I have one using phash already
                just have to get around to running it and making sure there are not false positives or anything
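
                rough sketch of the idea (untested, assumes the imagehash + PIL packages; pepe_paths is hypothetical):

                import itertools
                import imagehash
                from PIL import Image

                hashes = {p: imagehash.phash(Image.open(p)) for p in pepe_paths}

                for a, b in itertools.combinations(pepe_paths, 2):
                    if hashes[a] - hashes[b] <= 4:   # small Hamming distance -> probable duplicate
                        print("possible dupe:", a, b)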

              • 1 year ago
                Anonymous

                >I have one using phash already
                >just have to get around to running it and making sure there are not false positives or anything
                i don't know how you implemented it but stash has a pretty good duplication detector based on phashes which lets you select exact/good/medium/low matches

  17. 1 year ago
    Anonymous

    a guy from

    [...]

    supposedly made a pepe booru

  18. 1 year ago
    Anonymous

    Copy and paste them until you have 5 million.

  19. 1 year ago
    Anonymous

    You can put 4,294,967,295 files into a single folder if the drive is formatted with NTFS (it would be unusual if it were not), as long as you do not exceed 256 terabytes (the maximum single file size and volume size) or whatever disk space is available, whichever is less.

  20. 1 year ago
    Anonymous

    It's probably enough to get acceptable results with Stylegan2-ADA, but even with a 3090 it's going to take weeks of non-stop training. You should try it if you need to heat a room in your house.

    • 1 year ago
      Anonymous

      >but even with a 3090 it's going to take weeks of non-stop training. You should try it if you need to heat a room in your house.
      how much would it cost to rent a few A100s in a datacenter for it?

      • 1 year ago
        Anonymous

        >how much would it cost to rent a few A100s in a datacenter for it?
        Used to be you could abuse the free version of Colab to do it, no idea what it is like currently. Also, StyleGAN2-ADA is just a GAN, there's no tagging, so you're on your own as to what numbers to feed into the latent space dimensions.
        I tried it with about 500 same-sized and cropped BBW nudes and was never able to get it to spit out anything truly passable.

        • 1 year ago
          Anonymous

          did colab really give free A100s? i thought the free version only gave you a V100 at best

  21. 1 year ago
    Anonymous

    waifu diffusion only used about 50,000 images
    is the issue more to do with the tagging?

    • 1 year ago
      Anonymous

      finetuning a model is different from training a new one from the ground up

  22. 1 year ago
    Anonymous

    Maybe you could use that to make a pepe and apu classifier, then search every archive and download every pepe. You might be able to filter duplicates if the archive stores the image hash.
    You'd get tens of thousands of rare pepes and apus.

  23. 1 year ago
    Anonymous

    transfer learning with idk normal frog pics might help. I did that for a project. Had only like 120 images

  24. 1 year ago
    Anonymous

    >BOT doesn't know about apu and BOTbros

  25. 1 year ago
    Anonymous

    Download the torrent from https://bbwroller.com/

    >torrent with over 100,000 frogs
