AI training

I have a folder of over 5000 unique pepes. Is that enough to make a model?

  1. 2 months ago
    Anonymous

    No

    • 2 months ago
      Anonymous

      checked

      • 2 months ago
        Anonymous

        Scruffy-anon has gotten dubs like five times today.

        • 2 months ago
          Anonymous

          Scruffy-anon is just lucky.

        • 2 months ago
          Anonymous

          >An Anglo
          Too mentally unwell to train

    • 2 months ago
      The Mushroom

      scruf'd

  2. 2 months ago
    Anonymous

you're off by a few orders of magnitude mate

    • 2 months ago
      Anonymous

      I wonder why this isn't enough.

Surely any human seeing these 5000 pepes would be intelligent enough to learn how to make another pepe. What knowledge does this human have that the AI model doesn't?

      • 2 months ago
        Anonymous

        >What knowledge does this human have that the AI model doesn't?
        they know what a pepe doesnt look like

        • 2 months ago
          Anonymous

          correct answer. we learn more from making mistakes than from getting things right.

        • 2 months ago
          Anonymous

>correct answer. we learn more from making mistakes than from getting things right.
incorrect, you can learn what a pepe is/isn't like

>okay, and it would classify my picture of a green house as a pepe
If the model is bad, or the data too few, then maybe, but this is true for any model

          • 2 months ago
            Anonymous

            try making a spam text classifier. it will never learn the difference between spam and ham if you only give it spam
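The degenerate case is easy to show with a toy Naive Bayes classifier (the corpus below is entirely made up): with zero ham examples, the maximum-likelihood ham prior is zero and every message gets labeled spam no matter what it says.

```python
import math
from collections import Counter

def train(msgs, labels):
    """Toy multinomial Naive Bayes (word counts, add-one smoothing on words)."""
    counts = {"spam": Counter(), "ham": Counter()}
    n = {"spam": 0, "ham": 0}
    for m, y in zip(msgs, labels):
        counts[y].update(m.split())
        n[y] += 1
    vocab = set(counts["spam"]) | set(counts["ham"])
    return counts, n, vocab

def classify(msg, model):
    counts, n, vocab = model
    total = n["spam"] + n["ham"]
    scores = {}
    for y in ("spam", "ham"):
        # maximum-likelihood class prior: zero ham examples -> P(ham) = 0
        prior = n[y] / total
        score = math.log(prior) if prior > 0 else float("-inf")
        denom = sum(counts[y].values()) + len(vocab) + 1
        for w in msg.split():
            score += math.log((counts[y][w] + 1) / denom)
        scores[y] = score
    return max(scores, key=scores.get)

# trained on spam only -- there is nothing for "ham" to be
model = train(["buy cheap pills now", "cheap pills cheap"], ["spam", "spam"])
print(classify("hi mum, see you at dinner", model))  # -> "spam"
```

With no negative class the word likelihoods never even get consulted: the -inf ham prior decides everything up front.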

            • 2 months ago
              Anonymous

              Does every generative AI also need to be a classification AI? Why can't it just mix and match spam bits to make more spam?

              • 2 months ago
                Anonymous

                because you can arrange spam in a way that is considered to be normal text. if you dont give the model any distinguishing data, then the bias it learns will be that green = pepe

              • 2 months ago
                Anonymous

I definitely see what you're saying, but I can't help but think it ought to learn more even if it exclusively looks at pepes. In addition to learning that they should be green, it should also capture that they are usually round, usually have blue shirts, usually have two white bits representing eyes, usually have some thick ruddy lips, etc. And it seems like anything new it generated, if it adhered to those learnings, would be good enough for a human to see as pepe-like (even if it was a green house with two white windows or something).

                I guess it's just hard to overcome that intuition.

              • 2 months ago
                Anonymous

if you want to create a "stable diffusion for pepes" then i recommend you just finetune the existing SD model. it will give you what you want, instead of trying to make a brand new model off a few thousand images. if you do that, it will just create blobs

              • 2 months ago
                Anonymous

                do you mean textual inversion, i.e. just a few inputs?

              • 2 months ago
                Anonymous

                yes, or whatever dreambooth does

              • 2 months ago
                Anonymous

i find it crazy that that works. you could script the textual inversion for a new person or character, feed it into prompts you know generate lewds, and then retweet them on twitter automatically to get twitter (you)s

            • 2 months ago
              Anonymous

              https://www.google.com/search?q=anomaly+detection+spam+filter

  3. 2 months ago
    Anonymous

    No.

  4. 2 months ago
    Anonymous

    >mixing apu and pepe in the same folder
    disgusting

    • 2 months ago
      Anonymous

What's the smaller pepe called, the one that's squished? I don't like it one bit.

      • 2 months ago
        Anonymous

        That's Apu

      • 2 months ago
        Anonymous

        his name is apu and he likes you

      • 2 months ago
        Anonymous

peepo

https://i.imgur.com/DcoAsrs.jpg

>That's Apu
>his name is apu and he likes you
GTFO

        • 2 months ago
          Anonymous

          >peepo
          the reddit frog

    • 2 months ago
      Anonymous

      should train an anglo for binary classification between apus and pepes

  5. 2 months ago
    Anonymous

    just crop and duplicate them all randomly
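Random crops and flips are in fact standard augmentation, though they regularize rather than add real information. A minimal numpy sketch (the image here is a stand-in random array, not an actual pepe):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, crop_size):
    """Random crop + random horizontal flip of an HxWxC image array."""
    h, w = img.shape[:2]
    ch, cw = crop_size
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    out = img[top:top + ch, left:left + cw]
    if rng.random() < 0.5:          # mirror half the time
        out = out[:, ::-1]
    return out

img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
crops = [augment(img, (48, 48)) for _ in range(10)]  # 10 variants of one image
print(crops[0].shape)  # (48, 48, 3)
```

Ten crops of one frog are still one frog as far as the model's generalization goes, which is why the anons citing hundreds of thousands of images have a point.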

  6. 2 months ago
    Anonymous

    Training any decent AI happens on the scale of hundreds of thousands to millions.

  7. 2 months ago
    Anonymous

    yes, it's enough to make an overfit model

  8. 2 months ago
    Anonymous

they have to be labeled for it to work; otherwise, you need a folder of "non pepes" to do unsupervised learning. anyways, i think you should look into how to finetune Stable Diffusion, because you would need a lot more images to train a model from scratch

    • 2 months ago
      Anonymous

      I understand that I would have to do labelling. Like would I write a bunch of different characteristics about each Pepe?

      • 2 months ago
        Anonymous

think about how boorus are run, you would basically have to do that

      • 2 months ago
        Anonymous

        What exactly do you want to do?
        Just detect if an image is a pepe or not?

        Image in -> model -> out comes ???

    • 2 months ago
      Anonymous

      why would you need labels?
      you could just build a diffusion model and train it on the unlabelled pepes. after enough epochs it can generate new pepes in the realm of the dataset from random noise.
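A single DDPM-style training step really does need nothing but the images; a stripped-down sketch (a tiny MLP stands in for the usual U-Net, the batch is random data, and the schedule values are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # noise schedule

# a real model would be a U-Net over 3x64x64 images; a flat MLP keeps the sketch short
model = nn.Sequential(nn.Linear(64 * 64 * 3 + 1, 256), nn.ReLU(),
                      nn.Linear(256, 64 * 64 * 3))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(x0):
    """x0: batch of flattened images in [-1, 1]. No labels anywhere."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))
    eps = torch.randn_like(x0)
    ab = alphas_bar[t].unsqueeze(1)
    xt = ab.sqrt() * x0 + (1 - ab).sqrt() * eps   # forward diffusion
    # predict the injected noise from the noised image and the timestep
    pred = model(torch.cat([xt, t.unsqueeze(1).float() / T], dim=1))
    loss = ((pred - eps) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

batch = torch.rand(8, 64 * 64 * 3) * 2 - 1       # stand-in for 8 pepes
print(training_step(batch))
```

Sampling then runs the learned denoiser backwards from pure noise, which is the "generate new pepes from random noise" part.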

    • 2 months ago
      Anonymous

There's unsupervised learning. He didn't say he wants to classify them or anything.

      • 2 months ago
        Anonymous

        then what the fuck are you going to do with the model? you can't tell it anything because it's a dumb fucking computer

        • 2 months ago
          Anonymous

Generative models. You produce new Pepes by tweaking latent variables.

    • 2 months ago
      Anonymous

      >you need to have a folder of "non pepes"
you can just fit a probability distribution to your data and consider outliers as non-pepes
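A sketch of that idea: fit a Gaussian to (made-up) 2-D embedding features of the pepe folder and threshold the Mahalanobis distance.

```python
import numpy as np

rng = np.random.default_rng(0)

# pretend these are 2-D embeddings of the 5000 pepes (synthetic for the sketch)
pepes = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(5000, 2))

mu = pepes.mean(axis=0)
cov = np.cov(pepes, rowvar=False)
cov_inv = np.linalg.inv(cov)

def mahalanobis(x):
    """Distance from the fitted pepe distribution; big = probably not a pepe."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

threshold = 3.0                       # ~3 sigma under the Gaussian assumption
print(mahalanobis(np.array([5.1, 4.9])) < threshold)   # in-distribution: True
print(mahalanobis(np.array([0.0, 0.0])) < threshold)   # far away: False
```

This is also where the green-house objection bites: if a green house's embedding happens to land near the pepe cluster, a density fit alone cannot reject it.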

      • 2 months ago
        Anonymous

        but you would need a dataset of "non pepes" anyways if you want to make one of those

        • 2 months ago
          Anonymous

          nope, you only need to learn the Pepe probability distribution

          • 2 months ago
            Anonymous

            okay, and it would classify my picture of a green house as a pepe

        • 2 months ago
          Anonymous

          >every image in this training set which is not in the pepe training set is not a pepe
          problem solved?

          • 2 months ago
            Anonymous

            there are many more pepes out there that are not in the training set

            • 2 months ago
              Anonymous

              >make algorithm to classify if an image is a pepe or not
              >(if training set does not exist for this, use textual inversion with SD to create a training set of pepe and non-pepe images as only a few pepe images are required and SD can create anything)
              >run algorithm on LAION or similar
              >you now have everything you ever need

    • 2 months ago
      Anonymous

      Not for an autoencoder since it learns how to, well, encode the input distribution. And diffusion models are autoencoders for reasons I'm too smooth to understand.
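In the linear case an autoencoder reduces to PCA, which makes the point easy to demonstrate: reconstruction error alone flags off-distribution inputs, with no negative examples anywhere (the "pepe features" below are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic "pepe features": 3-D points that really live on a 1-D line
t = rng.normal(size=(500, 1))
pepes = t @ np.array([[2.0, -1.0, 0.5]]) + rng.normal(0, 0.05, (500, 3))

mu = pepes.mean(axis=0)
# encoder/decoder = top principal component (a linear autoencoder's optimum)
_, _, vt = np.linalg.svd(pepes - mu, full_matrices=False)
component = vt[:1]                      # 1 x 3

def recon_error(x):
    """Encode to 1-D, decode back, measure what was lost."""
    code = (x - mu) @ component.T       # encode
    back = code @ component + mu        # decode
    return float(np.sum((x - back) ** 2))

on_manifold = np.array([4.0, -2.0, 1.0])    # looks like the training data
off_manifold = np.array([4.0, 2.0, 1.0])    # does not
print(recon_error(on_manifold) < recon_error(off_manifold))  # True
```

The model only ever saw "pepes", yet anything off the learned manifold reconstructs badly, which is the anomaly signal.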

  9. 2 months ago
    Anonymous

    I've made working text classification models with this amount.
    Not sure if image classification requires more.
    I guess it would since there is much more data in an image.

  10. 2 months ago
    Anonymous

    also pls post pepes so I can add more to my model

  11. 2 months ago
    Anonymous

    Yeah, these images are simple enough to be learned from a small sample.

  12. 2 months ago
    Anonymous

    I've been experimenting with dreambooth, and am getting a lot better at finetuning. I'll give it a go on pepe later tonight.

    • 2 months ago
      Anonymous

      care to explain how it works?

      • 2 months ago
        Anonymous

StableDiffusion has been trained on billions of text-image pairs. Through that, it has picked up concepts such as ukiyo-e artwork, and artwork by Hokusai.

It has no clue what my dog looks like, but a technique exists that lets me train further (finetune) on some text-image pairs of my own, which gets it to recognize the concept. With that, I can now apply the ukiyo-e style to my dog (work in progress, I'm getting better lol).

Really compute heavy, I'm on a 3090 and that is the bare minimum for doing this. Pretty fun. Attached is a half-baked self-portrait of me as a 19th century president

        https://github.com/gammagec/Dreambooth-SD-optimized
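For flavor, a typical invocation looks something like the following. The flags follow the Hugging Face diffusers DreamBooth example script rather than the linked repo, and the paths, model name, and rare token "sks" are all placeholders:

```shell
# Illustrative only -- flag names are from the diffusers DreamBooth example;
# the linked repo has its own entry point and options.
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --instance_data_dir="./my_dog_photos" \
  --instance_prompt="a photo of sks dog" \
  --output_dir="./dreambooth-out" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --max_train_steps=800
```

The rare token is the whole trick: the model binds the new concept to a string it has essentially never seen, so existing concepts stay intact.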

        • 2 months ago
          Anonymous

          >pic
          How many years on HRT?

  13. 2 months ago
    Anonymous

I admit that I'm curious. If we hypothetically downloaded every single last piece of futa porn on the Internet and gave that to an AI to use as its learning database, what do people think would happen?

  14. 2 months ago
    Anonymous

    Can you upload them on mega? Been looking for a pepe dump.

    • 2 months ago
      Anonymous

      Seconding this
      Give access to the frogs please

    • 2 months ago
      Anonymous

>Seconding this
>Give access to the frogs please
This. Or just please run a script to collect all their md5, thanks.

  15. 2 months ago
    Anonymous

    should I create a website that allows for people to submit their pepes to me so I could create a bigger dataset?

    • 2 months ago
      Anonymous

      get enough pepes to make a model, write a webcrawler and have it only scrape new frogs.

  16. 2 months ago
    Anonymous

    my model did pretty well with 3-4k basedjaks
    though I only made a detector rather than a generator

    I have been polishing the dataset to make an improved model but it has been on the backburner the past few months
    maybe I should get around to it again

    • 2 months ago
      Anonymous

      very cool anon

    • 2 months ago
      Anonymous

      kek did you seriously annotate 4,000 images with bounding boxes by hand??

      • 2 months ago
        Anonymous

        yeah!!!
        I used a graphics tablet and the 4-edges strategy rather than the 4-corners one
        it went pretty fast once I got in the zone

        • 2 months ago
          Anonymous

          publish it on kaggle!

          • 2 months ago
            Anonymous

            probably the next version when I remove ridiculous images from the training set and label the new images I have added

            rare basedjaks are rare for a reason (nobody really posts them)
            and they make it harder to learn what normal basedjaks look like
            so including them in the first version(s) of the model was a small mistake

            it has been too many months so it would take some sitting down to pick up the pieces again

>my strategy is to label a few of them, then just have the model infer on future pictures and then accept or deny however the model labeled it. then you can just press 1 button instead of drawing more boxes

            that is something I will look into before labeling my next batch
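That accept/deny loop can be sketched like so, with canned decisions standing in for the one-button review (the detector output and filenames are fake):

```python
def review(proposals, decide):
    """proposals: list of (image_id, box) the model predicted.
    decide: callable returning True (accept) or False (deny) per proposal.
    Accepted boxes become labels; denied ones go back for manual boxing."""
    accepted, rejected = [], []
    for image_id, box in proposals:
        (accepted if decide(image_id, box) else rejected).append((image_id, box))
    return accepted, rejected

# fake detector output: (image, (x1, y1, x2, y2))
proposals = [("img_001.png", (10, 10, 90, 90)),
             ("img_002.png", (0, 0, 5, 5)),      # obviously wrong box
             ("img_003.png", (20, 15, 100, 110))]

# stand-in for the human pressing one button per image
decisions = {"img_001.png": True, "img_002.png": False, "img_003.png": True}
ok, redo = review(proposals, lambda img, box: decisions[img])
print(len(ok), len(redo))  # 2 1
```

Each accepted prediction grows the training set cheaply, and each round of retraining should shrink the "redo" pile.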

            • 2 months ago
              Anonymous

              noooo include everything
              having a few weird images makes the dataset more robust

              • 2 months ago
                Anonymous

                >having a few weird images makes the dataset more robust
                depends on the goal of the model
                if it is to detect every single basedjak then sure
                but if it is to detect the ones that are spammed the most then the difficulty of learning them would likely introduce false positives and drag the accuracy of the intended detections down

the model would have to work off 125x125 thumbnails, and basedjaks like these are:
>barely going to be perceptible at all in a thumbnail
                >I am not sure I would classify some of them as basedjaks
                >too fragmented/cropped and there are already enough almost identical ones in the training set

>4-edges strategy rather than the 4-corners one
>how does this work?

                sorry, it is 2 corners
                you can define a square by clicking the top left and bottom right corners
                OR
                you can click the left top right and bottom edges
                turns out the latter is much easier for humans to do than aligning two edges at once when clicking a corner

                https://i.imgur.com/E1q7YAM.png

                to be fair I think that one is fairly low accuracy
                and it is an early version of the model

        • 2 months ago
          Anonymous

          my strategy is to label a few of them, then just have the model infer on future pictures and then accept or deny however the model labeled it. then you can just press 1 button instead of drawing more boxes

        • 2 months ago
          Anonymous

          >4-edges strategy rather than the 4-corners one
          how does this work?

      • 2 months ago
        Anonymous

        https://i.imgur.com/OTsuCYN.png

>yeah!!!
>I used a graphics tablet and the 4-edges strategy rather than the 4-corners one
>it went pretty fast once I got in the zone

fucking hell
every time the dedication of soijackers surprises me

        • 2 months ago
          Anonymous

          considering it was meant to be a basedjak-filter
          I would not call myself a jakker

    • 2 months ago
      Anonymous

      now write a bot that automatically insults every basedjak spammer

    • 2 months ago
      Anonymous

      I remember your thread on /qa/ !

      • 2 months ago
        Anonymous

        I have been LAZY
        though I have been saving basedjaks and basedjak-lookalikes ever since then

        • 2 months ago
          Anonymous

          is the booru.onions useful to you?

          • 2 months ago
            Anonymous

            no idea
            need to sort out my duplicates first since I have quite a few of those
            I was setting up a duplicate finder when I took a hiatus and never got around to it

            • 2 months ago
              Anonymous

              would this help with duplicates?
              https://github.com/qarmin/czkawka

              • 2 months ago
                Anonymous

                I have one using phash already
                just have to get around to running it and making sure there are not false positives or anything
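A dependency-free sketch of the idea, using a simplified average hash rather than the DCT-based pHash mentioned; near-duplicates land a small Hamming distance apart while unrelated images land far:

```python
import numpy as np

def ahash(img):
    """Average hash of a grayscale image array: downscale to 8x8 by block
    means, then threshold against the overall mean. Returns 64 bits."""
    h, w = img.shape
    small = img[:h - h % 8, :w - w % 8].reshape(8, h // 8, 8, w // 8).mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def hamming(a, b):
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(float)
noisy = img + rng.normal(0, 2, size=img.shape)      # near-duplicate
other = rng.integers(0, 256, size=(64, 64)).astype(float)

print(hamming(ahash(img), ahash(noisy)))   # small (near-duplicate)
print(hamming(ahash(img), ahash(other)))   # large (different image)
```

The exact/good/medium/low buckets stash exposes are essentially just cutoffs on this distance.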

              • 2 months ago
                Anonymous

                >I have one using phash already
                >just have to get around to running it and making sure there are not false positives or anything
i dont know how you implemented it but stash has a pretty good duplicate detector based on phashes which lets you select exact/good/medium/low matches

  17. 2 months ago
    Anonymous

    a guy from

    [...]

    supposedly made a pepe booru

  18. 2 months ago
    Anonymous

    Copy and paste them until you have 5 million.

  19. 2 months ago
    Anonymous

You can put 4,294,967,295 files into a single folder if the drive is formatted with NTFS (it would be unusual if it were not), as long as you do not exceed 256 terabytes (single-file size and total space) or all of the disk space that is available, whichever is less.

  20. 2 months ago
    Anonymous

    It's probably enough to get acceptable results with Stylegan2-ADA, but even with a 3090 it's going to take weeks of non-stop training. You should try it if you need to heat a room in your house.

    • 2 months ago
      Anonymous

      >but even with a 3090 it's going to take weeks of non-stop training. You should try it if you need to heat a room in your house.
      how much would it cost to rent a few A100s in a datacenter for it?

      • 2 months ago
        Anonymous

        >how much would it cost to rent a few A100s in a datacenter for it?
Used to be you could abuse the free version of Colab to do it, no idea what it is currently. Also, Stylegan2-ADA is just a GAN, there's no tagging, so you're on your own as to what numbers to feed into the latent space dimensions.
I tried it with about 500 same-sized and cropped BBW nudes and was never able to get it to spit out anything truly passable.
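"What numbers to feed in" just means sampling z from a standard normal (512-dimensional by default in StyleGAN2) and, to browse the space, interpolating between samples; a sketch without the generator itself:

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM = 512   # StyleGAN2's default latent size

def sample_z(n):
    """GAN inputs are just standard-normal vectors -- no tags, no prompts."""
    return rng.standard_normal((n, Z_DIM))

def lerp(z1, z2, steps):
    """Walk the latent space between two samples to browse outputs."""
    ts = np.linspace(0.0, 1.0, steps)[:, None]
    return (1 - ts) * z1 + ts * z2

z_a, z_b = sample_z(2)
path = lerp(z_a, z_b, steps=8)     # 8 latents; feed each to G to get 8 images
print(path.shape)  # (8, 512)
```

There is no text conditioning anywhere, which is the "you're on your own" part: finding a latent that looks like anything specific is trial and error.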

        • 2 months ago
          Anonymous

          did colab really give free A100s? i thought the free version only gave you a V100 at best

  21. 2 months ago
    Anonymous

waifu diffusion only used about 50,000 images
    is the issue more to do with the tagging?

    • 2 months ago
      Anonymous

      finetuning a model is different from training a new one from the ground up

  22. 2 months ago
    Anonymous

    Maybe you could use that to make a pepe and apu classifier, then search every archive and download every pepe. You might be able to filter duplicates if the archive stores the image hash.
    You'd get tens of thousands of rare pepes and apus.

  23. 2 months ago
    Anonymous

    transfer learning with idk normal frog pics might help. I did that for a project. Had only like 120 images

  24. 2 months ago
    Anonymous

    >BOT doesn't know about apu and /int/bros

  25. 2 months ago
    Anonymous

    Download the torrent from https://bbwroller.com/

    >torrent with over 100,000 frogs
