AI training

Posted on September 29, 2022 by Anonymous

I have a folder of over 5000 unique pepes. Is that enough to make a model?

2 years ago

Anonymous

No
- 2 years ago
  
  Anonymous
  
  checked
  - 2 years ago
    
    Anonymous
    
    Scruffy-anon has gotten dubs like five times today.
    - 2 years ago
      
      Anonymous
      
      Scruffy-anon is just lucky.
    - 2 years ago
      
      Anonymous
      
      >An Anglo
      Too mentally unwell to train
- 2 years ago
  
  The Mushroom
  
  scruf'd
2 years ago

Anonymous

you're off by a few orders of magnitude mate
- 2 years ago
  
  Anonymous
  
  I wonder why this isn't enough.
  
  Surely any human who saw these 5000 pepes would be intelligent enough to learn how to make another pepe. What knowledge does this human have that the AI model doesn't?
  - 2 years ago
    
    Anonymous
    
    >What knowledge does this human have that the AI model doesn't?
    they know what a pepe doesnt look like
    - 2 years ago
      
      Anonymous
      
      correct answer. we learn more from making mistakes than from getting things right.
    - 2 years ago
      
      Anonymous
      
      >correct answer. we learn more from making mistakes than from getting things right.
      
      incorrect, you can learn what a pepe is/isn't like
      
      >okay, and it would classify my picture of a green house as a pepe
      
      If the model is bad, or there is too little data, then maybe, but that is true for any model
      - 2 years ago
        
        Anonymous
        
        try making a spam text classifier. it will never learn the difference between spam and ham if you only give it spam
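A minimal sketch of that point, assuming a toy multinomial Naive Bayes over pre-tokenized text (the function names and example tokens are made up for illustration). With both spam and ham in training it can tell them apart; trained on spam alone, `max` has exactly one class to pick from, so everything comes back as spam.

```python
import math
from collections import Counter


def train_nb(docs_by_class):
    """Multinomial Naive Bayes. docs_by_class maps a label to a list of token lists."""
    vocab = {tok for docs in docs_by_class.values() for doc in docs for tok in doc}
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    model = {}
    for label, docs in docs_by_class.items():
        counts = Counter(tok for doc in docs for tok in doc)
        total = sum(counts.values())
        log_prior = math.log(len(docs) / total_docs)
        # add-one (Laplace) smoothing so a word unseen in one class does not zero it out
        log_lik = {tok: math.log((counts[tok] + 1) / (total + len(vocab))) for tok in vocab}
        model[label] = (log_prior, log_lik)
    return model


def classify(model, doc):
    """Return the label with the highest posterior; tokens never seen in training are ignored."""
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik[tok] for tok in doc if tok in log_lik)
    return max(model, key=score)


model = train_nb({
    "spam": [["win", "free", "money"], ["free", "prize", "now"]],
    "ham": [["meeting", "at", "noon"], ["lunch", "money", "tomorrow"]],
})
print(classify(model, ["free", "money"]))  # spam

# trained on spam only: there is no other class to choose, so everything is "spam"
only_spam = train_nb({"spam": [["win", "free", "money"]]})
print(classify(only_spam, ["meeting", "at", "noon"]))  # spam
```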
        
        2 years ago
        
        Anonymous
        
        Does every generative AI also need to be a classification AI? Why can't it just mix and match spam bits to make more spam?
        
        2 years ago
        
        Anonymous
        
        because you can arrange spam in a way that is considered to be normal text. if you dont give the model any distinguishing data, then the bias it learns will be that green = pepe
        
        2 years ago
        
        Anonymous
        
        I definitely see what you're saying, but I can't help but think it ought to learn more even if it only ever looks at pepes. In addition to learning they should be green, it should also capture that they are usually round, usually have blue shirts, usually have two white bits representing eyes, usually have thick ruddy lips, etc. And it seems like anything new it generated, if it adhered to those learnings, would be good enough for a human to see as pepe-like (even if it was a green house with two white windows or something).
        
        I guess it's just hard to overcome that intuition.
        
        2 years ago
        
        Anonymous
        
        if you want to create a "stable diffusion for pepes" then i recommend you just finetune the existing SD model. it will give you what you want, instead of trying to make a brand new model off a few thousand images. if you train from scratch on that few, it will just create blobs
        
        2 years ago
        
        Anonymous
        
        do you mean textual inversion, i.e. just a few inputs?
        
        2 years ago
        
        Anonymous
        
        yes, or whatever dreambooth does
        
        2 years ago
        
        Anonymous
        
        i find it crazy that works. you could script the textual inversion for a new person or character and then feed it into prompts you know generate lewds and then retweet them on twitter automatically to get twitter(yous)
        
        2 years ago
        
        Anonymous
        
        https://www.google.com/search?q=anomaly+detection+spam+filter
2 years ago

Anonymous

No.
2 years ago

Anonymous

>mixing apu and pepe in the same folder
disgusting
- 2 years ago
  
  Anonymous
  
  What's the smaller pepe called, the one that's squished? I don't like it one bit.
  - 2 years ago
    
    Anonymous
    
    That's Apu
  - 2 years ago
    
    Anonymous
    
    his name is apu and he likes you
  - 2 years ago
    
    Anonymous
    
    peepo
    
    https://i.imgur.com/DcoAsrs.jpg
    
    >That's Apu
    
    >his name is apu and he likes you
    
    GTFO
    - 2 years ago
      
      Anonymous
      
      >peepo
      the reddit frog
- 2 years ago
  
  Anonymous
  
  should train an anglo for binary classification between apus and pepes
2 years ago

Anonymous

just crop and duplicate them all randomly
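A rough sketch of that kind of augmentation, assuming images are plain 2D lists of pixel values (in a real pipeline you would reach for something like torchvision's `RandomCrop` and `RandomHorizontalFlip`). Each copy is a random crop, mirrored half the time. It stretches a small dataset, but it does not add genuinely new pepes.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position out of a 2D list of pixels."""
    h, w = len(image), len(image[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop is larger than the image")
    top = rng.randint(0, h - crop_h)
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(images, copies, crop_h, crop_w, seed=0):
    """Make `copies` randomly cropped (and half the time mirrored) versions of each image."""
    rng = random.Random(seed)
    out = []
    for image in images:
        for _ in range(copies):
            crop = random_crop(image, crop_h, crop_w, rng)
            if rng.random() < 0.5:
                crop = [row[::-1] for row in crop]  # horizontal flip
            out.append(crop)
    return out


tiny = [[r * 4 + c for c in range(4)] for r in range(4)]  # a fake 4x4 "image"
extra = augment([tiny], copies=3, crop_h=3, crop_w=3)
print(len(extra))  # 3
```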
2 years ago

Anonymous

Training any decent AI happens on the scale of hundreds of thousands to millions.
2 years ago

Anonymous

yes, it's enough to make an overfit model
2 years ago

Anonymous

they have to be labeled for it to work, otherwise, you need to have a folder of "non pepes" to have unsupervised learning. anyways, i think you should look at how to finetune Stable Diffusion because you would need a lot more images to train a model from scratch
- 2 years ago
  
  Anonymous
  
  I understand that I would have to do labelling. Like would I write a bunch of different characteristics about each Pepe?
  - 2 years ago
    
    Anonymous
    
    think about how boorus are run, you would basically have to do that
  - 2 years ago
    
    Anonymous
    
    What exactly do you want to do?
    Just detect if an image is a pepe or not?
    
    Image in -> model -> out comes ???
- 2 years ago
  
  Anonymous
  
  why would you need labels?
  you could just build a diffusion model and train it on the unlabelled pepes. after enough epochs it can generate new pepes in the realm of the dataset from random noise.
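For reference, the unlabelled training in a DDPM-style diffusion model works roughly like this sketch. Assumptions: the linear beta schedule from 1e-4 to 0.02 over 1000 steps used in the original DDPM paper, and images flattened to lists of floats. Each training step picks a random t, noises a clean pepe with the closed-form forward process, and the network (not shown) is trained with a mean-squared error to predict `eps` from `(x_t, t)`, so no labels are involved.

```python
import math
import random


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod over s <= t of (1 - beta_s), for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_image(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps with eps ~ N(0, I).

    Returns (x_t, eps); the denoising network is trained to predict eps from (x_t, t).
    """
    abar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(abar) * x + math.sqrt(1.0 - abar) * e for x, e in zip(x0, eps)]
    return xt, eps


alpha_bars = linear_alpha_bars()
rng = random.Random(0)
pepe = [0.2, -0.5, 0.9, 0.1]  # stand-in for a flattened, [-1, 1]-scaled image
t = rng.randrange(len(alpha_bars))  # a random timestep, as in training
xt, eps = noise_image(pepe, t, alpha_bars, rng)
# a real training step would compute mean((model(xt, t) - eps) ** 2) and backprop
```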
- 2 years ago
  
  Anonymous
  
  There's unsupervised learning. He didn't say he wants to classify them or anything.
  - 2 years ago
    
    Anonymous
    
    then what the frick are you going to do with the model? you can't tell it anything because it's a dumb fricking computer
    - 2 years ago
      
      Anonymous
      
      Generative models. You produce new Pepes, by tweaking latent variables.
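A small sketch of "tweaking latent variables", assuming a trained generator that maps a latent vector (usually sampled from a standard normal) to an image; that generator is hypothetical and not shown. Walking between two latents with spherical interpolation (slerp) is a common way to get a smooth morph from one generated pepe to another.

```python
import math
import random


def slerp(a, b, t):
    """Spherical linear interpolation between latent vectors a and b, t in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    omega = math.acos(max(-1.0, min(1.0, dot / norm)))  # angle between a and b
    s = math.sin(omega)
    if s < 1e-6:  # (anti)parallel vectors: fall back to plain linear interpolation
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    return [(math.sin((1 - t) * omega) * x + math.sin(t * omega) * y) / s for x, y in zip(a, b)]


rng = random.Random(42)
z1 = [rng.gauss(0.0, 1.0) for _ in range(512)]  # 512 is a typical GAN latent size
z2 = [rng.gauss(0.0, 1.0) for _ in range(512)]
path = [slerp(z1, z2, i / 9) for i in range(10)]  # 10 latents going from z1 to z2
```

Feeding each entry of `path` through the (hypothetical) generator would give ten frames morphing from one pepe into the other.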
- 2 years ago
  
  Anonymous
  
  >you need to have a folder of "non pepes"
  you can just fit a probability distribution to your data and consider outliers as non-pepes
  - 2 years ago
    
    Anonymous
    
    but you would need a dataset of "non pepes" anyways if you want to make one of those
    - 2 years ago
      
      Anonymous
      
      nope, you only need to learn the Pepe probability distribution
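A minimal sketch of that one-class idea, assuming each image has already been turned into a small feature vector by some hypothetical feature extractor (the two features and the 5% cutoff below are made up). It fits an independent Gaussian per feature to the pepes only, and anything less likely than the least likely training pepes is treated as a non-pepe.

```python
import math


def fit_diag_gaussian(vectors):
    """Per-feature mean and variance of the training (pepe-only) feature vectors."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[i] for v in vectors) / n for i in range(d)]
    # floor the variance so a constant feature does not cause a division by zero
    variances = [max(sum((v[i] - means[i]) ** 2 for v in vectors) / n, 1e-6) for i in range(d)]
    return means, variances


def log_density(x, means, variances):
    """Log-likelihood of x under independent Gaussians, one per feature."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        for xi, mu, var in zip(x, means, variances)
    )


def fit_pepe_detector(pepe_vectors, quantile=0.05):
    """Return is_pepe(x): True unless x is less likely than the bottom `quantile` of training pepes."""
    means, variances = fit_diag_gaussian(pepe_vectors)
    scores = sorted(log_density(v, means, variances) for v in pepe_vectors)
    threshold = scores[int(quantile * (len(scores) - 1))]
    return lambda x: log_density(x, means, variances) >= threshold


# made-up features: [greenness, roundness]
pepes = [[0.90, 0.80], [0.85, 0.75], [0.80, 0.85], [0.88, 0.90], [0.92, 0.78]]
is_pepe = fit_pepe_detector(pepes)
print(is_pepe([0.87, 0.82]))  # True: close to the training pepes
print(is_pepe([0.90, 0.10]))  # False: green but not round, e.g. a green house
```

Whether a green house gets rejected depends entirely on the features: with something like roundness it lands far out in the tail, but with greenness alone it would pass.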
      - 2 years ago
        
        Anonymous
        
        okay, and it would classify my picture of a green house as a pepe
    - 2 years ago
      
      Anonymous
      
      >every image in this training set which is not in the pepe training set is not a pepe
      problem solved?
      - 2 years ago
        
        Anonymous
        
        there are many more pepes out there that are not in the training set
        
        2 years ago
        
        Anonymous
        
        >make algorithm to classify if an image is a pepe or not
        >(if training set does not exist for this, use textual inversion with SD to create a training set of pepe and non-pepe images as only a few pepe images are required and SD can create anything)
        >run algorithm on LAION or similar
        >you now have everything you ever need
- 2 years ago
  
  Anonymous
  
  Not for an autoencoder since it learns how to, well, encode the input distribution. And diffusion models are autoencoders for reasons I'm too smooth to understand.
2 years ago

Anonymous

I've made working text classification models with this amount.
Not sure if image classification requires more.
I guess it would since there is much more data in an image.
2 years ago

Anonymous

also pls post pepes so I can add more to my model
2 years ago

Anonymous

Yeah, these images are simple enough to be learned from a small sample.
2 years ago

Anonymous

I've been experimenting with dreambooth, and am getting a lot better at finetuning. I'll give it a go on pepe later tonight.
- 2 years ago
  
  Anonymous
  
  care to explain how it works?
  - 2 years ago
    
    Anonymous
    
    Stable Diffusion has been trained on billions of text-image pairs. Through that, it has picked up concepts such as ukiyo-e artwork and artwork by Hokusai.
    
    It has no clue what my dog looks like, but a technique exists that lets me further train (i.e. finetune) it on some text-image pairs of my own so that it learns that concept. With that, I can now apply the ukiyo-e style to my dog (work in progress, I'm getting better lol).
    
    Really compute heavy, I'm on a 3090 and that is the bare minimum for doing this. Pretty fun. Attached is a half baked self portrait of me as a 19th century president
    
    https://github.com/gammagec/Dreambooth-SD-optimized
    - 2 years ago
      
      Anonymous
      
      >pic
      How many years on HRT?
2 years ago

Anonymous

I admit that I'm curious. If we hypothetically downloaded every single last piece of futa porn on the Internet and gave that to an AI to use as its learning database, what do people think would happen?
2 years ago

Anonymous

Can you upload them on mega? Been looking for a pepe dump.
- 2 years ago
  
  Anonymous
  
  Seconding this
  Give access to the frogs please
- 2 years ago
  
  Anonymous
  
  >Seconding this
  >Give access to the frogs please
  
  This. Or just please run a script to collect all their md5, thanks.
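A minimal sketch of such a script using only the Python standard library. It hashes every file under a folder and groups identical hashes, which also surfaces byte-for-byte duplicates. Re-encoded or resized copies get different md5s, so it will not catch those.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """md5 hex digest of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_md5s(folder):
    """Map md5 -> list of files with that hash; lists longer than 1 are exact duplicates."""
    by_hash = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            by_hash.setdefault(md5_of_file(path), []).append(path)
    return by_hash
```

For example, `for h, paths in collect_md5s("pepes").items(): print(h, paths[0].name)` would print one line per unique file, where "pepes" is a placeholder folder name.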
2 years ago

Anonymous

should I create a website that allows for people to submit their pepes to me so I could create a bigger dataset?
- 2 years ago
  
  Anonymous
  
  get enough pepes to make a model, write a webcrawler and have it only scrape new frogs.
2 years ago

Anonymous

my model did pretty well with 3-4k basedjaks
though I only made a detector rather than a generator

I have been polishing the dataset to make an improved model but it has been on the backburner the past few months
maybe I should get around to it again
- 2 years ago
  
  Anonymous
  
  very cool anon
- 2 years ago
  
  Anonymous
  
  kek did you seriously annotate 4,000 images with bounding boxes by hand??
  - 2 years ago
    
    Anonymous
    
    yeah!!!
    I used a graphics tablet and the 4-edges strategy rather than the 4-corners one
    it went pretty fast once I got in the zone
    - 2 years ago
      
      Anonymous
      
      publish it on kaggle!
      - 2 years ago
        
        Anonymous
        
        probably the next version when I remove ridiculous images from the training set and label the new images I have added
        
        rare basedjaks are rare for a reason (nobody really posts them)
        and they make it harder to learn what normal basedjaks look like
        so including them in the first version(s) of the model was a small mistake
        
        it has been too many months so it would take some sitting down to pick up the pieces again
        
        my strategy is to label a few of them, then just have the model infer on future pictures and then accept or deny however the model labeled it. then you can just press 1 button instead of drawing more boxes
        
        that is something I will look into before labeling my next batch
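A rough sketch of that loop. It assumes two hypothetical callbacks: `train`, which fits a detector on the labels so far and returns a function proposing boxes for an image, and `review`, which shows the proposal and returns True on the accept key. Rejected images are set aside for drawing boxes by hand. Retraining between batches should make the proposals improve as more labels get accepted.

```python
def label_in_rounds(unlabeled, seed_labels, train, review, batch_size=100):
    """Model-assisted labeling: the model proposes boxes, a human accepts or rejects.

    train(labels) -> propose, where propose(image) returns suggested boxes (hypothetical).
    review(image, boxes) -> True if the human accepts the proposal with one keypress.
    """
    labels = dict(seed_labels)  # the few images labeled fully by hand
    rejected = []  # proposals that were wrong; draw these boxes manually later
    queue = list(unlabeled)
    while queue:
        propose = train(labels)  # retrain on everything accepted so far
        batch, queue = queue[:batch_size], queue[batch_size:]
        for image in batch:
            boxes = propose(image)
            if review(image, boxes):
                labels[image] = boxes
            else:
                rejected.append(image)
    return labels, rejected
```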
        
        2 years ago
        
        Anonymous
        
        noooo include everything
        having a few weird images makes the dataset more robust
        
        2 years ago
        
        Anonymous
        
        >having a few weird images makes the dataset more robust
        depends on the goal of the model
        if it is to detect every single basedjak then sure
        but if it is to detect the ones that are spammed the most then the difficulty of learning them would likely introduce false positives and drag the accuracy of the intended detections down
        
        the model would have to work off 125x125 thumbnails, and basedjaks like these are:
        >barely going to be perceptible at all in a thumbnail
        >I am not sure I would classify some of them as basedjaks
        >too fragmented/cropped and there are already enough almost identical ones in the training set
        
        >4-edges strategy rather than the 4-corners one
        how does this work?
        
        sorry, it is 2 corners
        you can define a square by clicking the top left and bottom right corners
        OR
        you can click the left, top, right, and bottom edges
        turns out the latter is much easier for humans to do than aligning two edges at once when clicking a corner
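A small sketch of the two ways of building a box, assuming clicks arrive as (x, y) pixel coordinates with y growing downward. With edge clicks, each click only has to be accurate along one axis (x for the left/right clicks, y for the top/bottom ones), which is presumably why it feels easier than nailing a corner.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def box_from_corners(top_left, bottom_right):
    """2-corner method: each click must be right in both x and y at once."""
    (x1, y1), (x2, y2) = top_left, bottom_right
    return Box(min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def box_from_edges(left_click, top_click, right_click, bottom_click):
    """4-edge method: each click only needs to touch its edge, so only one coordinate is used."""
    left, right = left_click[0], right_click[0]  # x of the left/right clicks
    top, bottom = top_click[1], bottom_click[1]  # y of the top/bottom clicks
    return Box(min(left, right), min(top, bottom), max(left, right), max(top, bottom))


# the clicks can land anywhere along each edge
print(box_from_edges((10, 57), (43, 20), (90, 61), (38, 80)))
# Box(left=10, top=20, right=90, bottom=80)
```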
        
        https://i.imgur.com/E1q7YAM.png
        
        to be fair I think that one is fairly low accuracy
        and it is an early version of the model
    - 2 years ago
      
      Anonymous
      
      >4-edges strategy rather than the 4-corners one
      how does this work?
  - 2 years ago
    
    Anonymous
    
    https://i.imgur.com/OTsuCYN.png
    
    >yeah!!!
    >I used a graphics tablet and the 4-edges strategy rather than the 4-corners one
    >it went pretty fast once I got in the zone
    
    fricking hell
    every time, the dedication of soijackers surprises me
    - 2 years ago
      
      Anonymous
      
      considering it was meant to be a basedjak-filter
      I would not call myself a jakker
- 2 years ago
  
  Anonymous
  
  now write a bot that automatically insults every basedjak spammer
- 2 years ago
  
  Anonymous
  
  I remember your thread on /qa/ !
  - 2 years ago
    
    Anonymous
    
    I have been LAZY
    though I have been saving basedjaks and basedjak-lookalikes ever since then
    - 2 years ago
      
      Anonymous
      
      is the booru.onions useful to you?
      - 2 years ago
        
        Anonymous
        
        no idea
        need to sort out my duplicates first since I have quite a few of those
        I was setting up a duplicate finder when I took a hiatus and never got around to it
        
        2 years ago
        
        Anonymous
        
        would this help with duplicates?
        https://github.com/qarmin/czkawka
        
        2 years ago
        
        Anonymous
        
        I have one using phash already
        just have to get around to running it and making sure there are not false positives or anything
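For anyone curious how a perceptual-hash duplicate finder works, here is a minimal sketch using average hash (aHash), a simpler relative of pHash (pHash takes a DCT of the downscaled image instead of comparing raw block means). Images are assumed to be grayscale 2D lists at least 8x8, and the match cutoffs are illustrative values, not the ones any particular tool uses.

```python
def average_hash(gray, size=8):
    """aHash: shrink to size x size block means, then 1 bit per block (brighter than average?).

    gray is a 2D list of brightness values, assumed at least size x size.
    """
    h, w = len(gray), len(gray[0])
    cells = []
    for i in range(size):
        for j in range(size):
            rows = range(i * h // size, (i + 1) * h // size)
            cols = range(j * w // size, (j + 1) * w // size)
            block = [gray[r][c] for r in rows for c in cols]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# illustrative cutoffs on Hamming distance for a 64-bit hash (not taken from any tool)
LEVELS = [(0, "exact"), (4, "good"), (8, "medium"), (12, "low")]


def match_level(a, b, levels=LEVELS):
    d = hamming(a, b)
    for limit, name in levels:
        if d <= limit:
            return name
    return None  # too different to call a duplicate
```

Checking a few of the "good"/"medium" pairs by eye before deleting anything is the usual way to catch false positives.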
        
        2 years ago
        
        Anonymous
        
        >I have one using phash already
        >just have to get around to running it and making sure there are not false positives or anything
        i dont know how you implement it but stash has a pretty good duplicate detector based on phashes which lets you select exact/good/medium/low matches
2 years ago

Anonymous

a guy from [...] supposedly made a pepe booru
2 years ago

Anonymous

Copy and paste them until you have 5 million.
2 years ago

Anonymous

You can put up to 4,294,967,295 files into a single folder if the drive is formatted with NTFS (it would be unusual if it were not), as long as you stay under 256 terabytes (the limit for both a single file and the volume) or the available disk space, whichever is less.
2 years ago

Anonymous

It's probably enough to get acceptable results with Stylegan2-ADA, but even with a 3090 it's going to take weeks of non-stop training. You should try it if you need to heat a room in your house.
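The "ADA" part (adaptive discriminator augmentation) is what makes Stylegan2-ADA usable with only a few thousand images. Below is a minimal sketch of its control heuristic as I understand it from the paper and reference code: track r_t, the mean sign of the discriminator's outputs on real images, and nudge the augmentation probability p up when r_t exceeds a target (0.6 by default) and down otherwise. The update interval (every 4 minibatches) and speed (p can go from 0 to 1 over 500k images) are assumptions worth double-checking against the official repo.

```python
def ada_update(p, real_signs, batch_size, target=0.6, interval=4, ada_kimg=500):
    """One update of the augmentation probability p, run every `interval` minibatches.

    real_signs: sign (+1 / -1) of the discriminator output for each real image seen
    since the last update. Their mean, r_t, creeping toward 1 suggests D is overfitting.
    """
    r_t = sum(real_signs) / len(real_signs)
    # step size chosen so p could climb from 0 to 1 over ada_kimg thousand real images
    step = batch_size * interval / (ada_kimg * 1000)
    if r_t > target:
        p += step  # D too confident on reals: augment more
    elif r_t < target:
        p -= step  # D not overfitting: augment less
    return min(max(p, 0.0), 1.0)


p = 0.0
p = ada_update(p, [1] * 100 + [-1] * 10, batch_size=32)  # r_t ~ 0.82 > 0.6, so p goes up
print(p)  # 0.000256
```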
- 2 years ago
  
  Anonymous
  
  >but even with a 3090 it's going to take weeks of non-stop training. You should try it if you need to heat a room in your house.
  how much would it cost to rent a few A100s in a datacenter for it?
  - 2 years ago
    
    Anonymous
    
    >how much would it cost to rent a few A100s in a datacenter for it?
    Used to be you could abuse the free version of Colab to do it, no idea what it's like now. Also, Stylegan2-ADA is just a GAN, there's no tagging, so you're on your own as to what numbers to feed into the latent space dimensions.
    I tried it with about 500 same-sized and cropped BBW nudes and was never able to get it to spit out anything truly passable.
    - 2 years ago
      
      Anonymous
      
      did colab really give free A100s? i thought the free version only gave you a V100 at best
2 years ago

Anonymous

Waifu Diffusion only used about 50,000 images
is the issue more to do with the tagging?
- 2 years ago
  
  Reply
  
  Anonymous
  
  finetuning a model is different from training a new one from the ground up
2 years ago

Reply

Anonymous

Maybe you could use that to make a pepe and apu classifier, then search every archive and download every pepe. You might be able to filter duplicates if the archive stores the image hash.
You'd get tens of thousands of rare pepes and apus.
2 years ago

Anonymous

transfer learning with idk normal frog pics might help. I did that for a project. Had only like 120 images
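A minimal sketch of that kind of transfer learning. It assumes a hypothetical frozen pretrained `backbone(image)` that returns a feature vector (for example, a network trained on ordinary frog photos). Only a tiny logistic-regression head is trained on top of those features, which is why it can work with a dataset as small as ~120 images.

```python
import math


def train_head(features, labels, epochs=200, lr=0.5):
    """Logistic-regression head trained with plain SGD on frozen feature vectors (labels 0/1)."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # derivative of the log loss with respect to z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


# stand-ins for backbone(image) outputs; the backbone itself stays frozen and is not shown
feats = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.7]]
labels = [1, 1, 0, 0]  # 1 = pepe, 0 = not a pepe
w, b = train_head(feats, labels)
print([predict(w, b, x) for x in feats])  # [1, 1, 0, 0]
```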
2 years ago

Anonymous

>BOT doesn't know about apu and BOTbros
2 years ago

Anonymous

Download the torrent from https://bbwroller.com/

>torrent with over 100,000 frogs
