AI training
Posted on September 29, 2022 by Anonymous
I have a folder of over 5000 unique pepes. Is that enough to make a model?
Scruffy-anon has gotten dubs like five times today.
Scruffy-anon is just lucky.
Too mentally unwell to train
you're off by a few orders of magnitude mate
I wonder why this isn't enough.
Surely any human seeing these 5000 pepes would be intelligent enough to learn how to make another pepe. What knowledge does this human have that the AI model doesn't?
>What knowledge does this human have that the AI model doesn't?
they know what a pepe doesn't look like
correct answer. we learn more from making mistakes than from getting things right.
incorrect, you can learn what a pepe is/isn't like
If the model is bad, or the data too few, then maybe, but this is true for any model
try making a spam text classifier. it will never learn the difference between spam and ham if you only give it spam
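To make the spam/ham point concrete, here is a minimal multinomial naive Bayes sketch in plain Python. It is an illustration swapped in for the argument, not anything posted in the thread, and the toy messages are made up. With only one class in the training data the argmax has nothing to compare against, so every message, spam or not, comes out as spam.

```python
import math
from collections import Counter

def train_naive_bayes(docs):
    """docs: list of (text, label) pairs. Counts documents and words per label."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log prior + Laplace-smoothed log likelihood."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

spam_only = [("win free money now", "spam"), ("free prize click now", "spam")]
print(predict(train_naive_bayes(spam_only), "see you at lunch tomorrow"))  # spam

with_ham = spam_only + [("see you at lunch", "ham"), ("meeting moved to tomorrow", "ham")]
print(predict(train_naive_bayes(with_ham), "see you at lunch tomorrow"))  # ham
```

Adding even two ham examples is enough for the same message to flip to ham, because the model finally has a second distribution to compare against.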
Does every generative AI also need to be a classification AI? Why can't it just mix and match spam bits to make more spam?
because you can arrange spam in a way that is considered to be normal text. if you don't give the model any distinguishing data, then the bias it learns will be that green = pepe
I definitely see what you're saying, but I can't help but think it ought to learn more even if it looks exclusively at pepes. In addition to learning they should be green, it should also capture that they are usually round, usually have blue shirts, usually have two white bits representing eyes, usually have some thick ruddy lips, etc. And it seems like anything new it generated, if it adhered to those learnings, would be good enough for a human to see it as pepe-like (even if it was a green house with two white windows or something).
I guess it's just hard to overcome that intuition.
if you want to create a "stable diffusion for pepes" then i recommend you just finetune the existing SD model. it will give you what you want. don't try to make a brand new model off a few thousand images; if you do that, it will just create blobs
do you mean textual inversion, i.e. just a few inputs?
yes, or whatever dreambooth does
i find it crazy that works. you could script the textual inversion for a new person or character and then feed it into prompts you know generate lewds and then retweet them on twitter automatically to get twitter(yous)
>mixing apu and pepe in the same folder
What's the smaller pepe called, the one that's squished? I don't like it one bit.
his name is apu and he likes you
the reddit frog
should train an anglo for binary classification between apus and pepes
just crop and duplicate them all randomly
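"Crop and duplicate randomly" is data augmentation. A minimal sketch, assuming each image is a 2D list of pixel values (a real pipeline would use something like Pillow or torchvision instead): every source image yields several randomly cropped and randomly mirrored copies. Keep in mind the copies are correlated with their originals, so this adds variety rather than genuinely new pepes.

```python
import random

def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window out of a 2D list at a random position."""
    h, w = len(image), len(image[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop is larger than the image")
    top = rng.randint(0, h - crop_h)
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

def maybe_mirror(image, rng, p=0.5):
    """Flip left-right with probability p (a mirrored pepe is still a pepe)."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image

def augment(images, copies_per_image, crop_h, crop_w, seed=0):
    """Return copies_per_image randomly cropped/mirrored variants of every image."""
    rng = random.Random(seed)  # seeded so the augmented set is reproducible
    out = []
    for image in images:
        for _ in range(copies_per_image):
            out.append(maybe_mirror(random_crop(image, crop_h, crop_w, rng), rng))
    return out
```

For example, `augment(pepes, 10, 224, 224)` would turn each image into ten 224x224 variants, assuming every image is at least that large.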
Training any decent AI happens on the scale of hundreds of thousands to millions of images.
yes, it's enough to make an overfit model
they have to be labeled for it to work; otherwise, you need a folder of "non pepes" to have unsupervised learning. anyway, i think you should look at how to finetune Stable Diffusion, because you would need a lot more images to train a model from scratch
I understand that I would have to do labelling. Like, would I write a bunch of different characteristics about each Pepe?
think about how boorus are run, you would basically have to do that
What exactly do you want to do?
Just detect if an image is a pepe or not?
Image in -> model -> out comes ???
why would you need labels?
you could just build a diffusion model and train it on the unlabelled pepes. after enough epochs it can generate new pepes in the realm of the dataset from random noise.
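Why no labels are needed: a DDPM-style diffusion model learns to undo Gaussian noise, so its only training target is the noise it added itself. A minimal sketch of the forward (noising) step and the noise-prediction loss, assuming pixels scaled to [-1, 1] and the linear beta schedule from the original DDPM paper (T = 1000, beta from 1e-4 to 0.02). It uses x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The denoising network itself is left out; `predicted_eps` stands in for its output.

```python
import math
import random

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variance added at each of the T steps."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bars(betas):
    """Cumulative product of (1 - beta): how much of x_0 survives at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def noisy_sample(x0, t, abar, rng):
    """Draw x_t ~ q(x_t | x_0); also return eps, the denoiser's training target."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

def training_loss(predicted_eps, eps):
    """Mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted_eps, eps)) / len(eps)
```

A training step would pick a random pepe and a random t, call `noisy_sample`, run the network on (x_t, t) and minimise `training_loss`; generating new pepes runs the learned denoiser backwards from pure noise.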
There's unsupervised learning. He didn't say he wants to classify them or anything.
then what the fuck are you going to do with the model? you can't tell it anything because it's a dumb fucking computer
Generative models. You produce new Pepes by tweaking latent variables.
>you need to have a folder of "non pepes"
you can just fit a probability distribution to your data and consider outliers as non-pepes
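A minimal sketch of "fit a distribution, call the outliers non-pepes", assuming each image has already been turned into a small feature vector (the mean-colour triples in the test are made-up stand-ins). It fits an independent Gaussian per feature and rejects anything less likely than roughly the least likely 5% of the training pepes. With features this crude, a green house with pepe-like colours also passes, which is exactly the objection raised in the thread: the features and the density model decide how useful this is.

```python
import math

def fit_diagonal_gaussian(features):
    """Per-dimension mean and variance (features treated as independent)."""
    n, d = len(features), len(features[0])
    means = [sum(f[j] for f in features) / n for j in range(d)]
    # small floor keeps a constant feature from giving zero variance
    variances = [max(sum((f[j] - means[j]) ** 2 for f in features) / n, 1e-6)
                 for j in range(d)]
    return means, variances

def log_likelihood(x, means, variances):
    """Log density of x under the fitted diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, variances))

def fit_outlier_detector(features, keep_fraction=0.95):
    """Return is_pepe(x): True if x is at least as likely as a threshold
    chosen so that about keep_fraction of the training pepes pass."""
    means, variances = fit_diagonal_gaussian(features)
    scores = sorted(log_likelihood(f, means, variances) for f in features)
    threshold = scores[int((1 - keep_fraction) * len(scores))]
    return lambda x: log_likelihood(x, means, variances) >= threshold
```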
but you would need a dataset of "non pepes" anyways if you want to make one of those
nope, you only need to learn the Pepe probability distribution
okay, and it would classify my picture of a green house as a pepe
>every image in this training set which is not in the pepe training set is not a pepe
there are many more pepes out there that are not in the training set
>make algorithm to classify if an image is a pepe or not
>(if training set does not exist for this, use textual inversion with SD to create a training set of pepe and non-pepe images as only a few pepe images are required and SD can create anything)
>run algorithm on LAION or similar
>you now have everything you ever need
Not for an autoencoder since it learns how to, well, encode the input distribution. And diffusion models are autoencoders for reasons I'm too smooth to understand.
I've made working text classification models with this amount.
Not sure if image classification requires more.
I guess it would since there is much more data in an image.
also pls post pepes so I can add more to my model
Yeah, these images are simple enough to be learned from a small sample.
I've been experimenting with dreambooth, and am getting a lot better at finetuning. I'll give it a go on pepe later tonight.
care to explain how it works?
StableDiffusion has been trained on billions of text-to-image pairs. Through that, it has picked up concepts such as ukiyo-e artwork, and artwork by Hokusai.
It has no clue what my dog looks like, but there is a technique that lets me further train (i.e. finetune) it on a few text-to-image pairs of my own so that it learns the concept. With that, I can now apply the ukiyo-e style to my dog (work in progress, I'm getting better lol).
Really compute-heavy; I'm on a 3090 and that is the bare minimum for doing this. Pretty fun. Attached is a half-baked self-portrait of me as a 19th-century president
How many years on HRT?
I admit that I'm curious. If we hypothetically downloaded every single last piece of futa porn on the Internet and gave that to an AI to use as its learning database, what do people think would happen?
Can you upload them on mega? Been looking for a pepe dump.
Give access to the frogs please
This. Or just please run a script to collect all their MD5s, thanks.
should I create a website that lets people submit their pepes to me so I could create a bigger dataset?
get enough pepes to make a model, then write a webcrawler and have it scrape only new frogs.
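A minimal sketch of such a script using only the Python standard library: it walks a folder, hashes every file with MD5 and groups byte-identical copies. MD5 only catches exact duplicates; a re-saved or resized copy of the same frog gets a different hash, which is what perceptual hashes such as phash are for.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def md5_of_file(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in 1 MiB chunks to keep memory flat."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def collect_md5s(folder):
    """Map each MD5 digest to the list of files under folder that have it."""
    by_hash = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            by_hash[md5_of_file(path)].append(path)
    return dict(by_hash)

def exact_duplicates(by_hash):
    """Keep only digests shared by more than one file."""
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Usage would be something like `exact_duplicates(collect_md5s("pepes"))`, where `pepes` is a placeholder for the folder name.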
my model did pretty well with 3-4k basedjaks
though I only made a detector rather than a generator
I have been polishing the dataset to make an improved model but it has been on the backburner the past few months
maybe I should get around to it again
very cool anon
kek did you seriously annotate 4,000 images with bounding boxes by hand??
I used a graphics tablet and the 4-edges strategy rather than the 4-corners one
it went pretty fast once I got in the zone
publish it on kaggle!
probably the next version when I remove ridiculous images from the training set and label the new images I have added
rare basedjaks are rare for a reason (nobody really posts them)
and they make it harder to learn what normal basedjaks look like
so including them in the first version(s) of the model was a small mistake
it has been too many months so it would take some sitting down to pick up the pieces again
that is something I will look into before labeling my next batch
noooo include everything
having a few weird images makes the dataset more robust
>having a few weird images makes the dataset more robust
depends on the goal of the model
if it is to detect every single basedjak then sure
but if it is to detect the ones that are spammed the most then the difficulty of learning them would likely introduce false positives and drag the accuracy of the intended detections down
the model would have to work off 125x125 thumbnails, and basedjaks like these
>barely going to be perceptible at all in a thumbnail
>I am not sure I would classify some of them as basedjaks
>too fragmented/cropped and there are already enough almost identical ones in the training set
sorry, it is 2 corners
you can define a box by clicking the top-left and bottom-right corners
or you can click the left, top, right, and bottom edges
turns out the latter is much easier for humans than aligning two edges at once when clicking a corner
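A small sketch of the two annotation modes, assuming image coordinates with y growing downward and edge clicks arriving in the order left, top, right, bottom (the click order and the `Box` type are assumptions for illustration). The advantage of the edge mode shows up in the code: only the x of the left/right clicks and the y of the top/bottom clicks are used, so each click only has to be accurate along one axis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

def box_from_corners(corner_a, corner_b):
    """Two opposite corners; each click must be right in both x and y."""
    (xa, ya), (xb, yb) = corner_a, corner_b
    return Box(min(xa, xb), min(ya, yb), max(xa, xb), max(ya, yb))

def box_from_edges(left, top, right, bottom):
    """One click per edge; only one coordinate of each click is used."""
    box = Box(left[0], top[1], right[0], bottom[1])
    if box.x_min >= box.x_max or box.y_min >= box.y_max:
        raise ValueError("edge clicks do not form a valid box")
    return box
```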
to be fair I think that one is fairly low accuracy
and it is an early version of the model
my strategy is to label a few of them, then just have the model infer on future pictures and then accept or deny however the model labeled it. then you can just press 1 button instead of drawing more boxes
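The accept/deny loop described above might look roughly like this. `predict`, `review` and `redraw` are hypothetical callables standing in for the detector, the one-button accept prompt and the manual box tool; nothing here is tied to a particular labeling app. Counting accepted proposals also gives a rough sense of when the model is good enough to lean on more.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)

def assisted_labeling(
    image_ids: Iterable[str],
    predict: Callable[[str], List[Box]],
    review: Callable[[str, List[Box]], bool],
    redraw: Callable[[str], List[Box]],
) -> Tuple[Dict[str, List[Box]], int]:
    """Model proposes boxes; a human accepts them with one keypress or redraws."""
    labels: Dict[str, List[Box]] = {}
    accepted = 0
    for image_id in image_ids:
        proposal = predict(image_id)
        if review(image_id, proposal):  # the single "accept" button
            labels[image_id] = proposal
            accepted += 1
        else:  # fall back to drawing boxes by hand
            labels[image_id] = redraw(image_id)
    return labels, accepted
```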
>4-edges strategy rather than the 4-corners one
how does this work?
every time, the dedication of soijackers surprises me
considering it was meant to be a basedjak-filter
I would not call myself a jakker
now write a bot that automatically insults every basedjak spammer
I remember your thread on /qa/ !
I have been LAZY
though I have been saving basedjaks and basedjak-lookalikes ever since then
is the booru.onions useful to you?
need to sort out my duplicates first since I have quite a few of those
I was setting up a duplicate finder when I took a hiatus and never got around to it
would this help with duplicates?
I have one using phash already
just have to get around to running it and making sure there are not false positives or anything
>I have one using phash already
>just have to get around to running it and making sure there are not false positives or anything
i don't know how you implemented it, but stash has a pretty good duplicate detector based on phashes which lets you select exact/good/medium/low matches
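Perceptual hashes are usually compared by Hamming distance (the number of differing bits): the smaller the distance, the closer the match. A minimal sketch assuming 64-bit hashes stored as Python ints; the exact/good/medium/low cut-offs below are made-up values for illustration, not the thresholds stash actually uses, and computing the hashes themselves is left out.

```python
def hamming(h1: int, h2: int) -> int:
    """Number of bits that differ between two hashes."""
    return bin(h1 ^ h2).count("1")

# Hypothetical cut-offs for 64-bit hashes (max distance allowed per level).
MATCH_LEVELS = [("exact", 0), ("good", 4), ("medium", 8), ("low", 12)]

def match_level(h1, h2):
    """Name of the strictest level the pair satisfies, or None if too far apart."""
    d = hamming(h1, h2)
    for name, max_distance in MATCH_LEVELS:
        if d <= max_distance:
            return name
    return None

def find_duplicates(hashes, level="medium"):
    """hashes: {filename: hash}. Returns (a, b, distance) for close pairs.
    Pairwise O(n^2); fine for a few thousand images."""
    allowed = dict(MATCH_LEVELS)[level]
    names = sorted(hashes)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = hamming(hashes[a], hashes[b])
            if d <= allowed:
                pairs.append((a, b, d))
    return pairs
```

Looser levels catch more re-encoded copies but also raise the false-positive risk mentioned above, so it is worth eyeballing the "low" matches before deleting anything.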
a guy from
supposedly made a pepe booru
Copy and paste them until you have 5 million.
You can put 4,294,967,295 files into a single folder if the drive is formatted with NTFS (it would be unusual if it were not), as long as you do not exceed 256 terabytes (the maximum single-file and volume size) or the disk space available, whichever is less.
It's probably enough to get acceptable results with StyleGAN2-ADA, but even with a 3090 it's going to take weeks of non-stop training. You should try it if you need to heat a room in your house.
>but even with a 3090 it's going to take weeks of non-stop training. You should try it if you need to heat a room in your house.
how much would it cost to rent a few A100s in a datacenter for it?
>how much would it cost to rent a few A100s in a datacenter for it?
Used to be you could abuse the free version of Colab to do it, no idea what it's like currently. Also, StyleGAN2-ADA is just a GAN; there's no tagging, so you're on your own as to what numbers to feed into the latent space dimensions.
I tried it with about 500 same-sized and cropped BBW nudes and was never able to get it to spit out anything truly passable.
did colab really give free A100s? i thought the free version only gave you a V100 at best
waifu diffusion only used about 50,000 images
is the issue more to do with the tagging?
finetuning a model is different from training a new one from the ground up
Maybe you could use that to make a pepe and apu classifier, then search every archive and download every pepe. You might be able to filter duplicates if the archive stores the image hash.
You'd get tens of thousands of rare pepes and apus.
transfer learning with idk normal frog pics might help. I did that for a project. Had only like 120 images
>BOT doesn't know about apu and /int/bros
Download the torrent from https://bbwroller.com/
>torrent with over 100,000 frogs