https://twitter.com/justlv/status/1610343308831920128
>TorToiSe (TTS) trained on only 30s of audio of Sam Harris
>https://github.com/neonbjb/tortoise-tts
okay
It's over, biobros.
Can I sound like a cute girl yet or not
Have 3-5 different 10-second WAV clips of a cute girl, then feed them to the system.
You can try the colab notebook.
Just upload your own samples.
https://github.com/neonbjb/tortoise-tts
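For anyone who actually wants to try it: a "voice" in tortoise-tts is just a named folder of short WAV clips. Here's a minimal Python sketch of that convention; the voice name and paths are made up for illustration, and the clips here are silent placeholders built with the stdlib `wave` module. With the real repo you'd drop actual recordings in and run `do_tts.py --voice <name>`.

```python
# Sketch of the tortoise-tts custom-voice convention: a voice is a folder
# of short WAV clips. These are silent dummy clips; use real recordings.
import struct
import wave
from pathlib import Path

voice_dir = Path("tortoise/voices/cutegirl")  # hypothetical voice name
voice_dir.mkdir(parents=True, exist_ok=True)

for i in range(3):
    with wave.open(str(voice_dir / f"clip{i}.wav"), "wb") as w:
        w.setnchannels(1)       # mono
        w.setsampwidth(2)       # 16-bit PCM
        w.setframerate(22050)   # tortoise expects 22.05 kHz audio
        # 10 seconds of silence as placeholder frames
        w.writeframes(struct.pack("<h", 0) * 22050 * 10)

print(sorted(p.name for p in voice_dir.iterdir()))
```

Then `python tortoise/do_tts.py --text "whatever" --voice cutegirl --preset fast` against a local clone would render that text in the new voice.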
How is that 15.ai guy coping?
>AI trained on only 30 seconds of a clip!!!!
>Used ChatGPT for an initial draft,
chatGPT is for the text input of a generic Sam Harris talk; you can write whatever you want as the text input. That's the whole point of TTS: Text To Speech.
>Bro pretrained models are totally different!!!
>These models were trained on a small cluster of 8 NVIDIA RTX-3090s over the period of ~ 1 year.
>I started with the LibriTTS and HiFiTTS datasets, which combined contain ~896 hours of transcribed speech. I built an additional, “extended” dataset of 49,000 hours of speech audio from audiobooks and podcasts scraped from the internet.
Cope further, retards, muh 30 seconds is a meme. How are we this far in and brainlets still don't know the difference between training and a function's result?
Redditspacer + dumb moron lol
>Resorts to 4chan-tier insults when no arguments left
I accept your concession, brainlet.
>Nope, you just put a few of your few-second audio clips into your own custom voice folder and you get your result
Not "Nope", input is not the same thing as training data. OP's initial statement of "muh trained only 30 seconds!!" is inherently wrong, jesus christ why is this board so retarded. This is like me claiming some autoencoder or DALL-E was trained on just one single image to produce le epic AI art XYZ.
It's not wrong. The guy took 30 seconds of a random Sam Harris audio and then generated the full breadth of speech.
The pretrained model + new custom audio of your choice = your audio result.
The anon you're responding to is right, though. It was not "trained on only 30s of audio", it was fine-tuned or one-shot learned or whatever that system does with the given data. It's cool that you can do it, but you just can't use standard terminology in a non-standard way.
>It's not wrong. The guy took 30 seconds of a random Sam Harris audio and then generated the full breadth of speech.
This isn't "trained on only 30 seconds" though, you retards. Generation != training. Is chatGPT training for 5 seconds when it gives you a response based on your input? NO
If you have chatGPT speak like Elon Musk with just a few sentences of input, then when you ask any question the answer is given as if Elon Musk is writing, then sure it is
>then when you ask any question the answer is given as if Elon Musk is writing, then sure it is
If you're too retarded to understand the terminology then I don't have the time to explain it to you. Anyone above 80 IQ should understand the difference. It's like saying an Olympic swimmer only trained for 2 minutes because they performed the task in 2 minutes! wow! only 2 minutes!
No.
It's like taking a random person, showing them a 30-second clip of judo techniques, and then they become a judoka.
Before you show them the video, they're just a random person. So 30 seconds gives them enough training to turn a rando into a master
>It's like taking a random person, showing them a 30-second clip of judo techniques, and then they become a judoka.
No you FUCKING RETARD, IT'S A LANGUAGE MODEL TRAINED TO SPEAK LANGUAGES
It's a generalized voice model, with a specified custom voice folder that you yourself can fill and get output in the fashion you desire.
you're right. too bad you're a moron loving rëddit gay. go back.
To add onto that, we don't even know if Sam Harris podcasts were included in the training data. If they were, that makes the whole ebin achievement even less impressive. This is why you don't trust random twitterati to do actual research: they take the code-bootcamp approach to advanced topics and feel smart if they write 3 pages of a "paper" with shitty graphs
You're correct, but frogposting over
>Used ChatGPT for an initial draft,
was just as retarded as the people who responded. By the way, I'm trans
Nope, you just put a few of your few-second audio clips into your own custom voice folder and you get your result
This is not training, this is finetuning you absolute fucking retard. Your model already contains pretty much everything it needs to produce the output, the small snippet is basically used to specify what part of the possible solution space you wanna get.
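That anon's point can be shown with a toy sketch (this is NOT tortoise itself, just an illustration with made-up names and a fake 3-number "embedding" table): "training" is the expensive one-time fit over a big corpus; the short clip you supply at inference time only conditions the frozen model, no weights change.

```python
# Toy illustration of conditioning vs. training. The "model" below is a
# frozen table pretend-learned from hours of audio; synthesize() never
# updates it, the clip is just an input that selects a region of the
# learned solution space.
import math

# pretend this table was fit once over ~900 hours of speech ("training")
speaker_embeddings = {
    "alice": [0.1, 0.9, 0.3],
    "bob":   [0.8, 0.2, 0.5],
    "carol": [0.4, 0.4, 0.9],
}

def synthesize(text, conditioning_clip):
    """Inference: the clip picks the nearest learned voice; nothing in
    speaker_embeddings is modified (conditioning != training)."""
    n = len(conditioning_clip)
    clip_vec = [sum(frame[i] for frame in conditioning_clip) / n
                for i in range(3)]
    nearest = min(
        speaker_embeddings,
        key=lambda name: math.dist(speaker_embeddings[name], clip_vec),
    )
    return f"[{nearest}-style audio for: {text!r}]"

# pretend 30 seconds of Bob, chopped into 5 frames
clip = [speaker_embeddings["bob"]] * 5
print(synthesize("Hello", clip))  # -> [bob-style audio for: 'Hello']
```

However long `synthesize` takes to run, calling that "30 seconds of training" would be wrong in exactly the way the anons above are arguing about.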
Man you're a dumb moron
How can non chuds train their own AIs all the time? Chudbros are being left in the dust
The cool thing about it is you can have your own voice read books in a pretty decent way.
Why are people who are into AI unable to understand basic technology?
Do they think all technology is just magic and beyond human comprehension? is that why they worship AI?
>Do they think all technology is just magic and beyond human comprehension? is that why they worship AI?
yes. these low iq retards lost big time on the cryptocoin game, so now they're all gathering around machine learning websites hoping to find the next cash cow. they think they're so intelligent because they can copy/paste text into a window. since these people have the english skills of 4yos, chatgpt looks like a sentient being. the problem that these dangerously low iq monkeys face is that the monkeys don't own the datasets, nor do they have the computing power or bandwidth to create their own.
>understand basic technology
what the fuck does that even mean? OP is a troglodyte but so are you.
eh the jordan peterson ai was funnier
>jordan peterson ai
Which one?
The funny one, of course
If you can get some clean jordan peterson audio clips, you can make your own jordan peterson audio with this TTS AI too.
still a meme that will never replace voice actors
Indie game devs should be using it for their own needs.
Okay, but can I use it to make ASMR?
Someone please help me understand this repo.
Which files are the models?
The files in the model folders are .wav, so they're either generated samples or training samples or w/e. They can't be the actual model, can they?
It says in the readme that you can run it locally, so they should be there, right?
Or does it just do web queries?
nvm, it's clvp2.pth and a few others that get downloaded during install
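For anyone else confused: the .wav files in the voice folders are conditioning samples, not weights; the weights are .pth files fetched on first run. A quick sketch to check what's cached locally; the file names and default cache path below are assumptions based on the repo's api.py and may differ in your checkout.

```python
# Hedged sketch: list which tortoise-tts weight files (assumed names) are
# already in the assumed default cache directory.
from pathlib import Path

EXPECTED_WEIGHTS = [
    "autoregressive.pth",     # main text->speech-token model
    "clvp2.pth",              # candidate re-ranking model
    "diffusion_decoder.pth",  # token->mel diffusion model
    "vocoder.pth",            # mel->waveform vocoder
]
models_dir = Path.home() / ".cache" / "tortoise" / "models"  # assumed default

for name in EXPECTED_WEIGHTS:
    state = "cached" if (models_dir / name).exists() else "not downloaded yet"
    print(f"{name}: {state}")
```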
cant wait to retire off of my entirely AI generated podcast that normies will eat up