Whatever happened with this? I remember anons going ham with it but now searching just reveals a few programs behind paywalls or registration walls with no indication of quality, performance, options, etc. For all I know it's just Microsoft Sam instead of something that isn't obviously a robot.
I'm looking for TVD like pic related.
Why would I pay an audio company to do this when I can do it on my own?
True, but they are still using some program. The question is which one? There isn't any forward facing stuff saying this program can do this, etc.
Elevenlabs is pretty decent. I've played with it, feeding it scripts that I write myself. The primary problem with it is that inflection isn't always fluid, or proportional. For instance, the following:
>Hey! Did you hear about the latest speech synthesis model? It's super rad!
You might get something along the lines of
>(explosive)HEY!
>(meek or tentative)Did you hear about the latest speech synthesis model?
>(boisterous)It's super rad!
Even mid sentence punctuation gives it trouble. It might not pause at a comma, or pause too long, etc.
Interesting.
This sounds based as frick. You probably couldn't do it as one continuous thing but if you prompted it by parts it might work. Like this part should be monotone, this part should sound curious, etc, show the MC's mood purely from narration tone.
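A rough sketch of the "prompt it by parts" idea, assuming you split the script into sentences yourself and attach a tone label to each one. The `Part` class, the tone labels, and the `synthesize` call mentioned in the comment are all made up for illustration; none of this is an Elevenlabs API.

```python
import re
from dataclasses import dataclass


@dataclass
class Part:
    text: str
    tone: str  # free-form label such as "monotone" or "curious"


def split_sentences(script: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?' (naive, no abbreviation handling)."""
    pieces = re.split(r"(?<=[.!?])\s+", script.strip())
    return [p for p in pieces if p]


def plan_parts(script: str, tones: list[str], default: str = "neutral") -> list[Part]:
    """Pair each sentence with a tone label; sentences without a label get `default`."""
    sentences = split_sentences(script)
    padded = tones + [default] * (len(sentences) - len(tones))
    return [Part(s, t) for s, t in zip(sentences, padded)]


script = "Hey! Did you hear about the latest speech synthesis model? It's super rad!"
parts = plan_parts(script, ["excited", "curious"])
for p in parts:
    # A hypothetical synthesize(p.text, tone=p.tone) call would go here,
    # one request per part, with the clips stitched together afterwards.
    print(f"[{p.tone}] {p.text}")
```

Whether generating part by part actually gives smoother inflection than one continuous request is something you'd have to test by ear.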
Is 11Labs still the best AI TTS?
do we have any alternative yet?
11labs is still leaps beyond anything you can run locally, unfortunately
styletts2 is decent though and if you don't want to hand over the shekels and/or prompts to 11labs, it's the closest one right now
>he doesn't know
When are we getting AI that can read a book, identify their voices (including the narrator), understand the scenes, and then basically give me a full experience of multiple people, wind noises, gun fire, glass breaking, footsteps, etc., basically an image-less movie of the book? I might sound moronic but I feel like we are less than a decade away from this, and it would legitimately be amazing.
This would actually be so awesome. I listened to the Neuromancer BBC radio play that was like that and it completely mogs the audiobook that was read by the author. the author just sounds like a stoner in the 90s, which makes it hard to listen to.
The Ada Wong voice actor in the Resident Evil 4 remake got absolutely mogged by AI. It's absurd how much better the AI voice mod is.
kek, thank you for reminding me:
>reminder that she seethed so much of being criticized for this performance that she accused everyone of racism
i didn't know they made an RE4 remake
just looked up comparisons and the AI is way better
... then i saw some news about it, apparently the voice actor is taking it as an attack on her being asian
since i heard the comparison before knowing who the voice actor was, i can say i didn't even know it was an asian voice, just sounded american to me
i don't expect less nowadays though, people are so incredibly sensitive when it comes to race and sex these days
>Neuromancer
>that was read by the author
>author just sounds like a stoner
Because the author is a stoner.
One of William Gibson's life goals was to try out every single drug in existence. So go figure...
My guy's ready for a holodeck
Stop talking like a Black person
As technologically amazing as it could be, I kind of feel like it'd be a loss to the quality of the imagination an engaged reader may possess. I prefer reading to watching movies for a reason.
I'm not so sure, the visual aspect is still up to the listener to imagine, your mind just has a bit more to go off of.
But think of the so-called NPCs, so to speak; it would be a great benefit to them.
>multiple people, wind noises, gun fire, glass breaking, footsteps, etc.
This is the killer feature. We need to stop thinking of these systems as text-to-*speech* and more as all-around text-to-*audio*. Bark had the right idea having the prompting in-band so you could specify when in the clip it actually happens (e.g. "Hello [laughs] world"), and I really thought that (not Bark itself, but in-band nonverbal prompting) would be the big /vsg/ reviving game changer we needed, but interest in it seems to have dropped off around here. AudioLDM is focused specifically on nonverbal sounds, and VoiceLDM is a neat experiment in combining it with TTS to get something like what you're describing (e.g. prompt "She is talking in a park."), but it's a little finicky, and it doesn't provide the granularity that Bark does.
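To make the in-band idea concrete, here is a minimal sketch of how a pipeline might pull bracketed tags out of a prompt and keep their position relative to the spoken text. This is not how Bark works internally (Bark handles the tags inside the model); it only illustrates how you could route speech to a TTS model and sounds to something like AudioLDM. The function name and event labels are made up.

```python
import re

# Matches a bracketed, non-nested tag such as "[laughs]".
TAG = re.compile(r"\[([^\[\]]+)\]")


def parse_inband(prompt: str) -> list[tuple[str, str]]:
    """Return ("speech", text) and ("sound", label) events in the order they occur."""
    events = []
    pos = 0
    for m in TAG.finditer(prompt):
        spoken = prompt[pos:m.start()].strip()
        if spoken:
            events.append(("speech", spoken))
        events.append(("sound", m.group(1).strip()))
        pos = m.end()
    tail = prompt[pos:].strip()
    if tail:
        events.append(("speech", tail))
    return events


print(parse_inband("Hello [laughs] world"))
# [('speech', 'Hello'), ('sound', 'laughs'), ('speech', 'world')]
```

The ordered event list is what gives you the granularity: each event can be rendered separately and concatenated, so the laugh lands exactly where the tag was.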
You should have a nice day
someone would have to invent a new type of multimodal model
something like llava but text -> tokenize -> recognize -> split -> sound
this seems really hard to achieve, but not impossible
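A toy version of that split step, just to show the shape of the problem. It assumes dialogue is wrapped in straight double quotes and that sound effects can be spotted with a keyword table; both are big simplifications, and the `CUES` table and the `voice:`/`sfx` labels are invented for this sketch. Figuring out which character is speaking is the hard part and is not attempted here.

```python
import re

QUOTE = re.compile(r'"([^"]+)"')

# Hypothetical keyword -> sound effect table; a real system would need a model here.
CUES = {"shattered": "glass breaking", "footsteps": "footsteps", "gunfire": "gun fire"}


def split_narration(paragraph: str) -> list[tuple[str, str]]:
    """Split text into ("dialogue", text) and ("narrator", text) segments by double quotes."""
    segments = []
    pos = 0
    for m in QUOTE.finditer(paragraph):
        before = paragraph[pos:m.start()].strip()
        if before:
            segments.append(("narrator", before))
        segments.append(("dialogue", m.group(1).strip()))
        pos = m.end()
    rest = paragraph[pos:].strip()
    if rest:
        segments.append(("narrator", rest))
    return segments


def find_sound_cues(text: str) -> list[str]:
    """Return the effect label for every cue keyword that appears in the text."""
    lowered = text.lower()
    return [label for keyword, label in CUES.items() if keyword in lowered]


def plan(paragraph: str) -> list[tuple[str, str]]:
    """Turn a paragraph into an ordered list of voice and sound-effect steps."""
    steps = []
    for kind, text in split_narration(paragraph):
        steps.append((f"voice:{kind}", text))
        if kind == "narrator":
            steps.extend(("sfx", cue) for cue in find_sound_cues(text))
    return steps


for step in plan('"Get down!" she shouted. Glass shattered behind them.'):
    print(step)
```

Each `voice:` step would go to a TTS model with the matching voice, and each `sfx` step to a text-to-audio model.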
who cares lol? welcome to capitalism buddy
you can probably do that already somehow using technology aside from ai
Like C-3PO telling the story of their adventure so far to the Ewoks? That's gonna be a long wait...
you're describing audiobooks for children
childrens books have sound effects
you mean cool books have sound effects
You're supposed to be doing that with your brain.
You do know that you can already do that with your own imagination, right?
HARRY!
DID YOU PUT YOUR NAME IN THE GOBLET OF FIRE?!
dumbledore said calmly
I wish more people could look at AI potential like this. I think it is going to be so insanely disruptive to so many industries. Not just workers getting laid off, but customers suddenly not requiring the services of all sorts of companies. They are laying off all these voice actors and replacing them with AI. So what? I could just generate the audiobook for myself now. Don't even need the audiobook industry to exist anymore.
OMG I WON'T BE ABLE BE A WAGIE ANYMORE!!!!!!
>my friend who works for a company selling clothes just told me they’re replacing their seamstresses with sewing machines
>if we don’t regulate this now the damage will be immeasurable
Bye bye!
Another week, another industry. When the frick are politicians going to start regulating this shit, or stop pretending like they give a frick what happens to the people
Why the frick would they regulate it?
Because it is going to make most humans obsolete
reddit is down the hall and to the left
Don't care.
point to a single instance of an artist telling someone to code
>E-book readers were telling coal miners to learn to code
Voicegays are even more turbo s()y than graphical artists and coders. I can believe it.
It's been exactly one year since Elevenlabs dropped and it hasn't gotten any better nor has open source fully caught up. What the frick happened? Where's the infinite exponential growth into the metaverse on mars I was promised? Seems like everything regarding AI has completely flatlined since GPT-4. I thought by now I'd have insane TTS AI but it's still the fricking same as it was when I last looked into it. What a disappointment it all has been... two more weeks sirs!
>nor has open source fully caught up.
Tardbro, StyleTTS and XTTSV2 exist.
care to share any examples? 11labs still mogs everything i've heard so far, it's not even close.
https://vocaroo.com/1lU91UW8qjC2
https://vocaroo.com/12QGwp4vk8O0
https://vocaroo.com/1dtrp3RcdsTJ
Just make it so any AI work must pay the original artist 100% as if he/she did it, unless you sign some open-source shit for your voice/likeness/artwork
This. Data shouldn't be allowed to be used to train models unless it's been specifically agreed to by the author for that purpose.
Google is training their AI using billions of Youtube videos... good luck figuring out who does generic male voice #472 with neutral American accent.
good
I think laws are pretty slow in this regard. Like, yes, you don't read the TOS or shit when using any service, and it probably says they can do whatever the frick they want with your videos, but I would love to see a new law where companies need to notify you if your video(s) have been used for AI training, even if you don't get paid for it.
Consent should be a requirement for training AI models and they should make models trained on non-consensual data illegal to possess.
If the information's private, like my private porn collection, sure. If it's public view, humans can experience and subsequently learn from this information freely, why shouldn't they be able to side-load the learning to AI?
It's the use of this learning which requires regulation, again where it concerns others.
Force Google to provide every single source of training data, or they get no legal protection whatsoever from whatever anyone can make the Google AI do. If someone tricks Google's AI into committing copyright infringement, Google pays the fine or the employees who made the AI go to jail for mass distribution of copyrighted content.
This is the only way to force these anti-human companies to play nice
I paid for elevenlabs for a month and did the voice cloning off someone who did podcasts.
It didn't sound anything like him. I cancelled my subscription. The end.
love this meme where artists are realizing they aren't immune to the ever increasing tide of automation and finally think about all the other fields that got 0 support and news about being automated.
>Spam memes about coding
>Get replaced by code
>TFW
Artists aren't realizing shit. The only thing they understand is that they're under threat from automation, nothing more and nothing less. They implicitly lack the self awareness or empathy needed to compare themselves to others who've faced the same challenges in the past. They see themselves as a privileged class, and plebians outside that class deserve no sympathy in their eyes, for the plight of the plebs is nothing in comparison to the struggles of an auteur.
tl;dr: they don't have the capacity to self-reflect like you're implying they do.
I’m not gonna fedpost I’m not gonna fedpost I’m not gonna fedpost I’m not gonna fedpost I’m not gonna fedpost I’m not gonna fedpost I’m not gonna fedpost
The question is why, as a user, you wouldn't pirate all those books and pay for the service to dub every book for you? Why do users need middlemen to do everything for them?
I loathe how ridiculous this statement has become. For the price of a bigmac, fries and coke, i can cook a whole dinner.
poorgay cope, if you have to cook your own meal you're still paying with your time which is cucked
>5min to make a sandwich is too much
actual poorgay cope that is just punching down
if you don't have free time because you're slaving away, you aren't rich as you gays like to pretend
please bro you have to work 150 hours per week or else you're poor i don't sleep for days on end thinking about work you wouldn't understand i cant even cook on the weekend because i have to take my wife to the sex club to try to find someone to frick her please they wouldnt even let you into the club because youre too broke its only for real professionals please i drink every single day but its not a proble, i have a job, i have JOB, you wouldnt get it
>he's paid an hourly wage
lmfao absolute turbo mcdonalds worker
What a sad life you must live.
Given the amount of well off and nicely dressed people at any given McDonalds establishment you may be onto something.
i can spend 30min in the kitchen and make enough food to last a few days
you don't need to spend all that time making just one serving, you know
>my friend whose dad works at nintendo blah blah blah bix nood muhfuggn shhiieeett
who writes this shit
Local versions are actively used to make degenerate porn voiceovers, asmr etc.
So you don't hear about them because they're no longer a tech demo, but do actual work.
I was just going to ask about this. Images, chatbots, etc, all those have local versions even if they are smaller models and slower because most don't have industrial hardware etc. But I don't see any mention of local models, nor is there a thread like there is for aicg, lmg, stable diffusion...
Elevenlabs is ok, but while it's cheaper than voicegays by far it would still cost a few hundred for one audio book. And it doesn't seem as if there's any emphasis or tonal shift so you'll get shit like the narration being read with some interest and then a very enthusiastic speech read in this depressive monotone wtf.
>waaah big daddy guberment please regulate le evil AI
What the frick is wrong with these people
10 years of neverending mass hysteria, from one moral panic to the next, initiated via msm and then propagated on twitter and reddit
>People
That's the problem.
Liberating people from work is a good thing.
Sure, but there's 0 reason to expect that to happen. Are you 12?
What do you mean? Productivity with less work required always improves the standard of living.
No? Quality of life for the average American has nosedived since the 70s which is when the computer revolution started. Suicide and mental illness is up, real wages are down, deaths of despair skyrocketing almost as high as corporate profits. The only real increase in QoL anyone got was WFH which israelite managers are now crying about and calling evil.
Productivity has decreased in a lot of ways since then. Think back to the late 1800s. People complained then about losing their jobs to machinery, but their standard of living improved for it.
Absolutely not. Go look up what life was like for the urban industrial poor in 1800s England.
NTA, but there actually is a pretty good chance of that happening. The truth is, the working class really doesn't have much more by way of assets to drain. Most of us don't own our own homes, don't have any equity in retirement, no pensions and no savings. The vast majority. The capital has already been drained from the middle and lower classes by 50 years of wealth extraction and inflation without any wage increases.
However, we are primarily useful to the system as consumers to buy goods and as debt holders, both of which are expansions in the market cap in places other than revenue. If people don't consume, there is no point to creating the goods/value in the first place, as its value remains only theoretical until it is sold. Likewise, additional debt can't exist within the system without additional people, so if you reduce the amount of people able to take on debts by killing off or enslaving all the poors, you're actually crunching billions of potential dollaridoos out of the economy.
There's a very good chance that we're not going to be needed to work in the future, as the cost of labor and transportation is far outweighing the capital gained from sales. However, they do still need us to consume and to facilitate debt, so hello UBI.
I hope you're correct but I specifically disagree with:
>If people don't consume, there is no point to creating the goods/value in the first place, as its value remains only theoretical until it is sold.
It isn't necessarily true because the motivation to seek infinite profit is not inherent, profit and value are proxies for POWER which is what the sociopaths we've been funneling into leadership positions for the past 100+ years want.
>increase profits and decrease expenditure
>damage
?
>your fricking job is to read books
If your voice is worse than AI perhaps you don't deserve a paycheck.
Also if AI replaces a programmer (you) that programmer deserves to be jobless
The tech isn't quite there yet, but obviously it is going to happen within the next couple of years.
The story is likely fake, but the tech will soon be mature enough to replace all voice actors.
It also doesn't face the enormous challenge of natural language generation, where current tech can't comprehend relationships over e.g. the length of a novel. Text-to-voice doesn't have that problem; tone of voice depends on a couple of surrounding sentences at most.
Tone of voice still seems erratic. I've seen it be somewhat eager in narration and then become this flat monotone in what should be a deep passionate speech.
It definitely has some epic fricking wins though even if it isn't on the main project I'm using it for.
Be sure and drink something before clicking this shit:
https://litter.catbox.moe/7a0ljm.mp3
Voicegays eternally BTFO.
>G0RMY
Lel.
Yes, the tech isn't completely there yet, but it is obviously coming. Analyzing the needed tone seems a much easier problem than long distance attention. And at some point it will just be good enough to replace voice actors.
I've seen a different program/service that does let you control tone with tags. Does that work with elevenlabs or is it behind one of the paywalls? Because that's the only thing missing, shifting it so it's monotone when the narrator is bored, then curious/excited/angry/etc as appropriate.
100% coming. Also important for RPGs, so that writers can emphasize dialog.
But it's not there yet?
It seems average-hilariously good with dialogue, but holy frick it can't sing, especially not on key. Even I could sing better than it did lmfao.
Some other anon did a hilarious thing with Snake and wojaks, how the frick did he manage that?
They'll look high and they'll look low,
They'll look everywhere we go,
But when the s()ycucks find us we won't hide!
They'll come loud and they'll come fast,
But we shoot first and we can last!
Keep your waifu by your side!
I want to TTS smut and I don't care if it sounds like a specific celebrity or anything and I don't want to send my prompt to elevenlabs and I don't want the output to just be flatly articulated run-on monotone. Is there anything I can use yet?
Just do elevenlabs, to this day nobody has been banned or even warned for prompting smut
They're obviously aware it's happening and don't care
it's not so much that I don't want to get banned, it more that I don't want my prompts anywhere near a service that receives payments from any account I own. Sites get hacked all the time. Imagine if a database came out of everyone's prompts with their corresponding payment info. No thanks.
Your other option is RVC on Colab but you need a source audio for it to dub over
So you could always record the audio yourself with all the exact intonation you want but you might find recording and hearing that back too embarrassing a prospect to muster.
Also I think they might have messed with the RVC Colab thing and it might not work now, I'm not up to date
>get born
>get lucky and have some weird talent
>get even more lucky and meet people who hook you up with a job using that talent
>easy money glitch
>bullshit do nothing job
>AI comes out
>patches your exploit
>WAHHHHHHHHH WE NEED GOVERNEMTN TO DO SHIT IM DIFFERENT IM SPECIAL EVERYBODY NEEDS TO TURN INTO A LUDITE JUST SO I NEVER HAVE TO GET A REAL JOB
I mean who the frick actually feels sorry for these fricks? Nobody feels bad when 100,000 minimum wage factory workers lose their job. Nobody passes any fricking laws to stop it. Now a bunch of homosexuals with the incredibly niche job of reading books out loud are losing their job and all of a sudden technology is bad? Why are laws being considered to stop it? Are there even 100 of these homosexuals in the entire world? The utter disdain for working class peasants in this country is just sad. Nobody give a frick about them.
Learn to code, artgays
>a future of all engineers
horrifying
its not "damaging" an industry
its tech disrupt
Can AI voice synthesis do a roflcopter?
This AI has really drawn out the people doing these niche professions that nobody thought were making money.
they also inserted snide remarks about gamergate instead of doing their jobs properly
Frick these naggers 2bqh famalam
Gamergate is literally the reason all humans oppose them.
They could have literally just shut their prostitute mouths and we'd have assumed a half dozen morons were being morons, but that was the end of it. The investigation only came because they acted like NPC swarms when questioned.
That's also likely the reason why the NPC meme is common knowledge. Speaking of...
Anon has a fricking point. While it can potentially be interesting and is worth doing just because it makes voicegays seethe, the primary audience of audiobooks is illiterate NPCs pretending they are "readers".
I regret not downloading as many AI voice clips as I could while elevenlabs was free for like 1-2 days. There was a simple website with a white background that had a tutorial for AI voice cloning using elevenlabs, and it also had many samples including memes from disgaea and mitsuru from persona 3. Did anyone bookmark that page? Know the name? There has to be at least an archive, right? Frick.
yes I made it
https://rentry.org/aivoicestuff
I was there for precucked chatgpt, the launch of stable diffusion, dalle, the novelai leak, all of it. still nothing was as fun as the elevenlabs voice threads. it was the perfect blend of anon creativity powered by ai. its a shame voice never took off really locally in the same way stablediffusion and llama did.
oh shit it's alive! thank you for your work. And yeah those ai voice threads were crazy, a lot of unfunny and "racist for luls" voices but nevertheless creative, and really good ones here and there. It was so good it had to be shut down sadly.
I just want to share that i'm grateful for the compilation. I just want a cute AI waifu to read naughty words to me or speak a sentence i prepared as if it was directed to me. I'll sign in Feb on my bday just for a month to try it out.
Until we have open source, this is going to be it. Maybe 2024 will be the year of AI waifu voice.
imagine being so irrelevant as an artist that literally an algorithm replaces you. you were never gonna make it anyway
Total human death when?
it happened?
The Everspace 2 devs AI-voiced a frick ton of radio chatter lines; they will never go back.
They also used their own voices so these voice guild merchants can't even cry about it.
https://vocaroo.com/19BsaDqFJApI
XTTSv2 + DeepSpeed works fine. It's pretty fast, and voice cloning like this works decently.
https://github.com/daswer123/xtts-webui
Extremely fast + voice cloning + fine tuning that even a child can do
Extremely easy fine-tuning/training of the model with custom datasets. Just drop in the audio files and it generates all the text/transcription data sets using whisper and tunes the model.
KEK.
Interesting. Have some more KEKs.
https://litter.catbox.moe/u9l10g.mp3
Does anyone have a voice clip of the YWNBAW copypasta?
>https://rentry.org/aivoicestuff
>Tony Jay
>Example: https://vocaroo.com/1eFZ8cSqmxcw
There's one I found.
KEK. This is fairly good but I used my quota so I can't redo it in that funny angry voice.
I press buttons and wait out timers and several days work of food cooks while I sit comfy.
Took me 1 min to do the whole thing.
1) downloaded the copy pasta clip
2) put it under whisper for transcription
3) voice clone with 1 min of bateman voice sample
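Those three steps can be sketched in Python roughly like this, assuming the `openai-whisper` and Coqui `TTS` packages are installed and you run XTTS v2 locally instead of elevenlabs. The 250-character chunk size is an assumption to keep each request short, not a documented limit, and the file names in the usage note are placeholders.

```python
import re

XTTS_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def transcribe(audio_path: str, model_size: str = "base") -> str:
    """Transcribe a clip with openai-whisper (pip install openai-whisper)."""
    import whisper  # imported lazily so chunk_text works without it installed

    model = whisper.load_model(model_size)
    return model.transcribe(audio_path)["text"].strip()


def chunk_text(text: str, max_chars: int = 250) -> list[str]:
    """Greedily pack whole sentences into chunks of about max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def clone_and_speak(text: str, speaker_wav: str, out_prefix: str = "out") -> list[str]:
    """Synthesize each chunk with XTTS v2 using a reference voice clip (pip install TTS)."""
    from TTS.api import TTS  # lazy import, the model download is large

    tts = TTS(XTTS_MODEL)
    paths = []
    for i, chunk in enumerate(chunk_text(text)):
        path = f"{out_prefix}_{i:03d}.wav"
        tts.tts_to_file(text=chunk, speaker_wav=speaker_wav, language="en", file_path=path)
        paths.append(path)
    return paths
```

Usage would be something like `clone_and_speak(transcribe("copypasta.mp3"), "voice_sample.wav")`, which writes one wav per chunk for you to join afterwards.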
The funny voice I'm using is called Patrick under elevenlabs but it's not Bateman, it sounds more like this:
https://vocaroo.com/1a4VwWKF0XJh
Yeah, we need a new model with a more expressive dataset. Then an AI that can detect/change the emotions/tone of the voice. XTTS is a great little tool for home usage, but it's developed by a now defunct/bankrupt open source company.
Yeah, even elevenlabs can't modulate tone. Apparently some amazon thing can, but dunno how well it works.
https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html
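For reference, the Polly approach is SSML: you wrap text in tags like `<prosody>` and `<break>` rather than naming an emotion. Below is a small sketch that builds such a string and, optionally, sends it through `boto3`. The helper names are made up, the voice ID is just an example, and which tags a given voice honors varies (as far as I know, `pitch` is ignored or rejected by the neural voices), so check the table at the link above.

```python
from xml.sax.saxutils import escape


def prosody(text: str, rate: str = "medium", pitch: str = "default", volume: str = "medium") -> str:
    """Wrap text in an SSML <prosody> tag, escaping characters like & and <."""
    return f'<prosody rate="{rate}" pitch="{pitch}" volume="{volume}">{escape(text)}</prosody>'


def pause(ms: int) -> str:
    """Insert a pause of the given length in milliseconds."""
    return f'<break time="{ms}ms"/>'


def speak(*fragments: str) -> str:
    """Join fragments into one SSML document."""
    return "<speak>" + "".join(fragments) + "</speak>"


def synthesize(ssml: str, voice_id: str = "Joanna", out_path: str = "line.mp3") -> None:
    """Send SSML to Amazon Polly; needs boto3 and configured AWS credentials."""
    import boto3  # lazy import so the builders above work without it

    polly = boto3.client("polly")
    resp = polly.synthesize_speech(
        Text=ssml, TextType="ssml", OutputFormat="mp3", VoiceId=voice_id
    )
    with open(out_path, "wb") as f:
        f.write(resp["AudioStream"].read())


ssml = speak(
    prosody("The meeting dragged on.", rate="slow", pitch="low"),
    pause(400),
    prosody("Then the alarm went off!", rate="fast", pitch="high", volume="loud"),
)
print(ssml)
```

This gets you bored-then-alarmed in terms of speed, pitch and loudness, which is cruder than real emotional tone but at least controllable per sentence.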
It should be doable. We know what angry tones are like. We know what happy/sad/neutral/etc. tones are like. Train on a large data set labeled with those distinctions and implement slider control. You can easily get an understanding of tones through that. It doesn't have to be a single tone per emotion either; there could be various subsets of tones to choose from, organized by self-similar data sets. Hell, we can even detect the tone of something just from the sentence/text alone, but that's a preliminary textual prompt. On the TTS inference model data side, it should really be doable.
They have no leverage with corporations or the government, and if they got the "worker's revolution" they wanted their asses would be sent to a collective farm.
>pic
yeah, just like they protected all those human translators
After the way they fricked the half life 2 dub for my language, I genuinely think this is the best course of action for some studios. The dev gets to choose to either:
>Pay for the recording space
>Buy/loan the equipment
>Pay the voice actors
>Get the license for the audio editing software to stitch it all together
>Hire someone to work on the audio files(optional)
Or:
>Pay a subscription cloud service and produce audio files according to the specs.
It will not be a wonder when devs pick the second approach
I've heard a few samples from these AI voices, and they're impressive but they miss a lot of the nuance that makes a good narrator and tend to just sound like a school kid reciting the book to their class. No way can they compete (yet) with regular narrators, and there's absolutely no way they'll ever compete with actors that also narrate (Stephen Fry, for example).
>absolutely no way they'll ever compete with actors that also narrate
I should screencap this for the moron folder to repost in 5 years
months*
A year ago AI was Replika and censored character AI. Now you can get stuff that BTFOs all females and most males.
At this point combining two voices I already found would get me close for what I need in a major project.
Just something that sounds like that angry voice when smiting fools or laying down the law, and something deeper and calmer most other times but the same voice, not like this where you can clearly tell it's a different man.
https://vocaroo.com/14mQe4XgTJHe
I'm on audible and a bunch of readers can't read for shit. They mispronounce words all the time and can't seem to understand contexts at all. They're just human printers. And I fricking hate printers.
>ecker went AWOL with his AIVC work
sadge