AI voice.

Posted on January 29, 2024 by Anonymous

Whatever happened with this? I remember anons going ham with it but now searching just reveals a few programs behind paywalls or registration walls with no indication of quality, performance, options, etc. For all I know it's just Microsoft Sam instead of something that isn't obviously a robot.
I'm looking for TVD like pic related.

3 months ago

Reply

Anonymous

Why would I pay an audio company to do this when I can do it on my own?
- 3 months ago
  
  Reply
  
  Anonymous
  
  True, but they are still using some program. The question is which one? There isn't any public-facing stuff saying this program can do this, etc.
  - 3 months ago
    
    Reply
    
    Anonymous
    
    Elevenlabs is pretty decent. I've played with it, feeding it scripts that I write myself. The primary problem with it is that inflection isn't always fluid, or proportional. For instance, the following:
    >Hey! Did you hear about the latest speech synthesis model? It's super rad!
    You might get something along the lines of
    >(explosive)HEY!
    >(meek or tentative)Did you hear about the latest speech synthesis model?
    >(boisterous)It's super rad!
    Even mid-sentence punctuation gives it trouble. It might not pause at a comma, or it might pause too long, etc.
    - 3 months ago
      
      Reply
      
      Anonymous
      
      Interesting.
      
      https://i.imgur.com/B2DUjb3.jpg
      
      >When are we getting AI that can read a book, identify their voices (including the narrator), understand the scenes, and then basically give me a full experience of multiple people, wind noises, gun fire, glass breaking, footsteps, etc., basically an image-less movie of the book? I might sound moronic but I feel like we are less than a decade away from this, and it would legitimately be amazing.
      
      This sounds based as frick. You probably couldn't do it as one continuous thing but if you prompted it by parts it might work. Like this part should be monotone, this part should sound curious, etc, show the MC's mood purely from narration tone.
    - 3 months ago
      
      Reply
      
      Anonymous
      
      Is 11Labs still the best AI TTS?
      do we have any alternative yet?
      - 3 months ago
        
        Reply
        
        Anonymous
        
        11labs is still leaps beyond anything you can run locally, unfortunately
        styletts2 is decent though and if you don't want to hand over the shekels and/or prompts to 11labs, it's the closest one right now
    - 3 months ago
      
      Reply
      
      Anonymous
      
      >he doesn't know
3 months ago

Reply

Anonymous

When are we getting AI that can read a book, identify their voices (including the narrator), understand the scenes, and then basically give me a full experience of multiple people, wind noises, gun fire, glass breaking, footsteps, etc., basically an image-less movie of the book? I might sound moronic but I feel like we are less than a decade away from this, and it would legitimately be amazing.
- 3 months ago
  
  Reply
  
  Anonymous
  
  This would actually be so awesome. I listened to the Neuromancer BBC radio play that was like that and it completely mogs the audiobook that was read by the author. the author just sounds like a stoner in the 90s, which makes it hard to listen to.
  
  - 3 months ago
    
    Reply
    
    Anonymous
    
    The Ada Wong voice actor in the Resident Evil 4 remake got absolutely mogged by AI. It's absurd how much better the AI voice mod is.
    - 3 months ago
      
      Reply
      
      Anonymous
      
      kek, thank you for reminding me:
      
      >reminder that she seethed so much of being criticized for this performance that she accused everyone of racism
      - 3 months ago
        
        Reply
        
        Anonymous
        
        i didn't know they made an RE4 remake
        just looked up comparisons and the AI is way better
        ... then i saw some news about it, apparently the voice actor is taking it as an attack on her being asian
        since i heard the comparison before knowing who the voice actor was, i can say i didn't even know it was an asian voice, just sounded american to me
        i don't expect less nowadays though, people are so incredibly sensitive when it comes to race and sex these days
  - 3 months ago
    
    Reply
    
    Anonymous
    
    >Neuromancer
    >that was read by the author
    >author just sounds like a stoner
    Because the author is a stoner.
    One of William Gibson's life goals was to try out every single drug in existence. So go figure...
- 3 months ago
  
  Reply
  
  Anonymous
  
  My guy's ready for a holodeck
  - 3 months ago
    
    Reply
    
    Anonymous
    
    Stop talking like a Black person
- 3 months ago
  
  Reply
  
  Anonymous
  
  As technologically amazing as it could be, I kind of feel like it'd be a loss to the quality of the imagination an engaged reader may possess. I prefer reading to watching movies for a reason.
  - 3 months ago
    
    Reply
    
    Anonymous
    
    I'm not so sure, the visual aspect is still up to the listener to imagine, your mind just has a bit more to go off of.
  - 3 months ago
    
    Reply
    
    Anonymous
    
    But think of the so-called NPCs, so to speak; it would be a great benefit to them.
- 3 months ago
  
  Reply
  
  Anonymous
  
  >multiple people, wind noises, gun fire, glass breaking, footsteps, etc.
  This is the killer feature. We need to stop thinking of these systems as text-to-*speech* and more as all-around text-to-*audio*. Bark had the right idea having the prompting in-band so you could specify when in the clip it actually happens (e.g. "Hello [laughs] world"), and I really thought that (not Bark itself, but in-band nonverbal prompting) would be the big /vsg/ reviving game changer we needed, but interest in it seems to have dropped off around here. AudioLDM is focused specifically on nonverbal sounds, and VoiceLDM is a neat experiment in combining it with TTS to get something like what you're describing (e.g. prompt "She is talking in a park."), but it's a little finicky, and it doesn't provide the granularity that Bark does.
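For anyone wondering what "in-band" buys you in practice, here's a rough sketch. It only parses a prompt like the "Hello [laughs] world" example into an ordered list of speech and nonverbal segments, so each piece could be routed to a different generator. The function name and the segment labels are made up for illustration, and this is not how Bark itself handles its prompts internally.

```python
import re
from typing import List, Tuple

# Matches a bracketed nonverbal cue such as "[laughs]".
CUE_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_inband_prompt(prompt: str) -> List[Tuple[str, str]]:
    """Split a prompt into ("speech", text) and ("cue", name) segments, in order.

    Hypothetical helper: the labels "speech" and "cue" are illustrative only.
    """
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in CUE_PATTERN.finditer(prompt):
        # Text between the previous cue and this one is spoken.
        spoken = prompt[pos:match.start()].strip()
        if spoken:
            segments.append(("speech", spoken))
        segments.append(("cue", match.group(1).strip().lower()))
        pos = match.end()
    # Anything after the last cue is spoken as well.
    tail = prompt[pos:].strip()
    if tail:
        segments.append(("speech", tail))
    return segments


if __name__ == "__main__":
    print(split_inband_prompt("Hello [laughs] world"))
```

Because the cue sits at a known position between two speech segments, the timing information comes for free, which is the granularity a whole-clip prompt like "She is talking in a park." can't give you.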
  - 3 months ago
    
    Reply
    
    Anonymous
    
    You should have a nice day
- 3 months ago
  
  Reply
  
  Anonymous
  
  someone would have to invent a new type of multimodal model
  something like llava but text -> tokenize -> recognize -> split -> sound
  this seems really hard to achieve, but not impossible
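To make the text -> tokenize -> recognize -> split -> sound idea a bit more concrete, here's a toy sketch of the first few stages. It is nowhere near a multimodal model: quoted text is treated as dialogue, everything else as narration, and sound effects come from a hand-written keyword table. Every name here (`SOUND_KEYWORDS`, `Cue`, `build_cue_sheet`) is an assumption for illustration, and the actual "sound" stage is left out entirely.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical keyword -> sound-effect table; a real system would need a learned model.
SOUND_KEYWORDS = {
    "wind": "wind_noise",
    "gunfire": "gun_fire",
    "glass": "glass_breaking",
    "footsteps": "footsteps",
}


@dataclass
class Cue:
    kind: str     # "narration", "dialogue", or "sfx"
    content: str  # text to speak, or the name of a sound effect


def tokenize(passage: str) -> List[str]:
    """Split a passage into unquoted and double-quoted chunks, keeping the quotes."""
    return [chunk for chunk in re.split(r'("[^"]*")', passage) if chunk.strip()]


def recognize(chunk: str) -> List[Cue]:
    """Label one chunk as dialogue or narration, plus any keyword-matched effects."""
    chunk = chunk.strip()
    if len(chunk) >= 2 and chunk.startswith('"') and chunk.endswith('"'):
        return [Cue("dialogue", chunk[1:-1])]
    cues = [Cue("narration", chunk)]
    lowered = chunk.lower()
    for keyword, effect in SOUND_KEYWORDS.items():
        if keyword in lowered:
            cues.append(Cue("sfx", effect))
    return cues


def build_cue_sheet(passage: str) -> List[Cue]:
    """Run tokenize and recognize over a passage and return the ordered cue sheet."""
    sheet: List[Cue] = []
    for chunk in tokenize(passage):
        sheet.extend(recognize(chunk))
    return sheet


if __name__ == "__main__":
    for cue in build_cue_sheet('The glass shattered. "Run!" she said.'):
        print(cue)
```

The hard parts are exactly what this skips: working out which character is speaking, and deciding that a scene needs a sound nobody named in the text.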
- 3 months ago
  
  Reply
  
  Anonymous
  
  who cares lol? welcome to capitalism buddy
  
  you can probably do that already somehow using technology aside from ai
- 3 months ago
  
  Reply
  
  Anonymous
  
  Like C-3PO telling the story of their adventure so far to the Ewoks? That's gonna be a long wait...
- 3 months ago
  
  Reply
  
  Anonymous
  
  >multiple people, wind noises, gun fire, glass breaking, footsteps, etc.
  you're describing audiobooks for children
  childrens books have sound effects
  - 3 months ago
    
    Reply
    
    Anonymous
    
    you mean cool books have sound effects
- 3 months ago
  
  Reply
  
  Anonymous
  
  You're supposed to be doing that with your brain.
- 3 months ago
  
  Reply
  
  Anonymous
  
  You do know that you can already do that with your own imagination, right?
- 3 months ago
  
  Reply
  
  Anonymous
  
  HARRY!
  DID YOU PUT YOUR NAME IN THE GOBLET OF FIRE?!
  dumbledore said calmly
- 3 months ago
  
  Reply
  
  Anonymous
  
  I wish more people could look at AI potential like this. I think it is going to be so insanely disruptive to so many industries. Not just workers getting laid off, but customers suddenly not requiring the services of all sorts of companies. They are laying off all these voice actors and replacing them with AI. So what? I could just generate the audiobook for myself now. Don't even need the audiobook industry to exist anymore.
3 months ago

Reply

Anonymous

OMG I WON'T BE ABLE BE A WAGIE ANYMORE!!!!!!
3 months ago

Reply

Anonymous

>my friend who works for a company selling clothes just told me they’re replacing their seamstresses with sewing machines
>if we don’t regulate this now the damage will be immeasurable
Bye bye!
3 months ago

Reply

Anonymous

Another week, another industry. When the frick are politicians going to start regulating this shit, or stop pretending like they give a frick what happens to the people
- 3 months ago
  
  Reply
  
  Anonymous
  
  Why the frick would they regulate it?
  - 3 months ago
    
    Reply
    
    Anonymous
    
    Because it is going to make most humans obsolete
- 3 months ago
  
  Reply
  
  Anonymous
  
  reddit is down the hall and to the left
3 months ago

Reply

Anonymous

Don't care.
- 3 months ago
  
  Reply
  
  Anonymous
  
  point to a single instance of an artist telling someone to code
- 3 months ago
  
  Reply
  
  Anonymous
  
  >E-book readers were telling coal miners to learn to code
  - 3 months ago
    
    Reply
    
    Anonymous
    
    Voicegays are even more turbo s()y than graphical artists and coders. I can believe it.
3 months ago

Reply

Anonymous

It's been exactly one year since Elevenlabs dropped ([...]) and it hasn't gotten any better nor has open source fully caught up. What the frick happened? Where's the infinite exponential growth into the metaverse on mars I was promised? Seems like everything regarding AI has completely flatlined since GPT-4. I thought by now I'd have insane TTS AI but it's still the fricking same as it was when I last looked into it. What a disappointment it all has been... two more weeks sirs!
- 3 months ago
  
  Reply
  
  Anonymous
  
  >nor has open source fully caught up.
  Tardbro, StyleTTS and XTTSV2 exist.
  - 3 months ago
    
    Reply
    
    Anonymous
    
    care to share any examples? 11labs still mogs everything i've heard so far, it's not even close.
    https://vocaroo.com/1lU91UW8qjC2
    https://vocaroo.com/12QGwp4vk8O0
    https://vocaroo.com/1dtrp3RcdsTJ
3 months ago

Reply

Anonymous

Just make it so any AI work must pay the original artist 100% as if he/she did it, unless you sign some open-source shit for your voice/likeness/artwork
- 3 months ago
  
  Reply
  
  Anonymous
  
  This. Data shouldn't be allowed to be used to train models unless it's been specifically agreed to by the author for that purpose.
- 3 months ago
  
  Reply
  
  Anonymous
  
  Google is training their AI using billions of Youtube videos... good luck figuring out who does generic male voice #472 with neutral American accent.
  - 3 months ago
    
    Reply
    
    Anonymous
    
    good
  - 3 months ago
    
    Reply
    
    Anonymous
    
    I think laws are pretty slow in this regard. Yes, you don't read the TOS when using any service, and it probably says they can do whatever the frick they want with your videos, but I would love to see a new law where companies need to notify you if your video(s) have been used for AI training, even if you don't get paid for it.
    - 3 months ago
      
      Reply
      
      Anonymous
      
      Consent should be a requirement for training AI models and they should make models trained on non-consensual data illegal to possess.
      - 3 months ago
        
        Reply
        
        Anonymous
        
        If the information's private, like my private porn collection, sure. If it's public view, humans can experience and subsequently learn from this information freely, why shouldn't they be able to side-load the learning to AI?
        It's the use of this learning which requires regulation, again where it concerns others.
  - 3 months ago
    
    Reply
    
    Anonymous
    
    Force Google to provide every single source of training data, or else they get no legal protection whatsoever from whatever anyone can make the Google AI do. If someone tricks Google's AI into committing copyright infringement, Google pays the fine or the employees who made the AI go to jail for mass distribution of copyrighted content.
    This is the only way to force these anti-human companies to play nice
3 months ago

Reply

Anonymous

I paid for elevenlabs for a month and did the voice cloning off someone who did podcasts.

It didn't sound anything like him. I cancelled my subscription. The end.
3 months ago

Reply

Anonymous

love this meme where artists are realizing they aren't immune to the ever increasing tide of automation and finally think about all the other fields that got 0 support and news about being automated.
- 3 months ago
  
  Reply
  
  Anonymous
  
  >Spam memes about coding
  >Get replaced by code
  >TFW
- 3 months ago
  
  Reply
  
  Anonymous
  
  Artists aren't realizing shit. The only thing they understand is that they're under threat from automation, nothing more and nothing less. They implicitly lack the self-awareness or empathy needed to compare themselves to others who've faced the same challenges in the past. They see themselves as a privileged class, and plebeians outside that class deserve no sympathy in their eyes, for the plight of the plebs is nothing in comparison to the struggles of an auteur.
  tldr; they don't have the capacity to self reflect like you're implying they do.
3 months ago

Reply

Anonymous

I’m not gonna fedpost I’m not gonna fedpost I’m not gonna fedpost I’m not gonna fedpost I’m not gonna fedpost I’m not gonna fedpost I’m not gonna fedpost
3 months ago

Reply

Anonymous

The question is why, as a user, you wouldn't pirate all those books and pay for the service to dub every book for you? Why do users need middlemen to do everything for them?
- 3 months ago
  
  Reply
  
  Anonymous
  - 3 months ago
    
    Reply
    
    Anonymous
    
    I loathe how ridiculous this statement has become. For the price of a bigmac, fries and coke, i can cook a whole dinner.
  - 3 months ago
    
    Reply
    
    Anonymous
    
    >I loathe how ridiculous this statement has become. For the price of a bigmac, fries and coke, i can cook a whole dinner.
    
    poorgay cope, if you have to cook your own meal you're still paying with your time which is cucked
    - 3 months ago
      
      Reply
      
      Anonymous
      
      >5min to make a sandwich is too much
      actual poorgay cope that is just punching down
      if you don't have free time because you're slaving away, you aren't rich as you gays like to pretend
      - 3 months ago
        
        Reply
        
        Anonymous
        
        please bro you have to work 150 hours per week or else you're poor i don't sleep for days on end thinking about work you wouldn't understand i cant even cook on the weekend because i have to take my wife to the sex club to try to find someone to frick her please they wouldnt even let you into the club because youre too broke its only for real professionals please i drink every single day but its not a proble, i have a job, i have JOB, you wouldnt get it
    - 3 months ago
      
      Reply
      
      Anonymous
      
      >he's paid an hourly wage
      lmfao absolute turbo mcdonalds worker
    - 3 months ago
      
      Reply
      
      Anonymous
      
      What a sad life you must live.
    - 3 months ago
      
      Reply
      
      Anonymous
      
      Given the amount of well off and nicely dressed people at any given McDonalds establishment you may be onto something.
    - 3 months ago
      
      Reply
      
      Anonymous
      
      i can spend 30min in the kitchen and make enough food to last a few days
      you don't need to spend all that time making just one serving, you know
3 months ago

Reply

Anonymous

>my friend whose dad works at nintendo blah blah blah bix nood muhfuggn shhiieeett
who writes this shit
3 months ago

Reply

Anonymous

Local versions are actively used to make degenerate porn voiceovers, asmr etc.

So you don't hear about them because they're no longer a tech demo, but do actual work.
- 3 months ago
  
  Reply
  
  Anonymous
  
  I was just going to ask about this. Images, chatbots, etc, all those have local versions even if they are smaller models and slower because most don't have industrial hardware etc. But I don't see any mention of local models, nor is there a thread like there is for aicg, lmg, stable diffusion...
  Elevenlabs is ok, but while it's cheaper than voicegays by far it would still cost a few hundred for one audio book. And it doesn't seem as if there's any emphasis or tonal shift so you'll get shit like the narration being read with some interest and then a very enthusiastic speech read in this depressive monotone wtf.
3 months ago

Reply

Anonymous

>waaah big daddy guberment please regulate le evil AI
What the frick is wrong with these people
- 3 months ago
  
  Reply
  
  Anonymous
  
  10 years of neverending mass hysteria, from one moral panic to the next, initiated via msm and then propagated on twitter and reddit
- 3 months ago
  
  Reply
  
  Anonymous
  
  >People
  That's the problem.
3 months ago

Reply

Anonymous

Liberating people from work is a good thing.
- 3 months ago
  
  Reply
  
  Anonymous
  
  Sure, but there's 0 reason to expect that to happen. Are you 12?
  - 3 months ago
    
    Reply
    
    Anonymous
    
    What do you mean? Productivity with less work required always improves the standard of living.
    - 3 months ago
      
      Reply
      
      Anonymous
      
      No? Quality of life for the average American has nosedived since the 70s which is when the computer revolution started. Suicide and mental illness is up, real wages are down, deaths of despair skyrocketing almost as high as corporate profits. The only real increase in QoL anyone got was WFH which israelite managers are now crying about and calling evil.
      - 3 months ago
        
        Reply
        
        Anonymous
        
        Productivity has decreased in a lot of ways since then. Think back to the late 1800s. People complained then about losing their jobs to machinery, but their standard of living improved for it.
        
        3 months ago
        
        Reply
        
        Anonymous
        
        Absolutely not. Go look up what life was like for the urban industrial poor in 1800s England.
  - 3 months ago
    
    Reply
    
    Anonymous
    
    NTA, but there actually is a pretty good chance of that happening. The truth is, the working class really doesn't have much more by way of assets to drain. Most of us don't own our own homes, don't have any equity in retirement, no pensions and no savings. The vast majority. The capital has already been drained from the middle and lower classes by 50 years of wealth extraction and inflation without any wage increases.
    
    However, we are primarily useful to the system as consumers to buy goods and as debt holders, both of which are expansions in the market cap in places other than revenue. If people don't consume, there is no point to creating the goods/value in the first place, as its value remains only theoretical until it is sold. Likewise, additional debt can't exist within the system without additional people, so if you reduce the amount of people able to take on debts by killing off or enslaving all the poors, you're actually crunching billions of potential dollaridoos out of the economy.
    
    There's a very good chance that we're not going to be needed to work in the future, as the cost of labor and transportation is far outweighing the capital gained from sales. However, they do still need us to consume and to facilitate debt, so hello UBI.
    - 3 months ago
      
      Reply
      
      Anonymous
      
      I hope you're correct but I specifically disagree with:
      >If people don't consume, there is no point to creating the goods/value in the first place, as its value remains only theoretical until it is sold.
      It isn't necessarily true because the motivation to seek infinite profit is not inherent, profit and value are proxies for POWER which is what the sociopaths we've been funneling into leadership positions for the past 100+ years want.
3 months ago

Reply

Anonymous

>increase profits and decrease expenditure
>damage
?
3 months ago

Reply

Anonymous

>your fricking job is to read books
If your voice is worse than AI perhaps you don't deserve a paycheck.

Also if AI replaces a programmer (you) that programmer deserves to be jobless
3 months ago

Reply

Anonymous

The tech isn't quite there yet, but obviously it is going to happen within the next couple of years.
The story is likely fake, but the tech will soon be mature enough to replace all voice actors.

It also doesn't face the enormous challenge of natural language generation, where current tech can't comprehend relationships over e.g. the length of a novel. Text-to-voice doesn't have that problem; tone of voice depends on the couple of surrounding sentences at most.
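A quick sketch of what "the couple of surrounding sentences" could look like as input to a tone model: pair every sentence with its neighbours inside a small window. The sentence splitter is deliberately naive, and `radius` and both function names are assumptions for illustration, not anything a particular TTS system is known to do.

```python
import re
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def context_windows(sentences: List[str], radius: int = 1) -> List[Tuple[str, List[str]]]:
    """Pair each sentence with up to `radius` neighbours on each side."""
    windows = []
    for i, sentence in enumerate(sentences):
        before = sentences[max(0, i - radius):i]
        after = sentences[i + 1:i + 1 + radius]
        windows.append((sentence, before + after))
    return windows


if __name__ == "__main__":
    script = "Hey! Did you hear about the latest speech synthesis model? It's super rad!"
    for sentence, context in context_windows(split_sentences(script)):
        print(repr(sentence), "<-", context)
```

Whether a window this small is really enough is the open question; a speech that pays off something from fifty pages earlier wouldn't be covered by it.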
- 3 months ago
  
  Reply
  
  Anonymous
  
  Tone of voice still seems erratic. I've seen it be somewhat eager in narration and then become this flat monotone in what should be a deep passionate speech.
  It definitely has some epic fricking wins though even if it isn't on the main project I'm using it for.
  Be sure and drink something before clicking this shit:
  https://litter.catbox.moe/7a0ljm.mp3
  Voicegays eternally BTFO.
  >G0RMY
  - 3 months ago
    
    Reply
    
    Anonymous
    
    Lel.
    Yes, the tech isn't completely there yet, but it is obviously coming. Analyzing the needed tone seems a much easier problem than long distance attention. And at some point it will just be good enough to replace voice actors.
    - 3 months ago
      
      Reply
      
      Anonymous
      
      I've seen a different program/service that does let you control tone with tags. Does that work with elevenlabs or is it behind one of the paywalls? Because that's the only thing missing, shifting it so it's monotone when the narrator is bored, then curious/excited/angry/etc as appropriate.
      - 3 months ago
        
        Reply
        
        Anonymous
        
        100% coming. Also important for RPGs, so that writers can emphasize dialog.
        
        3 months ago
        
        Reply
        
        Anonymous
        
        But it's not there yet?
        It seems average-hilariously good with dialogue, but holy frick it can't sing, especially not on key. Even I could sing better than it did lmfao.
        Some other anon did a hilarious thing with Snake and wojaks, how the frick did he manage that?
        
        They'll look high and they'll look low,
        They'll look everywhere we go,
        But when the s()ycucks find us we won't hide!
        They'll come loud and they'll come fast,
        But we shoot first and we can last!
        Keep your waifu by your side!
3 months ago

Reply

Anonymous

I want to TTS smut and I don't care if it sounds like a specific celebrity or anything and I don't want to send my prompt to elevenlabs and I don't want the output to just be flatly articulated run-on monotone. Is there anything I can use yet?
- 3 months ago
  
  Reply
  
  Anonymous
  
  Just do elevenlabs, to this day nobody has been banned or even warned for prompting smut
  They're obviously aware it's happening and don't care
  - 3 months ago
    
    Reply
    
    Anonymous
    
    it's not so much that I don't want to get banned, it more that I don't want my prompts anywhere near a service that receives payments from any account I own. Sites get hacked all the time. Imagine if a database came out of everyone's prompts with their corresponding payment info. No thanks.
    - 3 months ago
      
      Reply
      
      Anonymous
      
      Your other option is RVC on Colab, but you need source audio for it to dub over.
      So you could always record the audio yourself with all the exact intonation you want, but you might find recording and hearing that back too embarrassing a prospect to muster.
      Also I think they might have messed with the RVC Colab thing and it might not work now, I'm not up to date
3 months ago

Reply

Anonymous

>get born
>get lucky and have some weird talent
>get even more lucky and meet people who hook you up with a job using that talent
>easy money glitch
>bullshit do nothing job
>AI comes out
>patches your exploit
>WAHHHHHHHHH WE NEED GOVERNEMTN TO DO SHIT IM DIFFERENT IM SPECIAL EVERYBODY NEEDS TO TURN INTO A LUDITE JUST SO I NEVER HAVE TO GET A REAL JOB

I mean who the frick actually feels sorry for these fricks? Nobody feels bad when 100,000 minimum wage factory workers lose their job. Nobody passes any fricking laws to stop it. Now a bunch of homosexuals with the incredibly niche job of reading books out loud are losing their job and all of a sudden technology is bad? Why are laws being considered to stop it? Are there even 100 of these homosexuals in the entire world? The utter disdain for working class peasants in this country is just sad. Nobody give a frick about them.
3 months ago

Reply

Anonymous

Learn to code, artgays
- 3 months ago
  
  Reply
  
  Anonymous
  
  >a future of all engineers
  horrifying
3 months ago

Reply

Anonymous

it's not "damaging" an industry
it's tech disruption
3 months ago

Reply

Anonymous

Can AI voice synthesis do a roflcopter?
3 months ago

Reply

Anonymous

This AI has really drawn out the people doing these niche professions that nobody thought were making money.
- 3 months ago
  
  Reply
  
  Anonymous
  
  they also inserted snide remarks about gamergate instead of doing their jobs properly
  Frick these naggers 2bqh famalam
  - 3 months ago
    
    Reply
    
    Anonymous
    
    Gamergate is literally the reason all humans oppose them.
    They could have literally just shut their prostitute mouths and we'd have assumed a half dozen morons were being morons, but that was the end of it. The investigation only came because they acted like NPC swarms when questioned.
    That's also likely the reason why the NPC meme is common knowledge. Speaking of...
    
    https://i.imgur.com/onP8Awm.jpg
    
    >But think of the so called NPCs so to speak it would be a great benefit to them
    
    Anon has a fricking point. While it can potentially be interesting and is worth doing just because it makes voicegays seethe, the primary audience of audiobooks is illiterate NPCs pretending they are "readers".
3 months ago

Reply

Anonymous

I regret not downloading as many AI voice clips as I could while elevenlabs was free for like 1-2 days. There was a simple website with a white background that had a tutorial for AI voice cloning using elevenlabs, and it also had many samples including memes from disgaea and mitsuru from persona 3, did anyone bookmark that page? know the name? there has to be at least an archive right? frick.
- 3 months ago
  
  Reply
  
  Anonymous
  
  yes I made it
  https://rentry.org/aivoicestuff
  I was there for precucked chatgpt, the launch of stable diffusion, dalle, the novelai leak, all of it. still nothing was as fun as the elevenlabs voice threads. it was the perfect blend of anon creativity powered by ai. it's a shame voice never really took off locally the same way stablediffusion and llama did.
  - 3 months ago
    
    Reply
    
    Anonymous
    
    oh shit it's alive! thank you for your work. And yeah those ai voice threads were crazy, a lot of unfunny and "racist for luls" voices but nevertheless creative, and really good ones here and there. It was so good it had to be shut down sadly.
  - 3 months ago
    
    Reply
    
    Anonymous
    
    I just want to share that i'm grateful for the compilation. I just want a cute AI waifu to read naughty words to me or speak a sentence i prepared as if it was directed to me. I'll sign in Feb on my bday just for a month to try it out.
    Until we have open source, this is going to be it. Maybe 2024 will be the year of AI waifu voice.
3 months ago

Reply

Anonymous

imagine being so irrelevant as an artist that literally an algorithm replaces you. you were never gonna make it anyway
3 months ago

Reply

Anonymous

Total human death when?
3 months ago

Reply

Anonymous

it happened?
Everspace 2 devs AI voicelined a frick ton of radio chatter lines; they will never go back.
They also used their own voices so these voice guild merchants can't even cry about it.
3 months ago

Reply

Anonymous

https://vocaroo.com/19BsaDqFJApI

XTTSv2 + DeepSpeed works fine. It's pretty fast, and voice cloning like this works decently.
- 3 months ago
  
  Reply
  
  Anonymous
  
  https://github.com/daswer123/xtts-webui
  
  Extremely fast + voice cloning + fine tuning that even a child can do
  
  Extremely easy fine-tuning/training of the model with custom datasets. Just drop in the audio files and it generates all the text/transcription data sets using whisper and tunes the model.
- 3 months ago
  
  Reply
  
  Anonymous
  
  KEK.
  
  https://github.com/daswer123/xtts-webui
  
  Interesting. Have some more KEKs.
  https://litter.catbox.moe/u9l10g.mp3
3 months ago

Reply

Anonymous

Does anyone have a voice clip of the YWNBAW copypasta?
- 3 months ago
  
  Reply
  
  Anonymous
  
  >https://rentry.org/aivoicestuff
  >Tony Jay
  >Example: https://vocaroo.com/1eFZ8cSqmxcw
  There's one I found.
  - 3 months ago
    
    Reply
    
    Anonymous
    
    KEK. This is fairly good but I used my quota so I can't redo it in that funny angry voice.
    
    >i can spend 30min in the kitchen and make enough food to last a few days
    >you don't need to spend all that time making just one serving, you know
    
    I press buttons and wait out timers and several days' worth of food cooks while I sit comfy.
    - 3 months ago
      
      Reply
      
      Anonymous
      
      Took me 1 min to do the whole thing.
      
      1) downloaded the copy pasta clip
      2) put it under whisper for transcription
      3) voice clone with 1 min of bateman voice sample
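For reference, steps 2 and 3 above could look roughly like this in Python. This is a sketch under assumptions, not the exact setup the anon used: it assumes the `openai-whisper` and Coqui `TTS` packages are installed, the file names are placeholders, and `check_inputs` is a made-up helper. The first run downloads model weights and may ask you to accept the XTTS model licence.

```python
from pathlib import Path
from typing import List


def transcribe(audio_path: str, model_size: str = "base") -> str:
    """Step 2: transcribe a clip with the openai-whisper package."""
    import whisper  # lazy import, so this file loads even without the package

    model = whisper.load_model(model_size)
    return model.transcribe(audio_path)["text"].strip()


def clone_and_speak(text: str, speaker_wav: str, out_path: str, language: str = "en") -> str:
    """Step 3: speak `text` in the voice of `speaker_wav` using Coqui's XTTS v2 model."""
    from TTS.api import TTS  # lazy import, same reason as above

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(text=text, speaker_wav=speaker_wav, language=language, file_path=out_path)
    return out_path


def check_inputs(*paths: str) -> List[str]:
    """Return the paths that are not existing files, so errors show up before any model loads."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    # Placeholder file names: the downloaded clip and a ~1 min sample of the target voice.
    clip, voice = "copypasta.mp3", "voice_sample.wav"
    missing = check_inputs(clip, voice)
    if missing:
        print("Missing input files:", missing)
    else:
        text = transcribe(clip)
        print("Wrote", clone_and_speak(text, voice, "cloned.wav"))
```

Most of the "1 min" is model loading; the transcription and synthesis calls themselves are one line each.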
      - 3 months ago
        
        Reply
        
        Anonymous
        
        The funny voice I'm using is called Patrick under elevenlabs but it's not Bateman, it sounds more like this:
        https://vocaroo.com/1a4VwWKF0XJh
        
        3 months ago
        
        Reply
        
        Anonymous
        
        Yeah, we need a new model with a more expressive dataset. Then an AI that can detect/change emotions/tone of the voice. XTTS is a great little tool for home usage, but it was developed by a now defunct/bankrupt open-source company.
        
        3 months ago
        
        Anonymous
        
        Yeah, even elevenlabs can't modulate tone. Apparently some amazon thing can, but dunno how well it works.
        https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html
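The tags on that page are SSML, so the idea is that you mark up the script yourself instead of hoping the model guesses the tone. Here's a minimal sketch that builds such markup with the standard `<speak>`, `<prosody>` and `<break>` tags. The helper names are made up, the rate/volume values are just one way to tag the earlier "Hey!" example, and which attributes a given voice actually honors varies, so check the linked page before relying on any of them.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text: str, rate: str = "medium", volume: str = "medium") -> str:
    """Wrap text in an SSML <prosody> tag; the text is XML-escaped."""
    return f"<prosody rate={quoteattr(rate)} volume={quoteattr(volume)}>{escape(text)}</prosody>"


def pause(milliseconds: int) -> str:
    """Insert an explicit pause with an SSML <break> tag."""
    return f'<break time="{int(milliseconds)}ms"/>'


def speak(*parts: str) -> str:
    """Join already-built SSML fragments inside the root <speak> element."""
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    ssml = speak(
        prosody("Hey!", rate="fast", volume="loud"),
        pause(300),
        prosody("Did you hear about the latest speech synthesis model?"),
        pause(200),
        prosody("It's super rad!", rate="fast", volume="loud"),
    )
    print(ssml)
```

The resulting string is what you'd hand to the service as SSML input rather than plain text (with Polly, that means setting the text type to SSML on the synthesis request). It controls speed, loudness and pauses, which is a long way from "sound curious", but it does fix the comma-pause problem by brute force.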
        
        3 months ago
        
        Anonymous
        
        It should be doable. We know what angry tones are like. We know what happy/sad/neutral/etc tones are like. Train on a large data set with those distinctions and implement slider control. You can easily get an understanding of tones through that. It doesn't have to be a single tone for a single emotion either, there could be various subsets of tones to choose from, organized by self-similar data sets. Hell, we can even detect the tone of something just through the sentence/text alone, but that's preliminary textual prompting. On the TTS inference model data side, it should really be doable.
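The slider part of this can be sketched without any model at all: treat each emotion as a style vector and let the sliders set the mix. Everything below is hypothetical. The three-dimensional vectors are invented numbers standing in for whatever embedding a trained model would learn, and `blend_styles` is an illustrative name, not an existing API.

```python
from typing import Dict, List

# Hypothetical style vectors; a real model would learn these from labelled speech.
EMOTION_STYLES: Dict[str, List[float]] = {
    "neutral": [0.0, 0.0, 0.0],
    "angry": [1.0, 0.5, 0.0],
    "happy": [0.0, 1.0, 0.5],
    "sad": [0.0, 0.0, 1.0],
}


def blend_styles(sliders: Dict[str, float]) -> List[float]:
    """Return the weighted average of the emotion vectors, weights taken from the sliders."""
    for emotion, weight in sliders.items():
        if emotion not in EMOTION_STYLES:
            raise KeyError(f"unknown emotion: {emotion}")
        if weight < 0:
            raise ValueError(f"slider for {emotion} must be non-negative")
    total = sum(sliders.values())
    if total == 0:
        # All sliders at zero (or none given): fall back to the neutral style.
        return list(EMOTION_STYLES["neutral"])
    dims = len(EMOTION_STYLES["neutral"])
    blended = [0.0] * dims
    for emotion, weight in sliders.items():
        vector = EMOTION_STYLES[emotion]
        for i in range(dims):
            blended[i] += (weight / total) * vector[i]
    return blended


if __name__ == "__main__":
    # Equal parts angry and sad.
    print(blend_styles({"angry": 1.0, "sad": 1.0}))
```

The blended vector would then be fed to the synthesizer as a conditioning input alongside the text. Getting a model where that input actually changes the delivery in a controllable way is the part that needs the large labelled data set.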
3 months ago

Reply

Anonymous

They have no leverage with corporations or the government, and if they got the "worker's revolution" they wanted their asses would be sent to a collective farm.
3 months ago

Reply

Anonymous

>pic
yeah, just like they protected all those human translators
3 months ago

Reply

Anonymous

After the way they fricked the Half-Life 2 dub for my language, I genuinely think this is the best course of action for some studios. The dev gets to choose to either:
>Pay for the recording space
>Buy/loan the equipment
>Pay the voice actors
>Get the license for the audio editing software to stitch it all together
>Hire someone to work on the audio files(optional)
Or:
>Pay a subscription cloud service and produce audio files according to the specs.
It will not be a wonder when devs pick the second approach
3 months ago

Reply

Anonymous

I've heard a few samples from these AI voices, and they're impressive but they miss a lot of the nuance that makes a good narrator and tend to just sound like a school kid reciting the book to their class. No way can they compete (yet) with regular narrators, and there's absolutely no way they'll ever compete with actors that also narrate (Stephen Fry, for example).
- 3 months ago
  
  Reply
  
  Anonymous
  
  >absolutely no way they'll ever compete with actors that also narrate
  I should screencap this for the moron folder to repost in 5 years
  - 3 months ago
    
    Reply
    
    Anonymous
    
    months*
    A year ago AI was Replika and censored character AI. Now you can get stuff that BTFOs all females and most males.
    
    At this point combining two voices I already found would get me close for what I need in a major project.
    Just something that sounds like that angry voice when smiting fools or laying down the law, and something deeper and calmer most other times but the same voice, not like this where you can clearly tell it's a different man.
    https://vocaroo.com/14mQe4XgTJHe
3 months ago

Reply

Anonymous

I'm on audible and a bunch of readers can't read for shit. They mispronounce words all the time and can't seem to understand contexts at all. They're just human printers. And I fricking hate printers.
3 months ago

Reply

Anonymous

>ecker went AWOL with his AIVC work
sadge
