Didn't realize OAI had released an open source STT model
Have they made a TTS one too?
Don't think so, but the STT is really good, plus it's not just a transcriber but a translator too. It BTFOs YouTube's auto transcribe & translation at any rate.
Found this BTW, testing it now to see if it works with Vega:
https://github.com/Const-me/Whisper
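For reference, on the openai python release translation is just a flag on transcribe, no separate tool needed. A minimal sketch (the file name is a placeholder):

import whisper

# "medium" is a decent speed/quality tradeoff; "large" is better but needs more VRAM
model = whisper.load_model("medium")
# task="translate" outputs English instead of transcribing the source language
result = model.transcribe("clip.mp3", task="translate")
print(result["text"])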
Confirmed, this implementation works with Vega. I can finally translate my German porn
Fast as fuck too, even using the largest model
>On my desktop computer with GeForce 1080Ti GPU, medium model, 3:24 min speech took 45 seconds to transcribe with PyTorch and CUDA, but only 19 seconds with my implementation
>9.63 gigabytes runtime dependencies, versus 431 kilobytes Whisper.dll
Bro what the fuck, I've been using the openai release for months and only today do I find out there's a much faster implementation that also doesn't need a million files of dependencies.
Thanks a lot for this link, I wish I had seen it sooner but it's much appreciated all the same.
Wait so this dude's fork doesn't need 10 gigs of dependencies, whereas the official one does?
Yeah, exactly.
It can be used offline, but you need to download one of the models first.
https://huggingface.co/datasets/ggerganov/whisper.cpp
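If you'd rather script the download than click around HF, the files in that repo follow a ggml-<model>.bin naming scheme (an assumption from the current file list, check it if this 404s). A minimal sketch:

import urllib.request

# the medium model is ~1.5GB; swap "medium" for tiny/base/small/large
url = "https://huggingface.co/datasets/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin"
urllib.request.urlretrieve(url, "ggml-medium.bin")

The repo's download-ggml-model.sh script does the same thing if you prefer shell.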
thanks I downloaded all of them to use their bandwidth
>The original Whisper PyTorch models provided by OpenAI have been converted to custom ggml format in order to be able to load them in C/C++. The conversion has been performed using the convert-pt-to-ggml.py script. You can either obtain the original models and generate the ggml files yourself using the conversion script, or you can use the download-ggml-model.sh script to download the already converted models. Currently, they are hosted on the following locations:
oh no looks like these are actually in a different format, guess I'll have to download them again using that script
is it better than mozilla's open speech project (DeepSpeech)?
can it be used offline?
>links to a wingay repository
be gone
THANK YOU ANON
This is so much better than installing all that Python crap on my PC, thank you very much
Just beware, this implementation uses FP32; if you have a GPU with FP16 support it'll be able to run OpenAI's version faster
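For what it's worth, the openai package exposes the precision as a flag on transcribe; it defaults to FP16 and falls back to FP32 with a warning on CPU:

import whisper

model = whisper.load_model("medium")
# fp16=True runs inference in half precision where the GPU supports it
result = model.transcribe("clip.wav", fp16=True)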
Bruh, amsneed cards are a nightmare, I'm never buying AMD again
Vega cards are compute monsters, I'm not giving leather jacket man any of my shekels
all cards are shit and have issues
i bought a 4090 for AI shit and for Stable Diffusion it's slower than an old 2070. it's a clusterfuck of drivers and dependency versions; there's like a 400-comment thread on GitHub and no one gives a fuck
AI researchers are just fucking garbage developers, that's the real problem here
Based. Can it work with AMD?
Yeah it's really good. Here's an example:
Source clip is in German
Translation I did with the medium model:
https://pastebin.com/VQasR28X
the translation is wrong, it's spelled Bakhmut, not Bachumt
So run a parser that checks for city names?
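That would actually work for proper nouns. A toy sketch using difflib, where the name list and the 0.7 cutoff are made-up placeholders, tune both:

import difflib
import re

# hypothetical list; in practice load this from a file of known names
KNOWN_NAMES = ["Bakhmut", "Kharkiv", "Kherson"]

def fix_names(text, cutoff=0.7):
    out = []
    for word in text.split():
        core = re.sub(r"\W+$", "", word)  # strip trailing punctuation before matching
        match = difflib.get_close_matches(core, KNOWN_NAMES, n=1, cutoff=cutoff)
        # only touch capitalized words so regular vocabulary is left alone
        if match and core[:1].isupper():
            word = word.replace(core, match[0])
        out.append(word)
    return " ".join(out)

print(fix_names("The fighting around Bachumt continues."))  # -> Bakhmut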
>https://pastebin.com/VQasR28X
holy shit, it even gives regular SRT timestamp output? This is game changing, I can finally fix the retard subtitles that pirate autists make
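The segments come with start/end times, so SRT is basically free. If your version doesn't write .srt directly, rolling it yourself from the segment list is a few lines; a sketch:

import whisper

def srt_time(seconds):
    # SRT timestamps are HH:MM:SS,mmm
    ms = int(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

model = whisper.load_model("medium")
result = model.transcribe("episode.wav", task="translate")
with open("episode.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(result["segments"], start=1):
        f.write(f"{i}\n")
        f.write(f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n")
        f.write(f"{seg['text'].strip()}\n\n")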
Really? I tried it on a 5700 XT and it spat out complete garbage, like actually random characters. It worked for me on CPU, for reference. Is it an incompatibility with my card, or did they fix the AMD issue?
Try this implementation, it should work on all cards:
https://github.com/Const-me/Whisper
thank-
>windows
it's over... I'm on Linux
AI retard devs are usually clueless about anything that isn't CUDA. You generally need to rely on 3rd party devs for AMD support.
It depends on how fast the card is, obviously. My Vega 56 can do 10 minutes of audio in about 1:30, so roughly 6-7x realtime.
>AI retard devs are usually clueless about anything that isn't CUDA
I was hoping that since it uses PyTorch, and PyTorch has official ROCm support, it would just work (like Stable Diffusion does)
Oh, this just made me remember: back when I couldn't get it working on my card, I found a C/C++ rewrite that's significantly faster on CPU than the openai Python version
https://github.com/ggerganov/whisper.cpp
For the weebs: Japanese accuracy looks pretty good
why is it that good at Spanish?
Has to do with quantity/quality of training data I guess.
Maybe Spanish pronunciation leaves less room for ambiguity and mistakes
Can confirm, I've been using whisper to masturbate furiously to Japanese ASMR tracks from DLSite. I've been using Google Translate to do the translation though, I found it gave better results than the built-in one.
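If anyone wants to copy that workflow: keep task at the default transcribe and force language="ja" so you get the Japanese text out, then dump it to a file for whatever translator you prefer. A sketch:

import whisper

model = whisper.load_model("medium")
# forcing the language skips auto-detection; task defaults to "transcribe"
result = model.transcribe("track.mp3", language="ja")
with open("track.ja.txt", "w", encoding="utf-8") as f:
    f.write(result["text"])
# then paste track.ja.txt into Google Translate (or any MT API) for the English side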
https://github.com/ggerganov/whisper.cpp
this shit runs on a fucking Raspberry Pi, who cares about the GPU
So the required VRAM doesn't matter at all? I don't understand.
that one is for CPU
which does this shit just fine for 99% of purposes
and what is the one purpose where GPU is superior?
it can do it faster
also leaves your CPU free to do other shit
so: larger models, or a lot more shit you want processed in a shorter time (movies etc)
Yeah, and it will only take a gorillion years to transcribe 10 minutes of audio
Never use CPU, it's slow as shit; GPU is orders of magnitude faster.
The VRAM matters if you want to use the largest model. The OpenAI readme puts large at ~10GB of VRAM (tiny/base/small/medium are roughly 1/1/2/5GB), so in practice you want a 12GB card.
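Also, if you're on the openai package and it's crawling, first check it actually landed on the GPU:

import torch
import whisper

# load_model falls back to CPU if CUDA isn't visible, so check explicitly
device = "cuda" if torch.cuda.is_available() else "cpu"
print("running on", device)
model = whisper.load_model("medium", device=device)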
takes my CPU like 8 mins to do 10 mins of audio using the large model lol
what cpu do you have?
how much time does an 8GB VRAM GPU take to do 10 mins of audio?
What AMD cards are recommended or should be avoided for using ROCm on Linux to run AI?
I think you just want the biggest FP16 throughput; any card going back to Vega or Polaris should work fine.
You have to recompile ROCm to use Polaris though
Can you BOTeniuses fine-tune this model to do phoneme transcription instead? I want to be able to practice my Japanese pronunciation, not have it shit out words it thinks I said.
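Not aware of a whisper fine-tune for that, but you may not need one: there are wav2vec2 CTC models that output phonemes directly. A sketch assuming the facebook/wav2vec2-lv-60-espeak-cv-ft checkpoint and the transformers package (the tokenizer needs phonemizer/espeak installed, check the model card):

import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL = "facebook/wav2vec2-lv-60-espeak-cv-ft"  # emits espeak-style phonemes
processor = Wav2Vec2Processor.from_pretrained(MODEL)
model = Wav2Vec2ForCTC.from_pretrained(MODEL)

# wav2vec2 expects 16kHz mono audio
speech, _ = librosa.load("practice.wav", sr=16000)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits
ids = torch.argmax(logits, dim=-1)
# phoneme string; compare it against a reference reading of the same sentence
print(processor.batch_decode(ids))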