Didn't realize OAI had released an open source STT model
Have they made a TTS one too?
Don't think so, but the STT is really good, plus it's not just a transcriber but a translator too. It BTFOs YouTube's auto transcribe & translation at any rate.
Found this BTW, testing it now to see if it works with Vega:
https://github.com/Const-me/Whisper
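For reference, on the openai python release translation is just a flag on transcribe, no separate tool needed. A minimal sketch (the file name is a placeholder):

import whisper

# "medium" is a decent speed/quality tradeoff; "large" is better but needs more VRAM
model = whisper.load_model("medium")
# task="translate" outputs English instead of transcribing the source language
result = model.transcribe("clip.mp3", task="translate")
print(result["text"])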
Confirmed, this implementation works with Vega. I can finally translate my German porn
Fast as fuck too, even using the largest model
>On my desktop computer with GeForce 1080Ti GPU, medium model, 3:24 min speech took 45 seconds to transcribe with PyTorch and CUDA, but only 19 seconds with my implementation
>9.63 gigabytes runtime dependencies, versus 431 kilobytes Whisper.dll
Bro what the fuck, I've been using the openai release for months and only today do I find out there's a much faster implementation that also doesn't need a million files of dependencies.
Thanks a lot for this link, I wish I had seen it sooner but it's much appreciated all the same.
Wait so this dude's fork doesn't need 10 gigs of dependencies, whereas the official one does?
Yeah, exactly.
It can be used offline, but you need to download one of the models first.
https://huggingface.co/datasets/ggerganov/whisper.cpp
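If you'd rather script the download than click around HF, the files in that repo follow a ggml-<model>.bin naming scheme (an assumption from the current file list, check it if this 404s). A minimal sketch:

import urllib.request

# the medium model is ~1.5GB; swap "medium" for tiny/base/small/large
url = "https://huggingface.co/datasets/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin"
urllib.request.urlretrieve(url, "ggml-medium.bin")

The repo's download-ggml-model.sh script does the same thing if you prefer shell.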
thanks I downloaded all of them to use their bandwidth
>The original Whisper PyTorch models provided by OpenAI have been converted to custom ggml format in order to be able to load them in C/C++. The conversion has been performed using the convert-pt-to-ggml.py script. You can either obtain the original models and generate the ggml files yourself using the conversion script, or you can use the download-ggml-model.sh script to download the already converted models. Currently, they are hosted on the following locations:
oh no looks like these are actually in a different format, guess I'll have to download them again using that script
is it better than mozilla's open speech project (DeepSpeech)?
can it be used offline?
>links to a wingay repository
be gone
THANK YOU ANON
This is so much better than installing all that Python crap on my PC, thank you very much
Just beware, this implementation uses FP32; if you have a GPU with FP16 support it'll be able to run OpenAI's version faster
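For what it's worth, the openai package exposes the precision as a flag on transcribe; it defaults to FP16 and falls back to FP32 with a warning on CPU:

import whisper

model = whisper.load_model("medium")
# fp16=True runs inference in half precision where the GPU supports it
result = model.transcribe("clip.wav", fp16=True)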
Bruh, amsneed cards are a nightmare, I'm never buying AMD again
Vega cards are compute monsters, I'm not giving leather jacket man any of my shekels
all cards are shit and have issues
i bought a 4090 for AI shit and for Stable Diffusion it's slower than an old 2070. it's a clusterfuck of drivers and dependency versions; there's like a 400-comment thread on GitHub and no one gives a fuck
AI researchers are just fucking garbage developers, that's the real problem here
Based. Can it work with AMD?
Yeah it's really good. Here's an example:
Source clip is in German
Translation I did with the medium model:
https://pastebin.com/VQasR28X
the translation is wrong, it's spelled Bakhmut, not Bachumt
So run a parser that checks for city names?
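That would actually work for proper nouns. A toy sketch using difflib, where the name list and the 0.7 cutoff are made-up placeholders, tune both:

import difflib
import re

# hypothetical list; in practice load this from a file of known names
KNOWN_NAMES = ["Bakhmut", "Kharkiv", "Kherson"]

def fix_names(text, cutoff=0.7):
    out = []
    for word in text.split():
        core = re.sub(r"\W+$", "", word)  # strip trailing punctuation before matching
        match = difflib.get_close_matches(core, KNOWN_NAMES, n=1, cutoff=cutoff)
        # only touch capitalized words so regular vocabulary is left alone
        if match and core[:1].isupper():
            word = word.replace(core, match[0])
        out.append(word)
    return " ".join(out)

print(fix_names("The fighting around Bachumt continues."))  # -> Bakhmut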
>https://pastebin.com/VQasR28X
holy shit, it even gives regular SRT timestamp output? This is game changing, I can finally fix the retard subtitles that pirate autists make
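The segments come with start/end times, so SRT is basically free. If your version doesn't write .srt directly, rolling it yourself from the segment list is a few lines; a sketch:

import whisper

def srt_time(seconds):
    # SRT timestamps are HH:MM:SS,mmm
    ms = int(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

model = whisper.load_model("medium")
result = model.transcribe("episode.wav", task="translate")
with open("episode.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(result["segments"], start=1):
        f.write(f"{i}\n")
        f.write(f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n")
        f.write(f"{seg['text'].strip()}\n\n")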
Really? I tried it on a 5700 XT and it spat out complete garbage, like actually random characters. It worked for me on CPU, for reference. Is it an incompatibility with my card, or did they fix the AMD issue?
Try this implementation, it should work on all cards:
https://github.com/Const-me/Whisper
thank-
>windows
it's over... I'm on Linux
AI retard devs are usually clueless about anything that isn't CUDA. You generally need to rely on 3rd party devs for AMD support.
It depends on how fast the card is, obviously. My Vega 56 can do 10 minutes of audio in about 1:30, so roughly 6-7x realtime.
>AI retard devs are usually clueless about anything that isn't CUDA
I was hoping that since it uses PyTorch, and PyTorch has official ROCm support, it would just work (like Stable Diffusion does)
Oh, this just made me remember: back when I couldn't get it working on my card, I found a C/C++ rewrite that's significantly faster on CPU than the openai Python version
https://github.com/ggerganov/whisper.cpp
For the weebs: Japanese accuracy looks pretty good
why is it that good at Spanish?
Has to do with quantity/quality of training data I guess.
Maybe Spanish pronunciation leaves less room for ambiguity and mistakes
Can confirm, I've been using whisper to masturbate furiously to Japanese ASMR tracks from DLSite. I've been using Google Translate to do the translation though, I found it gave better results than the built-in one.
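If anyone wants to copy that workflow: keep task at the default transcribe and force language="ja" so you get the Japanese text out, then dump it to a file for whatever translator you prefer. A sketch:

import whisper

model = whisper.load_model("medium")
# forcing the language skips auto-detection; task defaults to "transcribe"
result = model.transcribe("track.mp3", language="ja")
with open("track.ja.txt", "w", encoding="utf-8") as f:
    f.write(result["text"])
# then paste track.ja.txt into Google Translate (or any MT API) for the English side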
https://github.com/ggerganov/whisper.cpp
this shit runs on a fucking Raspberry Pi, who cares about the GPU
So the required VRAM doesn't matter at all? I don't understand.
that one is for CPU
which does this shit just fine for 99% of purposes
and what is the one purpose where GPU is superior?
it can do it faster
also leaves your CPU free to do other shit
so: larger models, or a lot more shit you want processed in a shorter time (movies etc)
Yeah, and it will only take a gorillion years to transcribe 10 minutes of audio
Never use CPU, it's slow as shit; GPU is orders of magnitude faster.
The VRAM matters if you want to use the largest model. The OpenAI readme puts large at ~10GB of VRAM (tiny/base/small/medium are roughly 1/1/2/5GB), so in practice you want a 12GB card.
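Also, if you're on the openai package and it's crawling, first check it actually landed on the GPU:

import torch
import whisper

# load_model falls back to CPU if CUDA isn't visible, so check explicitly
device = "cuda" if torch.cuda.is_available() else "cpu"
print("running on", device)
model = whisper.load_model("medium", device=device)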
takes my CPU like 8 mins to do 10 mins of audio using the large model lol
what cpu do you have?
how much time does an 8GB VRAM GPU take to do 10 mins of audio?
What AMD cards are recommended or should be avoided for using ROCm on Linux to run AI?
I think you just want the biggest FP16 throughput; any card going back to Vega or Polaris should work fine.
You have to recompile ROCm to use Polaris though
Can you BOTeniuses fine-tune this model to do phoneme transcription instead? I want to be able to practice my Japanese pronunciation, not have it shit out words it thinks I said.
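Not aware of a whisper fine-tune for that, but you may not need one: there are wav2vec2 CTC models that output phonemes directly. A sketch assuming the facebook/wav2vec2-lv-60-espeak-cv-ft checkpoint and the transformers package (the tokenizer needs phonemizer/espeak installed, check the model card):

import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL = "facebook/wav2vec2-lv-60-espeak-cv-ft"  # emits espeak-style phonemes
processor = Wav2Vec2Processor.from_pretrained(MODEL)
model = Wav2Vec2ForCTC.from_pretrained(MODEL)

# wav2vec2 expects 16kHz mono audio
speech, _ = librosa.load("practice.wav", sr=16000)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits
ids = torch.argmax(logits, dim=-1)
# phoneme string; compare it against a reference reading of the same sentence
print(processor.batch_decode(ids))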