Has anyone gotten Whisper AI to work on Vega cards?

  1. 1 year ago
    Anonymous

    Didn't realize OAI had released an open source STT model

    Have they made a TTS one too?

    • 1 year ago
      Anonymous

      Don't think so, but the STT is really good, plus it's not just a transcriber but a translator too. It BTFOs YouTube's auto transcribe & translation at any rate (quick sketch below).

      Found this BTW testing now to see if it works with Vega

      https://github.com/Const-me/Whisper
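
      If you want to poke at the translate mode from Python, here's a minimal sketch using the official openai-whisper package (model size and file name are placeholders):

          import whisper

          model = whisper.load_model("medium")

          # task="transcribe" keeps the source language;
          # task="translate" outputs English instead
          result = model.transcribe("clip.mp3", task="translate")
          print(result["text"])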

      • 1 year ago
        Anonymous

        Confirmed, this implementation works with Vega. I can finally translate my German porn.

        Fast as frick too, using the largest model

      • 1 year ago
        Anonymous

        >On my desktop computer with GeForce 1080Ti GPU, medium model, 3:24 min speech took 45 seconds to transcribe with PyTorch and CUDA, but only 19 seconds with my implementation
        >9.63 gigabytes runtime dependencies, versus 431 kilobytes Whisper.dll
        Bro what the frick, I've been using the OpenAI release for months and only today find out there was a different, much faster one that also doesn't need a million files of dependencies.
        Thanks a lot for this link, I wish I had seen it sooner, but it's much appreciated all the same.

        • 1 year ago
          Anonymous

          Wait so this dude's fork doesn't need 10 gigs of dependencies, whereas the official one does?

          • 1 year ago
            Anonymous

            Yeah, exactly.

            >is it better than mozilla's open speech project?
            >can it be used offline?

            It can be used offline, but you need to download one of the models first.
            https://huggingface.co/datasets/ggerganov/whisper.cpp
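
            If you'd rather script the download than click through huggingface, something like this should work (the file name follows the repo's ggml-<size>.bin naming, adjust to the model you want):

                import urllib.request

                # ggml-base.en.bin is one of the hosted files; swap in the size you want
                url = ("https://huggingface.co/datasets/ggerganov/whisper.cpp"
                       "/resolve/main/ggml-base.en.bin")
                urllib.request.urlretrieve(url, "ggml-base.en.bin")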

            • 1 year ago
              Anonymous

              thanks I downloaded all of them to use their bandwidth

              • 1 year ago
                Anonymous

                >The original Whisper PyTorch models provided by OpenAI have been converted to custom ggml format in order to be able to load them in C/C++. The conversion has been performed using the convert-pt-to-ggml.py script. You can either obtain the original models and generate the ggml files yourself using the conversion script, or you can use the download-ggml-model.sh script to download the already converted models. Currently, they are hosted on the following locations:

                oh no, looks like these are actually in a different format, guess I'll have to download them again using that script
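
                If you'd rather convert the originals yourself than re-download, the invocation is roughly this (paths are illustrative, and the argument order is my reading of the whisper.cpp README):

                    import os
                    import subprocess

                    ckpt = os.path.expanduser("~/.cache/whisper/medium.pt")  # original OpenAI checkpoint
                    repo = os.path.expanduser("~/src/whisper")               # clone of openai/whisper
                    subprocess.run(
                        ["python", "models/convert-pt-to-ggml.py", ckpt, repo, "./models"],
                        check=True,
                    )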

      • 1 year ago
        Anonymous

        is it better than mozilla's open speech project?

        can it be used offline?

      • 1 year ago
        Anonymous

        >links to a windows-only repository
        be gone

      • 1 year ago
        Anonymous

        THANK YOU ANON
        This is so much better than installing all that python crap in my pc, thank you very much

      • 1 year ago
        Anonymous

        Just beware: this implementation uses FP32, so if you have a GPU with FP16 support, it'll be able to run OpenAI's version faster
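
        For what it's worth, the OpenAI package exposes this as a plain flag; a minimal sketch (fp16=True is already the default when running on GPU):

            import whisper

            model = whisper.load_model("medium")  # loads onto the GPU if one is available

            # fp16=True decodes in half precision (GPU default);
            # on CPU whisper warns and falls back to FP32 anyway
            result = model.transcribe("audio.mp3", fp16=True)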

  2. 1 year ago
    Anonymous

    Bruh, AMD cards are a nightmare, I'm never buying AMD again

    • 1 year ago
      Anonymous

      Vega cards are compute monsters. I'm not giving leather jacket man any of my shekels

    • 1 year ago
      Anonymous

      all cards are shit and have issues
      i bought a 4090 for ai shit and for stable diffusion it's slower than old 2070s. clusterfrick between drivers and dependency versions, there's like a 400-comment thread on github; no one gives a frick

      ai researchers are just fricking garbage developers, that's the real problem here

  3. 1 year ago
    Anonymous

    Based. Can it work with AMD?

    • 1 year ago
      Anonymous

      Yeah it's really good. Here's an example

      Source clip in german

      Translation I did with the medium model

      https://pastebin.com/VQasR28X

      • 1 year ago
        Anonymous

        the translation is wrong, it's spelled Ackhmut, not Bachumt

        • 1 year ago
          Anonymous

          So run a parser that checks for city names?
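
          Something like this dumb find-and-replace pass would do it; the correction table is hypothetical, fill it with whatever names your clips mangle:

              import re

              # hypothetical fix-up table (the thread can't even agree on the spelling)
              CITY_FIXES = {"Bachumt": "Bakhmut"}

              def fix_city_names(text: str) -> str:
                  for wrong, right in CITY_FIXES.items():
                      text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
                  return text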

      • 1 year ago
        Anonymous

        >https://pastebin.com/VQasR28X
        holy shit, it even gives regular SRT timestamp output? this is game changing, finally I can fix the moron subtitles that pirate autists make
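
        If you're on the Python version, the segments in the result already carry start/end times, so rolling your own SRT is a few lines (a sketch, file names made up):

            import whisper

            def srt_time(sec: float) -> str:
                # SRT wants HH:MM:SS,mmm
                ms = int(round(sec * 1000))
                h, ms = divmod(ms, 3_600_000)
                m, ms = divmod(ms, 60_000)
                s, ms = divmod(ms, 1000)
                return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

            result = whisper.load_model("medium").transcribe("clip.mp3", task="translate")
            with open("clip.srt", "w", encoding="utf-8") as f:
                for i, seg in enumerate(result["segments"], 1):
                    f.write(f"{i}\n{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n"
                            f"{seg['text'].strip()}\n\n")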

      • 1 year ago
        Anonymous

        really? I tried it on a 5700 XT and it spat out complete garbage, like actually random characters. it worked for me on CPU, for reference. is it an incompatibility with my card, or did they fix the AMD issue?

        • 1 year ago
          Anonymous

          Try this implementation, it should work on all cards:

          https://github.com/Const-me/Whisper

          • 1 year ago
            Anonymous

            thank-
            >windows
            it's over...I'm on linux

        • 1 year ago
          Anonymous

          AI moron devs are usually clueless about anything that isn't CUDA. You generally need to rely on 3rd party devs for AMD support.

          >what cpu do you have? [...] how much time does an 8gb vram gpu take to do 10mins of audio?

          It depends on how fast the card is, obviously. My Vega 56 can do 10 minutes of audio in about 1:30.

          • 1 year ago
            Anonymous

            >AI moron devs are usually clueless about anything that isn't CUDA
            I was hoping that since it uses PyTorch, and PyTorch has official ROCm support, it would just work correctly (like stable diffusion does)
            oh, this just reminded me: since I couldn't get it working on my card before, I had found a C/C++ rewrite that's significantly faster on CPU than the OpenAI Python version
            https://github.com/ggerganov/whisper.cpp
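
            For anyone else trying the ROCm route: on a ROCm build of PyTorch the card shows up through the regular cuda device API, so in theory it's just this (assuming the ROCm install itself isn't broken):

                import torch
                import whisper

                # ROCm builds report AMD GPUs through torch.cuda
                device = "cuda" if torch.cuda.is_available() else "cpu"
                model = whisper.load_model("medium", device=device)
                print(model.transcribe("clip.wav")["text"])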

  4. 1 year ago
    Anonymous

    For the weebs: Japanese accuracy looks pretty good

    • 1 year ago
      Anonymous

      why is it that good at spanish?

      • 1 year ago
        Anonymous

        Has to do with quantity/quality of training data I guess.

      • 1 year ago
        Anonymous

        Maybe Spanish pronunciation is less prone to ambiguity and mistakes

    • 1 year ago
      Anonymous

      Can confirm, I've been using whisper to jerk off furiously to japanese ASMR tracks from DLSite. I've been using google translate to do the translation though; I found it gave better results than the built-in one.
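
      That workflow is basically: transcribe Japanese-only, dump the raw text, and run it through an external translator by hand. A sketch with made-up file names:

          import whisper

          model = whisper.load_model("medium")
          # keep the Japanese text instead of using the built-in task="translate"
          result = model.transcribe("track.mp3", language="ja", task="transcribe")

          with open("track.ja.txt", "w", encoding="utf-8") as f:
              f.write(result["text"])
          # then paste track.ja.txt into Google Translate (or whatever MT you prefer)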

  5. 1 year ago
    Anonymous

    https://github.com/ggerganov/whisper.cpp
    this shit runs on a fricking raspberry pi, who cares about the GPU

    • 1 year ago
      Anonymous

      So the required VRAM doesn't matter at all? I don't understand.

      • 1 year ago
        Anonymous

        that one is for CPU
        which does this shit just fine for 99% of purposes

        • 1 year ago
          Anonymous

          and what is the one purpose where the gpu is superior?

          • 1 year ago
            Anonymous

            it can do it faster
            also leaves your cpu free to do other shit
            so larger models, and a lot more shit you want processed in a shorter time (movies etc); see the sketch below
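
            e.g. batching a whole folder of rips on the GPU while the CPU stays free; a rough sketch with made-up paths:

                from pathlib import Path
                import whisper

                model = whisper.load_model("large", device="cuda")
                for audio in Path("movies").glob("*.mp3"):
                    text = model.transcribe(str(audio))["text"]
                    audio.with_suffix(".txt").write_text(text, encoding="utf-8")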

    • 1 year ago
      Anonymous

      Yeah and it will only take a gorillion years to transcribe a 10 minute audio

      >So the required VRAM doesn't matter at all? I don't understand.

      >that one is for CPU
      >which does this shit just fine for 99% of purposes

      Never use CPU, it's slow as shit; GPU is orders of magnitude faster.

      The VRAM matters if you want to use the largest model; you need like 12GB for it.
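
      If you want to be lazy about it you can pick the model off the reported VRAM; the cutoffs below are guesses around the thread's ~12GB figure, not official numbers:

          import torch
          import whisper

          def pick_model() -> str:
              if not torch.cuda.is_available():
                  return "medium"
              vram_gb = torch.cuda.get_device_properties(0).total_memory / 2**30
              if vram_gb >= 12:
                  return "large"   # the thread's figure for the largest model
              return "medium" if vram_gb >= 8 else "small"

          model = whisper.load_model(pick_model())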

      • 1 year ago
        Anonymous

        takes my cpu like 8 mins to do 10 mins of audio using the large model lol

        • 1 year ago
          Anonymous

          what cpu do you have?

          >Yeah and it will only take a gorillion years to transcribe a 10 minute audio
          >[...]
          >[...]
          >Never use CPU, it's slow as shit; GPU is orders of magnitude faster.
          >The VRAM matters if you want to use the largest model; you need like 12GB for it.

          how much time does an 8gb vram gpu take to do 10mins of audio?

  6. 1 year ago
    Anonymous

    What AMD cards are recommended or should be avoided for using ROCm on Linux to run AI?

    • 1 year ago
      Anonymous

      I think you just want the biggest FP16 throughput you can get; any card going back to Vega or Polaris should work fine.

      • 1 year ago
        Anonymous

        You have to recompile ROCm to use Polaris though

  7. 1 year ago
    Anonymous

    Can you geniuses fine-tune this model to do phoneme transcription instead? I want to be able to practice my Japanese pronunciation, not have it shit out words it thinks I said.
