Has anyone gotten Whisper AI to work on Vega cards?

  1. 7 months ago
    Anonymous

    Didn't realize OAI had released an open source STT model

    Have they made a TTS one too?

    • 7 months ago
      Anonymous

      Don't think so, but the STT is really good. Plus it's not just a transcriber, it's a translator too; it BTFOs YouTube auto-transcribe & translation at any rate.

      Found this BTW, testing now to see if it works with Vega:

      https://github.com/Const-me/Whisper

      • 7 months ago
        Anonymous

        Confirmed, this implementation works with Vega. I can finally translate my German porn.

        Fast as fuck too using the largest model

      • 7 months ago
        Anonymous

        >On my desktop computer with GeForce 1080Ti GPU, medium model, 3:24 min speech took 45 seconds to transcribe with PyTorch and CUDA, but only 19 seconds with my implementation
        >9.63 gigabytes runtime dependencies, versus 431 kilobytes Whisper.dll
        Bro what the fuck, I've been using the openai release for months and only today I find out there's a different one that's much faster and doesn't need a million files of dependencies.
        Thanks a lot for this link, I wish I had seen it sooner, but it's much appreciated all the same.

        • 7 months ago
          Anonymous

          Wait so this dude's fork doesn't need 10 gigs of dependencies, whereas the official one does?

          • 7 months ago
            Anonymous

            Yeah, exactly.

            >is it better than mozilla's open speech project?
            >can it be used offline?

            It can be used offline, but you need to download one of the models first.
            https://huggingface.co/datasets/ggerganov/whisper.cpp
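            FWIW, the files on that page follow a ggml-<size>.bin naming scheme. A quick Python sketch to build the download URLs; the URL layout is my assumption from the hugging face page, so double-check it before scripting against it:

```python
# Sketch: build download URLs for the converted ggml Whisper models.
# Assumption: files are hosted as ggml-<size>.bin under the repo below
# (check the page yourself; hosting layouts change).
BASE = "https://huggingface.co/datasets/ggerganov/whisper.cpp/resolve/main"
SIZES = ["tiny", "base", "small", "medium", "large"]

def ggml_model_url(size: str) -> str:
    """Return the assumed download URL for one model size."""
    if size not in SIZES:
        raise ValueError(f"unknown model size: {size}")
    return f"{BASE}/ggml-{size}.bin"

for s in SIZES:
    print(ggml_model_url(s))
# Then fetch with e.g. urllib.request.urlretrieve(url, f"ggml-{s}.bin")
```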

            • 7 months ago
              Anonymous

              thanks I downloaded all of them to use their bandwidth

              • 7 months ago
                Anonymous

                >The original Whisper PyTorch models provided by OpenAI have been converted to custom ggml format in order to be able to load them in C/C++. The conversion has been performed using the convert-pt-to-ggml.py script. You can either obtain the original models and generate the ggml files yourself using the conversion script, or you can use the download-ggml-model.sh script to download the already converted models. Currently, they are hosted on the following locations:

                oh no, looks like these are actually in a different format; guess I'll have to download them again using that script

      • 7 months ago
        Anonymous

        is it better than mozilla's open speech project?

        can it be used offline?

      • 7 months ago
        Anonymous

        >links to a wingay repository
        be gone

      • 7 months ago
        Anonymous

        THANK YOU ANON
        This is so much better than installing all that python crap in my pc, thank you very much

      • 7 months ago
        Anonymous

        Just beware: this implementation uses FP32, so if you have a GPU with FP16 support, it may be able to run OpenAI's version faster.

  2. 7 months ago
    Anonymous

    Bruh, amsneed cards are a nightmare, I'm never buying AMD again

    • 7 months ago
      Anonymous

      Vega are compute monsters I'm not giving leather jacket man any of my shekels

    • 7 months ago
      Anonymous

      all cards are shit and have issues
      i bought a 4090 for AI shit and for stable diffusion it's slower than my old 2070s. It's a clusterfuck between drivers and dependency versions; there's like a 400-comment thread on GitHub and no one gives a fuck.

      ai researchers are just fucking garbage developers, that's the real problem here

  3. 7 months ago
    Anonymous

    Based. Can it work with AMD?

    • 7 months ago
      Anonymous

      Yeah, it's really good. Here's an example.

      Source clip in German

      Translation I did with the medium model:

      https://pastebin.com/VQasR28X

      • 7 months ago
        Anonymous

        the translation is wrong, it's spelled Ackhmut, not Bachumt

        • 7 months ago
          Anonymous

          So run a parser that checks for city names?

      • 7 months ago
        Anonymous

        >https://pastebin.com/VQasR28X
        holy shit, it even gives regular SRT timestamp output? this is game changing, I can finally fix the retard subtitles that pirate autists make
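        For anyone curious, SRT is just numbered cues with HH:MM:SS,mmm timestamps, so it's trivial to generate yourself. A minimal Python sketch that turns whisper-style (start, end, text) segments into SRT; the segment tuples here are made up for illustration, not taken from the pastebin:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """segments: iterable of (start_sec, end_sec, text) tuples."""
    cues = []
    for i, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)

# Hypothetical segments for illustration:
print(to_srt([(0.0, 2.5, "Hello."), (2.5, 5.0, "World.")]))
```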

      • 7 months ago
        Anonymous

        really? I tried it on a 5700 XT and it spat out complete garbage, like actually random characters. it worked for me on CPU, for reference. is it an incompatibility with my card or did they fix the AMD issue?

        • 7 months ago
          Anonymous

          Try this implementation, it should work on all cards:

          https://github.com/Const-me/Whisper

          • 7 months ago
            Anonymous

            thank-
            >windows
            it's over...I'm on linux

        • 7 months ago
          Anonymous

          AI retard devs are usually clueless about anything that isn't CUDA. You generally need to rely on 3rd-party devs for AMD support.

          >what cpu do you have? [...] how much time does a 8gb vram gpu take to do 10mins of audio?

          It depends on how fast the card is, obviously. My Vega 56 can do 10 minutes in about 1:30.
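          Back-of-the-envelope: the timings in this thread convert to realtime factors (audio length divided by processing time). A tiny sketch using the two figures mentioned here, just arithmetic, no whisper involved:

```python
def realtime_factor(audio_min: float, processing_min: float) -> float:
    """How many minutes of audio get processed per minute of wall time."""
    return audio_min / processing_min

# Vega 56 figure from this post: 10 min of audio in ~1.5 min
print(round(realtime_factor(10, 1.5), 1))   # ~6.7x realtime
# CPU figure from elsewhere in the thread: 10 min in ~8 min
print(round(realtime_factor(10, 8), 2))     # ~1.25x realtime
```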

          • 7 months ago
            Anonymous

            >AI retard devs are usually clueless about anything that isn't CUDA
            I was hoping that since it uses PyTorch, and PyTorch has official ROCm support, it would work correctly (just like stable diffusion).
            oh, this just reminded me: since I couldn't get it working on my card before, I had found a C/C++ rewrite that's significantly faster on CPU than the openai python version
            https://github.com/ggerganov/whisper.cpp

  4. 7 months ago
    Anonymous

    For weebs: the Japanese accuracy looks pretty good

    • 7 months ago
      Anonymous

      why is it that good at Spanish?

      • 7 months ago
        Anonymous

        Has to do with quantity/quality of training data I guess.

      • 7 months ago
        Anonymous

        Maybe Spanish pronunciation is less prone to ambiguity and mistakes

    • 7 months ago
      Anonymous

      Can confirm, I've been using whisper to masturbate furiously to Japanese ASMR tracks from DLSite. I've been using Google Translate to do the translation though; I found it gave better results than the built-in one.

  5. 7 months ago
    Anonymous

    https://github.com/ggerganov/whisper.cpp
    this shit runs on a fucking raspberry pi, who cares about the GPU

    • 7 months ago
      Anonymous

      So the required VRAM doesn't matter at all? I don't understand.

      • 7 months ago
        Anonymous

        that one is for CPU
        which does this shit just fine for 99% of purposes

        • 7 months ago
          Anonymous

          and what is the one purpose where the GPU is superior?

          • 7 months ago
            Anonymous

            it can do it faster
            also leaves your cpu free to do other shit
            so larger models, a lot more shit you want to get processed in shorter time (movies etc)

    • 7 months ago
      Anonymous

      Yeah, and it will only take a gorillion years to transcribe 10 minutes of audio.

      >So the required VRAM doesn't matter at all? I don't understand.

      >that one is for CPU
      >which does this shit just fine for 99% of purposes

      Never use CPU, it's slow as shit; GPU is orders of magnitude faster.

      The VRAM matters if you want to use the largest model. You need like 12GB for the largest one.
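      For ballpark sizing: the openai/whisper README lists approximate VRAM per model (tiny/base ~1 GB, small ~2 GB, medium ~5 GB, large ~10 GB; the 12GB above is a safer margin, and real usage varies by implementation). A quick sketch to check what fits on a given card, using those README figures:

```python
# Approximate VRAM requirements (GB) from the openai/whisper README;
# rough figures only, actual usage varies by implementation.
VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

def models_that_fit(card_vram_gb: float) -> list[str]:
    """Models whose approximate requirement fits in the given VRAM."""
    return [m for m, need in VRAM_GB.items() if need <= card_vram_gb]

print(models_that_fit(8))   # an 8 GB card: everything up to medium
```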

      • 7 months ago
        Anonymous

        takes my cpu like 8 mins to do 10 mins of audio using the large model lol

        • 7 months ago
          Anonymous

          what cpu do you have?

          >Yeah and it will only take a gorillion years to transcribe a 10 minute audio
          >[...]
          >[...]
          >Never use CPU it's slow as shit GPU is orders of magnitude faster.
          >The VRAM matters if you want to use the largest model. You need like 12GB for the largest one.

          how much time does an 8GB VRAM GPU take to do 10 mins of audio?

  6. 7 months ago
    Anonymous

    What AMD cards are recommended or should be avoided for using ROCm on Linux to run AI?

    • 7 months ago
      Anonymous

      I think you just want the biggest FP16 throughput; any card going back to Vega or Polaris should work fine.

      • 7 months ago
        Anonymous

        You have to recompile ROCm to use Polaris though

  7. 7 months ago
    Anonymous

    Can you BOTeniuses fine-tune this model to do phoneme transcription instead? I want to be able to practice my Japanese pronunciation, not have it shit out the words it thinks I said.
