Pretty cool.
Its understanding of grammar is impressive, I haven’t seen any other transcriber on that level.
Funny how its most noticeable mistake is a pretty easy one (ain’t no such thing as halfway crooks), but it nails the last part where Eminem is speaking quickly without music. Probably because of the crowd’s noise.
Where does it store the downloaded models? I'm on Linux.
oh it's ~/.cache/whisper
Is there some place I can download the models and put on a flash drive? My internet is not good at home.
You can create a docker image, save it (docker save), and load it (docker load) whenever you want.
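If Docker feels like overkill, copying the cached checkpoints onto the flash drive should also work. A minimal sketch, assuming the models are the `*.pt` files under `~/.cache/whisper` mentioned above (or under `$XDG_CACHE_HOME/whisper` if that variable is set, which is an assumption about your setup). The function names here are made up for illustration.

```python
import os
import shutil
from pathlib import Path


def whisper_cache_dir() -> Path:
    """Guess the Whisper cache directory: $XDG_CACHE_HOME/whisper, else ~/.cache/whisper."""
    default = os.path.join(os.path.expanduser("~"), ".cache")
    return Path(os.getenv("XDG_CACHE_HOME", default)) / "whisper"


def copy_models(src: Path, dest: Path) -> list:
    """Copy every *.pt checkpoint from src to dest and return the copied file names."""
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for checkpoint in sorted(src.glob("*.pt")):
        shutil.copy2(checkpoint, dest / checkpoint.name)
        copied.append(checkpoint.name)
    return copied
```

Run `copy_models(whisper_cache_dir(), Path("/media/usb/whisper"))` on a machine with good internet (the mount point is a placeholder), then copy the files back into `~/.cache/whisper` at home. `whisper.load_model` also takes a `download_root` argument if you would rather load straight from the drive.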
it's not "pops" though. it's pac.
>pops
pac.
>but all six of you jumped
by all six of you chumps
>we did fuck my girl
wink did fuck my girl
>this hot plate groups
as halfway crooks.
>fuck a pop, a doc
fuck papa doc
Model where
parents doesn't rhyme with marriage
Is this another of these shits I can't run if I don't have dedicated gaymer GPU?
You can run it on the CPU, but it's slower.
THANK GOD, god bless you anon
How difficult would it be to write a program to use this for speech to text? I'm a victim of US schools and never learned how to spell. To this day I sometimes need to use Google voice typing for some words and have been looking a first programming project.
>and have been looking a first programming project
>looking a first
Less effort to learn to spell (and some grammar apparently).
Erm, can I translate, let me try:
You finna speak first fool, then spell `my bro`.
you're a loser for having the time to respond like this
>Less effort to learn to spell
I have been told my entire life that I can't. I have been looking for a way to improve my spelling for a long time, and the best I can do is flyspell in Emacs. I still misspell some words so badly that I end up using my (non-free) phone to spell them for me.
In the past, I was dependent on Dragon NaturallySpeaking, but now I only need my phone or a search engine about every other sentence. Most normal gays think this is an acceptable way to live.
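On the original question of how hard a speech-to-text program would be: probably not very, at least for a file-based first version. A minimal sketch, assuming the `openai-whisper` package and ffmpeg are installed. `transcribe_file` and `main` are names I made up, and recording from a microphone is left out entirely.

```python
def transcribe_file(model, audio_path: str) -> str:
    """Return the transcript text for one audio file.

    `model` is anything with a .transcribe(path) method that returns a dict
    with a "text" key, which is the shape Whisper models return.
    """
    result = model.transcribe(audio_path)
    return result["text"].strip()


def main(audio_path: str, model_name: str = "base") -> None:
    # Imported here so the helper above can be tried without whisper installed.
    import whisper  # pip install openai-whisper

    # device="cpu" works without a GPU, it is just slower.
    model = whisper.load_model(model_name, device="cpu")
    print(transcribe_file(model, audio_path))
```

Calling `main("recording.wav")` (a placeholder file name) prints the transcript, which you can then paste wherever you needed the word spelled.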
Very excited for this. I have some archived videos and always forget where I heard something. With this I can probably create a transcript for each video next to it, and then index over all of them to create something searchable.
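The indexing part could be as simple as an inverted index over the transcript files. A rough sketch, assuming each video already has a plain-text transcript saved next to it as a `.txt` file (for example `talk.mp4` and `talk.txt`, which is my assumption about the layout). The word matching is ASCII-only, so it is really only meant for English transcripts.

```python
import re
from collections import defaultdict
from pathlib import Path

WORD = re.compile(r"[a-z0-9']+")


def build_index(transcript_dir: Path) -> dict:
    """Map each lowercase word to the set of transcript file names containing it."""
    index = defaultdict(set)
    for path in sorted(transcript_dir.glob("*.txt")):
        text = path.read_text(encoding="utf-8").lower()
        for word in WORD.findall(text):
            index[word].add(path.name)
    return index


def search(index: dict, query: str) -> list:
    """Return the files that contain every word of the query, sorted by name."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

This only tells you which file a phrase came from, not where in the video. Keeping the segment timestamps from Whisper's output would be the natural next step.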
It's pretty good from what I've tested
bretty neat, could be really cool to have autogenerated subtitles on every video/lyrics on all music/etc. because of a simple addition/plugin to various websites and programs.
There is GPU (CUDA) support, which is way faster. Too bad my 3070 can only handle models up to medium.
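That matches the rough VRAM figures in the Whisper README as far as I remember them: about 1 GB for tiny and base, 2 GB for small, 5 GB for medium and 10 GB for large. A small sketch for picking a model size from available VRAM. The numbers are approximations and the function name is hypothetical, so check the README for your version before trusting it.

```python
# Approximate VRAM needs in GB, recalled from the Whisper README.
# These are assumptions, not measured values.
APPROX_VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def largest_model_that_fits(vram_gb: float) -> str:
    """Return the biggest model whose approximate requirement fits, else 'tiny'."""
    choice = "tiny"
    for name, needed in APPROX_VRAM_GB:
        if needed <= vram_gb:
            choice = name
    return choice
```

With the 8 GB on a 3070 this picks medium, since large wants roughly 10 GB.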
It does Japanese OK too.
OP should test Whisper on those fast speaking commercials you see on TV or hear on radio. What about livestock auctioneers?
I tried it with this:
.. does it work on porn?
anything that has people speaking, yeah
i have several terabytes of porn to run this on now
4090 it is i think
Doesn't work on RISC-V.
How can we know it wasn't trained on that specific song?
Could it be used to autogenerate literal subtitles for anime?
>generate japanese subtitles
>substitute each japanese word with the closest english equivalent or an AI generated glyph
yes, "--task translate" gives you a subtitles file
Yes it already can do that. It doesn't work like that though.
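For anyone who wants to build the subtitle file from Python instead of the CLI, here is a rough sketch of turning segments into SRT text. It assumes each segment is a dict with `start` and `end` in seconds plus `text`, which is the shape of `result["segments"]` from `model.transcribe(path, task="translate")`. The helper names are mine, not Whisper's.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Turn segments with "start", "end" and "text" keys into SRT text."""
    blocks = []
    for number, seg in enumerate(segments, start=1):
        span = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{number}\n{span}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)
```

Note that the translate task gives you a normal English translation of each line, not the word-for-word substitution asked about above.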