Why do AI chip startups make shitty NPUs that don't support floating point calculations?
It's always
>hey just quantize your shit model and hope it works
Why is it so hard to find a NPU with large memory and float32 support and high power efficiency?
>hailo-8
>26 TOPs, 2.5W
>no floating point support
>no HBM, use ram from host you bitch
Is this really the best we've got in 2023? Why doesn't Google sell TPUv4 cards?
>AI hardware
cool more worthless bloat
What's the actual point of these shitty AI accelerators instead of just plugging in a GPU? Isn't even the shittiest gpu capable of more than one of these things?
If it's so easy do it yourself turdstain
Power consumption is possibly 1/10 of what a GPU would be. Very specific to application tho
>Power consumption is possibly 1/10 of what a GPU would be.
Neat, so can I generate AI anime tiddies on one of these chips? Can I plug it into my laptop and PC?
They have a PCI-E interface, so with drivers you should be able to use it. Whether OP's pic is suitable for SD is unknown
One use case is IoT. For instance, things that do local facial or object recognition like drones or doorbells or farm security cameras.
f32 is overkill for ML. INT8 is all you need, possibly even just INT4.
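To make that concrete: for inference, post-training int8 quantization of a weight tensor is basically just this (toy numpy sketch, not any vendor's actual toolchain; real flows add per-channel scales, zero-points and calibration data):

import numpy as np

# toy symmetric per-tensor int8 quantization of one layer's weights
def quantize_int8(w):
    scale = np.abs(w).max() / 127.0                      # map the largest weight to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)         # pretend these are real weights
q, s = quantize_int8(w)
print("max round-trip error:", np.abs(dequantize(q, s) - w).max())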
You can't successfully train a model with only int8
These aren't for training.
I know currently they are for inference only. But why don't people make ones that can be used for training?
You want to train a tf32 AI model on a 2.5W device. Do you realize how silly that sounds?
Why not if I can buy 10 of these and use them all and enjoy the power efficiency?
Why not a single pcie card then?
I don't understand your point but ASUS does sell pcie TPUs
I think that's what the 8 stands for. INT8
Have you considered just using an entry level consumer gpu with fp32 tensor cores like an Intel Arc A380? It does about 2 TOPs FP32 in theory.
>https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2023-2/intel-xe-gpu-architecture.html
>https://www.techpowerup.com/gpu-specs/arc-a380.c3913
Power efficiency would take a huge hit just from f32. Anyway, large models are bandwidth-bound, making these accelerators pointless without their own integrated RAM.
>Power efficiency would take a huge hit just from f32.
Surely they can do better than GPUs by optimizing for AI-only applications? Are you implying GPUs are already power efficient? What makes you think that?
>Anyway, large models are bandwidth-bound, making these accelerators pointless without their own integrated RAM.
It doesn't make other DL applications moot, like training object detectors for example
No, my point is you can't magically turn 26 TOPs of i8 to f32 at that wattage. Most optimistically, divide the TOPs by 4.
An accelerator chip can do better than standard GPU FMA instructions for tensor ops, but Nvidia's tensor cores are already optimized for that (not that they necessarily give the same TOPs/W). Again, the most interesting part is data transfers: I'm almost certain the Hailo's 2.5W doesn't include the data transfers over PCI-E needed to drive the thing for 26 TOPs of useful computation. But I can't find figures for PCI-E transfer power consumption. If I'm right, the comparison is misleading in the first place, because GPU power consumption figures would include VRAM transfers.
In general, GPUs have excellent memory bandwidth per $ and per W. Most useful computations end up bandwidth-bound. This is one reason specialized accelerators haven't exploded; you can't just specialize data transfer.
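Back of the envelope, if you want numbers (the host-link bandwidth is my guess, not a datasheet figure):

# all rough assumptions for illustration
peak_int8_tops = 26e12                     # Hailo-8 headline ops/s
fp32_upper_bound = peak_int8_tops / 4      # the "divide by 4" best case, ~6.5 TFLOPS
host_link_bytes = 4e9                      # assume ~4 GB/s to host RAM over PCI-E

# ops the chip must do per byte fetched from host RAM to stay compute-bound
print("ops needed per weight byte:", peak_int8_tops / host_link_bytes)   # 6500

# an int8 GEMV streaming weights from host RAM does ~2 ops per weight byte,
# so it runs at link speed, not compute speed
effective = 2 * host_link_bytes
print("bandwidth-bound GEMV: %.0f GOPS (%.2f%% of peak)"
      % (effective / 1e9, 100 * effective / peak_int8_tops))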
What do you mean?
Are TPU v4s non-existent? It's stated right in the OP.
What are you on about?
What do these tiny ass NPUs even do that an off the shelf CPU or GPU can't do better? Is this the next iteration of the bullshit web framework designed to keep redditors employed? I see lots of hype and no practical applications.
>performance of a rtx 2050
hmm, that's actually really good for 2.5W, idk wtf you would possibly use this for. making a shitty carnival booth with AI image effects? it's only ever going to run tiny models
if you could get a board with 20 of these in a single cluster then you'd be cooking with gas. I'm with you OP, idk why everyone is sleeping on LLM acceleration, because that's why Nvidia is making record profits. Kinda hard to get a slice of those profits when all anyone else makes is a glorified FX DSP
>rtx2050
what? that thing exists?
people buy laptops anon
Because they are AI edge devices designed to do inference quickly on small models which do not need floating point. They're for object detection, classification, etc...
Asus makes an expensive PCIe card full of Google Tensor units, if you want to throw away money on that, go for it.
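For what it's worth, the workload they target looks roughly like this (ONNX Runtime as a stand-in, since I haven't touched any vendor SDK; the model file and input layout are made up):

import numpy as np
import onnxruntime as ort

# edge-style loop: one small quantized model, one camera frame at a time
# "detector_int8.onnx" is a placeholder, not a real Hailo artifact
sess = ort.InferenceSession("detector_int8.onnx")
input_name = sess.get_inputs()[0].name

def run_frame(frame):
    # frame: HxWx3 uint8 off a camera; most small detectors want NCHW float in [0,1]
    x = frame.astype(np.float32).transpose(2, 0, 1)[None] / 255.0
    return sess.run(None, {input_name: x})[0]

# detections = run_frame(camera.read())   # boxes/scores for one doorbell frame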
>They're for object detection, classification, etc...
detecting what? classifying what?
literally nobody has an edge use case for AI that needs more than 1-2 TOPS, just look at how long Snapchat has been doing it, and with the shittiest of acceleration
>detecting what? classifying what?
soldiers
floating point is bloat. uint8 is all you need
What's the most power-efficient device I can use for messing around with deep learning (with training, of course)? No, I don't want to be tracked and use Colab
Sorry, in 2023 the answer is unfortunately none
int8 multiplication/addition takes up way less die space than fp32
>Why doesn't Google sell TPUv4 cards?
I would assume it's some national security thing: they don't want other governments to have easy access to hardware that is capable of running AI models efficiently. I know GPUs exist, but those are power hogs, not good for drones.