Why do AI chip startups make shitty NPUs that don't support floating point calculations?
It's always
>hey just quantize your shit model and hope it works
Why is it so hard to find a NPU with large memory and float32 support and high power efficiency?
>hailo-8
>26 TOPs, 2.5W
>no floating point support
>no HBM, use ram from host you bitch
Is this really the best we've got in 2023? Why doesn't Google sell TPUv4 cards?
>AI hardware
cool more worthless bloat
What's the actual point of these shitty AI accelerators instead of just plugging in a GPU? Isn't even the shittiest gpu capable of more than one of these things?
If it's so easy do it yourself turdstain
Power consumption is possibly 1/10 of what a GPU would be. Very specific to application tho
>Power consumption is possibly 1/10 of what a GPU would be.
Neat, so can I generate AI anime tiddies on one of these chips? Can I plug it into my laptop and PC?
They have a PCI-E interface, so with drivers you should be able to use it. Whether OP's pic is suitable for SD is unknown
One use case is IoT. For instance, things that do local facial or object recognition like drones or doorbells or farm security cameras.
f32 is overkill for ML. INT8 is all you need, possibly even just INT4.
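To make that concrete: for inference, post-training int8 quantization of a weight tensor is basically just this (toy numpy sketch, not any vendor's actual toolchain; real flows add per-channel scales, zero-points and calibration data):

import numpy as np

# toy symmetric per-tensor int8 quantization of one layer's weights
def quantize_int8(w):
    scale = np.abs(w).max() / 127.0                      # map the largest weight to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)         # pretend these are real weights
q, s = quantize_int8(w)
print("max round-trip error:", np.abs(dequantize(q, s) - w).max())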
You can't successfully train a model with only int8
These aren't for training.
I know currently they are for inference only. But why don't people make ones that can be used for training?
You want to train a tf32 AI model on a 2.5W device. Do you realize how silly that sounds?
Why not if I can buy 10 of these and use them all and enjoy the power efficiency?
Why not a single pcie card then?
I don't understand your point but ASUS does sell pcie TPUs
I think that's what the 8 stands for. INT8
Have you considered just using an entry level consumer gpu with fp32 tensor cores like an Intel Arc A380? It does about 2 TOPs FP32 in theory.
>https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2023-2/intel-xe-gpu-architecture.html
>https://www.techpowerup.com/gpu-specs/arc-a380.c3913
Power efficiency would take a huge hit just from f32. Anyway, large models are bandwidth-bound, making these accelerators pointless without their own integrated RAM.
>Power efficiency would take a huge hit just from f32.
Surely they can do better than GPUs by optimizing for AI-only applications? Are you implying GPUs are already power efficient? What makes you think that?
>Anyway, large models are bandwidth-bound, making these accelerators pointless without their own integrated RAM.
It doesn't make other DL applications moot, like training object detectors for example
No, my point is you can't magically turn 26 TOPs of i8 to f32 at that wattage. Most optimistically, divide the TOPs by 4.
An accelerator chip can do better than standard GPU FMA instructions for tensor ops, but Nvidia's tensor cores are already optimized for that (not that they necessarily give the same TOPs/W). Again, the most interesting part is data transfers: I'm almost certain the Hailo's 2.5W doesn't include the data transfers over PCI-E needed to drive the thing for 26 TOPs of useful computation. But I can't find figures for PCI-E transfer power consumption. If I'm right, the comparison is misleading in the first place, because GPU power consumption figures would include VRAM transfers.
In general, GPUs have excellent memory bandwidth per $ and per W. Most useful computations end up bandwidth-bound. This is one reason specialized accelerators haven't exploded; you can't just specialize data transfer.
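Back of the envelope, if you want numbers (the host-link bandwidth is my guess, not a datasheet figure):

# all rough assumptions for illustration
peak_int8_tops = 26e12                     # Hailo-8 headline ops/s
fp32_upper_bound = peak_int8_tops / 4      # the "divide by 4" best case, ~6.5 TFLOPS
host_link_bytes = 4e9                      # assume ~4 GB/s to host RAM over PCI-E

# ops the chip must do per byte fetched from host RAM to stay compute-bound
print("ops needed per weight byte:", peak_int8_tops / host_link_bytes)   # 6500

# an int8 GEMV streaming weights from host RAM does ~2 ops per weight byte,
# so it runs at link speed, not compute speed
effective = 2 * host_link_bytes
print("bandwidth-bound GEMV: %.0f GOPS (%.2f%% of peak)"
      % (effective / 1e9, 100 * effective / peak_int8_tops))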
What do you mean?
Are TPU v4s non-existent? It's stated right in the OP.
What are you on about?
What do these tiny ass NPUs even do that an off the shelf CPU or GPU can't do better? Is this the next iteration of the bullshit web framework designed to keep redditors employed? I see lots of hype and no practical applications.
>performance of a rtx 2050
hmm, that's actually really good for 2.5W, idk wtf you would possibly use this for. making a shitty carnival booth with AI image effects? it's only ever going to run tiny models
if you could get a board with 20 of these in a single cluster then you'd be cooking with gas. I'm with you OP, idk why everyone is sleeping on LLM acceleration, because that's why Nvidia is making record profits. Kinda hard to get a slice of those profits when all anyone else makes is a glorified FX DSP
>rtx2050
what? that thing exists?
people buy laptops anon
Because they are AI edge devices designed to do inference quickly on small models which do not need floating point. They're for object detection, classification, etc...
Asus makes an expensive PCIe card full of Google Tensor units, if you want to throw away money on that, go for it.
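For what it's worth, the workload they target looks roughly like this (ONNX Runtime as a stand-in, since I haven't touched any vendor SDK; the model file and input layout are made up):

import numpy as np
import onnxruntime as ort

# edge-style loop: one small quantized model, one camera frame at a time
# "detector_int8.onnx" is a placeholder, not a real Hailo artifact
sess = ort.InferenceSession("detector_int8.onnx")
input_name = sess.get_inputs()[0].name

def run_frame(frame):
    # frame: HxWx3 uint8 off a camera; most small detectors want NCHW float in [0,1]
    x = frame.astype(np.float32).transpose(2, 0, 1)[None] / 255.0
    return sess.run(None, {input_name: x})[0]

# detections = run_frame(camera.read())   # boxes/scores for one doorbell frame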
>They're for object detection, classification, etc...
detecting what? classifying what?
literally nobody has an edge use case for AI that needs more than 1-2 TOPS, just look at how long Snapchat has been doing it, and with the shittiest of acceleration
>detecting what? classifying what?
soldiers
floating point is bloat. uint8 is all you need
What's the most power-efficient device I can use for messing around with deep learning (with training, of course)? No, I don't want to be tracked and use Colab
Sorry, in 2023 the answer is unfortunately none
int8 multiplication/addition takes up way less die space than fp32
>Why doesn't Google sell TPUv4 cards?
I would assume it's some national security thing: they don't want other governments to have easy access to hardware that is capable of running AI models efficiently. I know GPUs exist, but those are power hogs, not good for drones.