Why do AI chip startups make shitty NPUs that don't support floating point calculations? It's always

>hey just quantize your shit model and hope it works
Why is it so hard to find a NPU with large memory and float32 support and high power efficiency?

>hailo-8
>26 TOPs, 2.5W
>no floating point support
>no HBM, use ram from host you b***h
Is this really the best we've got in 2023? Why doesn't Google sell TPUv4 cards?


  1. 8 months ago
    Anonymous

    >AI hardware
    cool more worthless bloat

  2. 8 months ago
    Anonymous

    What's the actual point of these shitty AI accelerators instead of just plugging in a GPU? Isn't even the shittiest gpu capable of more than one of these things?

    • 8 months ago
      Anonymous

      If it's so easy do it yourself turdstain

      Power consumption is possibly 1/10 of what a GPU would be. Very specific to application tho

      • 8 months ago
        Anonymous

        >Power consumption is possibly 1/10 of what a GPU would be.
        Neat, so can i generate AI anime tiddies on one of these chips? Can I plug it into my laptop and PC?

        • 8 months ago
          Anonymous

They have a PCI-E interface so with drivers you should be able to use it. Whether OP's pic is suitable for SD is unknown

    • 8 months ago
      Anonymous

      One use case is IoT. For instance, things that do local facial or object recognition like drones or doorbells or farm security cameras.

  3. 8 months ago
    Anonymous

    f32 is overkill for ML. INT8 is all you need, possibly even just INT4.
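
    For reference, "just quantize" at its simplest is a scale-and-round. A minimal sketch in plain Python (symmetric per-tensor int8; function names are illustrative, not from any framework):

```python
# Minimal symmetric per-tensor int8 quantization sketch.
# Illustrative names only; real frameworks add zero-points,
# per-channel scales, calibration datasets, etc.
def quantize_int8(weights):
    # map the largest magnitude onto the int8 range [-127, 127]
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

q, s = quantize_int8([0.6, -1.0, 0.25])
# round-trip error is bounded by half a quantization step (~scale/2)
assert all(abs(a - b) <= s / 2 for a, b in
           zip(dequantize(q, s), [0.6, -1.0, 0.25]))
```

    Whether the model still works after that rounding is the part you "hope" about.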

    • 8 months ago
      Anonymous

      You can't successfully train a model with only int8

      • 8 months ago
        Anonymous

        These aren't for training.

        • 8 months ago
          Anonymous

          I know currently they are for inference only. But why don't people make ones that can be used for training?

          • 8 months ago
            Anonymous

You want to train a tf32 AI model on a 2.5W device. Do you realize how silly that sounds?

            • 8 months ago
              Anonymous

              Why not if I can buy 10 of these and use them all and enjoy the power efficiency?

              • 8 months ago
                Anonymous

                Why not a single pcie card then?

              • 8 months ago
                Anonymous

                I don't understand your point but ASUS does sell pcie TPUs

              • 8 months ago
                Anonymous

                I think that's what the 8 stands for. INT8

                Have you considered just using an entry level consumer gpu with fp32 tensor cores like an Intel Arc A380? It does about 2 TOPs FP32 in theory.
                >https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2023-2/intel-xe-gpu-architecture.html
                >https://www.techpowerup.com/gpu-specs/arc-a380.c3913

              • 8 months ago
                Anonymous

                Power efficiency would take a huge hit just from f32. Anyway, large models are bandwidth-bound, making these accelerators pointless without their own integrated RAM.

              • 8 months ago
                Anonymous

                >Power efficiency would take a huge hit just from f32.
Surely they can do better than GPUs by optimizing for AI-only applications? Are you implying GPUs are already power efficient? What makes you think that?

                >Anyway, large models are bandwidth-bound, making these accelerators pointless without their own integrated RAM.
                It doesn't make other DL applications moot, like training object detectors for example

              • 8 months ago
                Anonymous

                No, my point is you can't magically turn 26 TOPs of i8 to f32 at that wattage. Most optimistically, divide the TOPs by 4.

                An accelerator chip can do better than standard GPU FMA instructions for tensor ops, but Nvidia's tensor cores are already optimized for exactly that (not that they necessarily give the same TOPs/W). Again, the most interesting part is data transfers: I'm almost certain Hailo's 2.5W doesn't include the data transfers over PCI-E needed to drive the thing at 26 TOPs of useful computation. But I can't find figures for PCI-E transfer power consumption. If I'm right, the comparison is misleading in the first place, because GPU power consumption figures would include VRAM transfers.

                In general, GPUs have excellent memory bandwidth per $ and per W. Most useful computations end up bandwidth-bound. This is one reason specialized accelerators haven't exploded; you can't just specialize data transfer.
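
                Back-of-envelope on the above. The link bandwidth here is an assumption (something like a PCIe Gen3 x4 link, ~4 GB/s), and "divide by 4" is the optimistic guess from this post, not a spec number:

```python
# Back-of-envelope only; numbers below are assumptions from the thread,
# not spec-sheet figures.
int8_tops = 26e12                    # the Hailo-8 headline figure
f32_optimistic = int8_tops / 4       # "divide the TOPs by 4" guess -> 6.5 TFLOPs

pcie_bytes_per_s = 4e9               # assumed ~PCIe Gen3 x4 effective bandwidth
# ops per byte you must sustain to keep 26 TOPs fed from host RAM:
breakeven_intensity = int8_tops / pcie_bytes_per_s   # ops/byte

# A batch-1 GEMM streaming int8 weights does ~2 ops per weight byte,
# so a weight-streaming workload over PCIe sits far below break-even:
# it's bandwidth-bound, and the compute sits idle.
print(f32_optimistic, breakeven_intensity)
```

                The point: at 6500 ops/byte break-even, no amount of compute specialization saves you if the weights live on the other side of the link.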

              • 8 months ago
                Anonymous

                What do you mean?
                Are TPU v4s non-existent? It's stated right in the OP.
                What are you on about?

  4. 8 months ago
    Anonymous

    What do these tiny ass NPUs even do that an off the shelf CPU or GPU can't do better? Is this the next iteration of the bullshit web framework designed to keep redditors employed? I see lots of hype and no practical applications.

  5. 8 months ago
    Anonymous

    >performance of a rtx 2050
    hmm, that's actually really good for 2.5W, idk wtf you would possibly use this for. making a shitty carnival booth with AI image effects? it's only ever going to run tiny models

    if you could get a board with 20 of these in a single cluster then you'd be cooking with gas. I'm with you OP, idk why everyone is sleeping on LLM acceleration, because that's why nvidia is making record profits. Kinda hard to get a slice of those profits when all everyone else makes is a glorified FX DSP

    • 8 months ago
      Anonymous

      >rtx2050
      what? that thing exists?

      • 8 months ago
        Anonymous

        people buy laptops anon

  6. 8 months ago
    Anonymous

    Because they are AI edge devices designed to do inference quickly on small models which do not need floating point. They're for object detection, classification, etc...
    Asus makes an expensive PCIe card full of Google Tensor units; if you want to throw away money on that, go for it.

    • 8 months ago
      Anonymous

      >They're for object detection, classification, etc...
      detecting what? classifying what?

      literally nobody has an edge use case for AI that needs more than 1-2TOPS, just look at how long snapchat has been doing it and with the shittiest of acceleration

      • 8 months ago
        Anonymous

        >detecting what? classifying what?
        soldiers

  7. 8 months ago
    Anonymous

    floating point is bloat. uint8 is all you need

  8. 8 months ago
    Anonymous

    What's the most power efficient device I can use for messing around with deep learning (with training, of course)? No, I don't want to be tracked and use colab

    Sorry, in 2023 unfortunately the answer is none

  9. 8 months ago
    Anonymous

    int8 multiplication/addition takes up way less die space than fp32
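
    First-order sketch of why (purely illustrative; real fp32 units also spend area on exponent handling, alignment, and normalization, which this ignores):

```python
# Crude model: array multiplier area scales roughly with the square of
# operand width. Illustrative only -- not a die-area measurement.
def relative_mult_area(wide_bits, narrow_bits):
    return (wide_bits ** 2) / (narrow_bits ** 2)

# 32-bit vs 8-bit datapath: (32/8)^2 = 16x the multiplier area,
# before counting any float-specific exponent/normalization logic.
print(relative_mult_area(32, 8))   # -> 16.0
```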

  10. 8 months ago
    Anonymous

    >Why don't google sell TPUv4 cards?
    I would assume it's some national security thing: they don't want other governments to have easy access to hardware capable of running AI models efficiently. I know GPUs exist, but those are power hogs, not good for drones.
