Implementing machine learning in C

time to implement machine learning in c

  1. 6 months ago
    Anonymous

    no

  2. 6 months ago
    Anonymous

    I tried doing it one night after drinking but i couldn't figure out error calculation and backtracking so i gave up and went to sleep

    • 6 months ago
      Anonymous

      Read a book that will teach you those things, and explain the meaning of them.

    • 6 months ago
      Anonymous

      >Read a book that will teach you those things, and explain the meaning of them.
      I don't really have a background in math, so those books make about as much sense to me as Latin.
      I am an engineer though, so I do a lot of Excel and numpy/pandas related stuff; it's just theoretical math that I have trouble with. If someone could show me a guide or video showing 2 hidden layers being backtracked with actual formulas and numbers, I'll be able to copy that.

      • 6 months ago
        Anonymous

        >I don't really have a background in math
        >I am an engineer though
        Pardon?

        • 6 months ago
          Anonymous

          >Pardon?
          Anon... It may come as a surprise to you, but even though we EEs study Maxwell's equations, advanced calculus, and other mad math topics, we don't actually understand or use that stuff very often.
          The most I use for my job is maybe high-school calculus and algebra, sometimes univ-level trigonometry.

          • 6 months ago
            Anonymous

            Not even Fourier transforms? In any case, check out this book; I'm pretty certain it was written by an electrical engineer (going from memory). I don't think the math requirements include anything you haven't seen. Taking a quick flick through the pages, the most "advanced" thing I saw was a gradient.

      • 6 months ago
        Anonymous
      • 6 months ago
        Anonymous

        >engineer
        >doesnt know math
        Choose one!

    • 6 months ago
      Anonymous

      To start off with, the chain rule of derivatives is important: if we want to find the gradient of a weight with respect to the loss we get at the end of a forward pass, we can find it by multiplying the gradients in between, for example in the image (DL/DW = DL/DA * DA/DS * DS/DW).
      this means we can step backwards, calculating each gradient as we go.

      to find the gradient of the weight, start with the gradient of the Loss with respect to the Activation(s): for MSE loss it's (2/N)*(Activation-Target), where N is the number of output units. (Note the order: activation minus target, so the update rule below pushes the weight the right way.)
      The gradient of the Activation with respect to the unit input (for ReLU) is: IF x > 0 : 1, ELSE 0
      The gradient of the unit input with respect to the Weight is Activation(prev), where Activation(prev) is either the output of the unit at the start of the weight, or in this case the X input.

      putting this together we can find the gradient of the weight WRT the Loss:
      DL/DW = DL/DA * DA/DS * DS/DW
      DL/DW = 2*(Activation-Target) * 1 * X
      DL/DW = 2*(0.25-0.6) * 1 * 0.5 = -0.35
      Then you can adjust the weight using Weight = Weight - LearningRate * DL/DW (here the negative gradient raises the weight, since the output 0.25 is below the target 0.6).

      outside of trivial networks like this, DL/DA is actually: SUM over J (DL/DInput(J) * Weight(This -> J)), where J is the set of units that take in the output of the activation.

      If you don't understand anything: :shrug:

      • 6 months ago
        Anonymous

        >ReLU
        Into the trash it goes

        • 6 months ago
          Anonymous

          then use another activation. doesn't change anything.

        • 6 months ago
          Anonymous

          it's a mongo example, I can't be assed writing the derivative of anything more complicated RN.
          also, I LOVE SPARSE GRADIENTS.

          https://i.imgur.com/b4jUbHx.png

          I've attached a less schizo and more readable version of the equations for backprop in general.

      • 6 months ago
        Anonymous

        Thanks for the explanation

    • 6 months ago
      Anonymous

      >backtracking
      backpropagation

    • 6 months ago
      Anonymous

      >backpropagation
      Just use a genetic algorithm lol. It's only like 3x slower than gradient-descent backprop, but at least it doesn't get stuck in a local minimum all the time. If you do some architecture frickery you can even prevent overfitting.

      • 6 months ago
        Anonymous

        >does not get stuck in a local minimum all the time
        Hyperparameters issue

        • 6 months ago
          Anonymous

          there's no guarantee you reach the global minimum given any hyperparameters

          • 6 months ago
            Anonymous

            Yes and?

            • 6 months ago
              Anonymous

              which means you get stuck at a local minimum all the time

              • 6 months ago
                Anonymous

                >which means you get stuck at a local minimum all the time
                Not reaching the global minimum with a given set of hyperparameters doesn't mean you never reach it with any set of hyperparameters.

              • 6 months ago
                Anonymous

                then it would be a luck issue, not a hyperparameter issue

      • 6 months ago
        Anonymous

        SGD is theoretically and practically superior. It turns out SGD is the best approximation algorithm for ERM learning, which is NP-hard.

  3. 6 months ago
    Anonymous

    not hard, like 20 lines of code; it was done 40 years ago. and aside from some attention-layer extras and moving buffers to the GPU, it's more or less still the same thing powering LLMs today

  4. 6 months ago
    Anonymous

    >makes it run at kernel level
    What happens?

    • 6 months ago
      Anonymous

      Judgement day.

  5. 6 months ago
    Anonymous

    the final redpill is to do a forward pass, calculate the error, then pick a random neuron and change it to see if the error goes down or not. repeat until you reach the desired performance

    • 6 months ago
      Anonymous

      you just went full moron

  6. 6 months ago
    Anonymous

    99% of the effort is in autograd. Good luck.

  7. 6 months ago
    Anonymous

    Would there be a massive improvement in training efficiency if training LLMs were done in C, as opposed to Python, which is what is commonly used in the research realm?

    • 6 months ago
      Anonymous

      no, the training algorithms are already written in cuda. python is just a wrapper for it all

    • 6 months ago
      Anonymous

      Not really, the computationally costly parts are already done in C.
      If you want an easier way to speed up common code, write the algos in Julia.

      • 6 months ago
        Anonymous

        >Not really, the computationally costly part are already done in C
        lies.
        C has no auto-vectorization and is too slow.
        for linear algebra, the computationally costly stuff, you would use BLAS in C. BLAS is written in Fortran which actually has auto-vectorization.

        • 6 months ago
          Anonymous

          >BLAS is written in Fortran
          *was. It's all C++ now. Boomers who know Fortran are dying faster than we can replace them with C++ devs, fact. We will rewrite BLAS in Rust in the next few decades at this rate.

          • 6 months ago
            Anonymous

            Except rust is dying already.

            • 6 months ago
              Anonymous

              Microsoft has literally given Rust billions of dollars.

              • 6 months ago
                Anonymous

                Wait what, why?

              • 6 months ago
                Anonymous

                llama doesn't use relu
                history: relu wasn't used at first because it didn't model a biological neuron correctly. alexnet showed that relu allowed model training to be several times more efficient, so relu has been used ever since.
                apparently the newest networks don't even use activation functions at all.

              • 6 months ago
                Anonymous

                unless they've got something really fricky, they still need an activation function of some sort to prevent linearity.
                obvs I don't have my ear to the ground on that type of NNs, but maybe it's just a different type of ML

              • 6 months ago
                Anonymous

                yeah, it sounded weird when I read it too. they're called state space models. they sucked until last week, when a paper by a single author improved them with a few obvious-in-hindsight tricks to be a gajillion percent better than transformers... at small scale anyway; no one has scaled them up yet, at least not the newer variants.

              • 6 months ago
                Anonymous

                holy cow, swype typig is quite inaccurate

              • 6 months ago
                Anonymous

                hey man, stop drinking beer, ok? that's a good boy.

              • 6 months ago
                Anonymous

                nta, and it wasn't billions, it was millions. The reason is that the Microsoft CTO is a Rust fanboy; also, around 70% of their security vulnerabilities are caused by bugs that are not possible in Rust

          • 6 months ago
            Anonymous

            there's no way you would write GEMM in C or C++. it's either Fortran or hand-tuned assembly.

    • 6 months ago
      Anonymous

      >he thinks people haven't optimized AI in every possible way

      >yeah my random saturday idea would help the world advance.
      >why is the world so dumb and I'm so good

      • 6 months ago
        Anonymous

        He's just asking a question you Black person, no need to create a fake scenario in your head

        • 6 months ago
          Anonymous

          asking an obviously stupid question is called baiting

    • 6 months ago
      Anonymous

      Black person, the heavy lifting is done in C++, the king of performance. Python is just the interface. If you did them in C it would literally be a downgrade.

  8. 6 months ago
    Anonymous

    I wrote a neural network without a framework in dartlang in undergrad and php right out of college.

  9. 6 months ago
    Anonymous

    What cool project can you do with only a little neural network?
    Like if you do it in C using only a few gigs of ram and no GPU.

    • 6 months ago
      Anonymous

      bert, but for things other than language
      of course then you have to use its output vector for something

      • 6 months ago
        Anonymous

        >bert, but for things other than language
        like what? can you even use a language model for anything other than language generation stuff?

        • 6 months ago
          Anonymous

          anything sequential, not just time-series... hmm... how about classifying integer sequences? Algorithms for calculating the nth digit of pi are extremely common and very diverse. Can bert auto-learn this? What about e? if you feed 10 digits of pi to bert, will it detect whether the sequence is pi-like or e-like?

          if you look up recent state space model papers, those models have modes where they can be tuned from sequence-favoring (transformer-like, quadratic comparison complexity) to series-favoring (special ssm sauce, either linear or log complexity). you can imagine making a tiny bert out of state space models configured to be transformer-like and then tweaking it from there.

          as an aside, berts do not normally do language generation, though recently they have become popular for that purpose. normally they are envisioned as encoders of text or sentences.

          i wonder why no one embeds berts in gpts to do coarse decision making.

          • 6 months ago
            Anonymous

            frick, damnit, why does swype typing never work! okay anon i hope you can decrypt what i just wrote....

  10. 6 months ago
    Anonymous

    Programming could have been so good if only the right systems, paradigms and concepts were implemented earlier on.

    The speed of programming could have been 100x faster than the current pace, I am so upset that every single person has got it completely wrong. Far out, far out, we've done everything wrong. If you could see what I can see in my own programs...

    • 6 months ago
      Anonymous

      Please do tell, anon, what programming techniques and paradigms have you been using?

      • 6 months ago
        Anonymous

        rust obviously

    • 6 months ago
      Anonymous

      why don't you opine about x86 while you're at it

      • 6 months ago
        Anonymous

        >he isn't writing ISO C
        why does anyone even get mad about this. duh, you can't afford the spec and you're not gonna read it

    • 6 months ago
      Anonymous

      >Programming could have been so good if only the right systems, paradigms and concepts were implemented earlier on.
      This.
      The correct paradigm is (and always was) seething and dilating.
      We were young and naive, experimenting with new patterns, new languages
      But nothing really improved until finally we seethed and got a CoC in our repository.
      And dilating was what opened us up, allowing us to be receptive to these new paradigms.

  11. 6 months ago
    Anonymous

    what for? shit gets offloaded to the GPU anyway. host code literally doesn't matter. python is good enough.

  12. 6 months ago
    Anonymous

    >time to implement machine learning in c

    Except that "neural networks" are based on an untested and unproven theory of how the human brain works, and on the presumption that the brain is actually a computer and not really a kind of transceiver that enables us to interact with the aether.

    Just like people stupidly and blindly accept the farce of gravity, in much the same way they think that when we think thoughts, our brains are "computing" this... but no, that's really not what is happening at all.

    • 6 months ago
      Anonymous

      >our brains are "computing" this...but no, that's really not what is happening at all.
      And what do you think really happens?
      Perhaps this is where you should go: >>>/x/

    • 6 months ago
      Anonymous

      frick off Penrose, nobody is buying your book

  13. 6 months ago
    Anonymous

    >2002 AD
    >be me
    >code NN library in C++ for fun before it was cool
    frick y'all normie posers

  14. 6 months ago
    Anonymous

    One of the most used neural network frameworks is written in C: https://github.com/pjreddie/darknet
