time to implement machine learning in c

  1. 3 weeks ago
    Anonymous

    no

  2. 3 weeks ago
    Anonymous

    I tried doing it one night after drinking but i couldn't figure out error calculation and backtracking so i gave up and went to sleep

    • 3 weeks ago
      Anonymous

      Read a book that will teach you those things, and explain the meaning of them.

    • 3 weeks ago
      Anonymous

      >Read a book that will teach you those things, and explain the meaning of them.
      I don't really have a background in math so those books make about as much sense to me as Latin
      I am an engineer though, so i do a lot of Excel and numpy/pandas related stuff, it's just theoretical math that i have trouble with. If someone could show me a guide or video showing 2 hidden layers being backtracked with actual formulas and numbers, I'll be able to copy that.

      • 3 weeks ago
        Anonymous

        >I don't really have a background in math
        >I am an engineer though
        Pardon?

        • 3 weeks ago
          Anonymous

          >Pardon?
          Anon.... It may come as a surprise to you, but even though we EEs study Maxwell's equations, advanced calculus and other mad math topics, we don't actually understand or use that stuff very often.
          The most I use for my job is maybe high-school calculus and algebra, sometimes university-level trigonometry.

          • 3 weeks ago
            Anonymous

            Not even Fourier transforms? In any case, check out this book; I'm pretty certain it was written by an electrical engineer (going from memory). I don't think the math requirements should include anything you haven't seen. Taking a quick flick through the pages, the most "advanced" thing I saw was a gradient.

      • 3 weeks ago
        Anonymous

        >engineer
        >doesnt know math
        Choose one!

    • 3 weeks ago
      Anonymous

      To start off with, the chain rule of derivatives is important: if we want the gradient of a weight with respect to the loss we get at the end of a forward pass, we can find it by multiplying the gradients in between, for example in the image (DL/DW = DL/DA * DA/DS * DS/DW).
      This means we can step backwards, calculating each gradient as we go.

      To find the gradient of the weight, start with the gradient of the Loss with respect to the Activation, which (for MSE loss) is (2/N)*(Activation-Target), where N is the number of output units.
      The gradient of the Activation with respect to the unit input (for ReLU) is: IF x > 0: 1, ELSE 0.
      The gradient of the unit input with respect to the Weight is Activation(prev), where Activation(prev) is either the output of the unit at the start of the weight, or in this case the X input.

      Putting this together we can find the gradient of the weight WRT the Loss:
      DL/DW = DL/DA * DA/DS * DS/DW
      DL/DW = 2*(Activation-Target) * 1 * X
      DL/DW = 2*(0.25-0.6) * 1 * 0.5 = -0.35
      Then you can adjust the weight using Weight = Weight - LearningRate * DL/DW. Here the gradient is negative, so the update nudges the weight up, which pushes the 0.25 output toward the 0.6 target.

      Outside of trivial networks like this, DL/DA is actually: SUM over J (DL/DInput(J) * Weight(This -> J)), where J is the set of units that take in the output of the activation.

      If you dont understand anything. :shrug:
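
      Here's that exact worked example as a minimal C sketch (the 0.5 input, 0.6 target and ReLU come from the post; the 0.5 starting weight, learning rate and loop count are my own assumptions):

        #include <stdio.h>

        /* ReLU and its derivative (the DA/DS term) */
        static float relu(float s)      { return s > 0 ? s : 0; }
        static float relu_grad(float s) { return s > 0 ? 1.0f : 0.0f; }

        int main(void) {
            float x = 0.5f, w = 0.5f, target = 0.6f;  /* w*x = 0.25 as in the post */
            float lr = 0.1f;

            for (int step = 0; step < 5; step++) {
                /* forward pass */
                float s = w * x;                          /* unit input   */
                float a = relu(s);                        /* activation   */
                float loss = (a - target) * (a - target); /* MSE with N=1 */

                /* backward pass: DL/DW = DL/DA * DA/DS * DS/DW */
                float dl_da = 2.0f * (a - target);
                float da_ds = relu_grad(s);
                float ds_dw = x;
                float dl_dw = dl_da * da_ds * ds_dw;

                /* gradient descent step */
                w -= lr * dl_dw;
                printf("step %d: a=%.4f loss=%.4f DL/DW=%.4f w=%.4f\n",
                       step, a, loss, dl_dw, w);
            }
            return 0;
        }

      The first iteration prints DL/DW = -0.35, matching the numbers above.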

      • 3 weeks ago
        Anonymous

        >ReLU
        Into the trash it goes

        • 3 weeks ago
          Anonymous

          then use another activation. doesn't change anything.

        • 3 weeks ago
          Anonymous

          it's a mongo example, i can't be assed writing the derivative of anything more complicated RN.
          also, I LOVE SPARSE GRADIENTS.

          https://i.imgur.com/b4jUbHx.png

          I've attached a less schizo and more readable version of the equations for backprop in general.

      • 3 weeks ago
        Anonymous

        Thanks for the explanation

    • 3 weeks ago
      Anonymous

      >backtracking
      backpropagation

    • 3 weeks ago
      Anonymous

      >backpropagation
      Just use a genetic algorithm lol. It's only like 3 times slower than gradient-descent backprop, but at least it does not get stuck in a local minimum all the time. If you do some architecture fuckery you can even prevent overfitting.
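
      A minimal sketch of the idea (evaluate, select, crossover, mutate; the toy loss stands in for a real forward pass over your data, and every constant here is an arbitrary assumption):

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        #define POP 32
        #define N_W 8

        /* stand-in fitness: lower is better; imagine a forward pass + loss here */
        static float loss(const float *w) {
            float e = 0;
            for (int i = 0; i < N_W; i++) e += (w[i] - 0.5f) * (w[i] - 0.5f);
            return e;
        }

        static float frand(void) { return (float)rand() / RAND_MAX - 0.5f; }

        int main(void) {
            float pop[POP][N_W], fit[POP];
            srand(7);
            for (int p = 0; p < POP; p++)
                for (int i = 0; i < N_W; i++) pop[p][i] = 2 * frand();

            for (int gen = 0; gen < 200; gen++) {
                for (int p = 0; p < POP; p++) fit[p] = loss(pop[p]);

                /* crude sort: best individuals first */
                for (int a = 0; a < POP; a++)
                    for (int b = a + 1; b < POP; b++)
                        if (fit[b] < fit[a]) {
                            float tf = fit[a], tw[N_W];
                            fit[a] = fit[b]; fit[b] = tf;
                            memcpy(tw, pop[a], sizeof tw);
                            memcpy(pop[a], pop[b], sizeof tw);
                            memcpy(pop[b], tw, sizeof tw);
                        }

                /* bottom half replaced by mutated crossovers of the top half */
                for (int p = POP / 2; p < POP; p++) {
                    const float *ma = pop[rand() % (POP / 2)];
                    const float *pa = pop[rand() % (POP / 2)];
                    for (int i = 0; i < N_W; i++) {
                        pop[p][i] = (rand() & 1) ? ma[i] : pa[i];          /* crossover */
                        if (rand() % 10 == 0) pop[p][i] += 0.2f * frand(); /* mutation  */
                    }
                }
            }
            printf("best loss: %g\n", loss(pop[0]));
            return 0;
        }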

      • 3 weeks ago
        Anonymous

        >does not get stuck in a local minimum all the time
        Hyperparameters issue

        • 3 weeks ago
          Anonymous

          there's no guarantee you reach the global minimum given any hyperparameters

          • 3 weeks ago
            Anonymous

            Yes and?

            • 3 weeks ago
              Anonymous

              which means you get stuck at a local minimum all the time

              • 3 weeks ago
                Anonymous

                >which means you get stuck at a local minimum all the time
                Not reaching the global minimum with a given set of hyperparameters doesn't mean you never reach it with any set of hyperparameters.

              • 3 weeks ago
                Anonymous

                then it would be a luck issue, not a hyperparameter issue

      • 3 weeks ago
        Anonymous

        SGD is theoretically and practically superior. It turns out SGD is the best approximation algorithm for ERM learning, which is NP-hard.

  3. 3 weeks ago
    Anonymous

    not hard, like 20 lines of code; it was done 40 years ago. And aside from some attention-layer extras and moving buffers to the GPU, it's more or less still the same thing powering LLMs today.
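
    It's a bit more than 20 lines with the boilerplate, but here's a self-contained sketch in that spirit: a 2-2-1 sigmoid net learning XOR with plain per-sample backprop. All the constants (seed, learning rate, epoch count) are my own assumptions, and an unlucky seed can stall it in a local minimum:

      #include <stdio.h>
      #include <stdlib.h>
      #include <math.h>

      static float sigmoidf(float x) { return 1.0f / (1.0f + expf(-x)); }
      static float frand(void)       { return (float)rand() / RAND_MAX - 0.5f; }

      int main(void) {
          float X[4][2] = {{0,0},{0,1},{1,0},{1,1}};
          float T[4]    = {0, 1, 1, 0};
          float w1[2][2], b1[2], w2[2], b2;  /* hidden and output weights */
          float lr = 0.5f;

          srand(1);
          for (int i = 0; i < 2; i++) {
              b1[i] = frand(); w2[i] = frand();
              for (int j = 0; j < 2; j++) w1[i][j] = frand();
          }
          b2 = frand();

          for (int epoch = 0; epoch < 20000; epoch++) {
              for (int n = 0; n < 4; n++) {
                  /* forward pass */
                  float h[2];
                  for (int i = 0; i < 2; i++)
                      h[i] = sigmoidf(w1[i][0]*X[n][0] + w1[i][1]*X[n][1] + b1[i]);
                  float y = sigmoidf(w2[0]*h[0] + w2[1]*h[1] + b2);

                  /* backward pass: MSE gradient chained through the sigmoids */
                  float dy = 2.0f * (y - T[n]) * y * (1.0f - y);
                  for (int i = 0; i < 2; i++) {
                      /* the "sum over J" term, with a single J here */
                      float dh = dy * w2[i] * h[i] * (1.0f - h[i]);
                      w2[i] -= lr * dy * h[i];
                      b1[i] -= lr * dh;
                      for (int j = 0; j < 2; j++) w1[i][j] -= lr * dh * X[n][j];
                  }
                  b2 -= lr * dy;
              }
          }

          for (int n = 0; n < 4; n++) {
              float h[2];
              for (int i = 0; i < 2; i++)
                  h[i] = sigmoidf(w1[i][0]*X[n][0] + w1[i][1]*X[n][1] + b1[i]);
              float y = sigmoidf(w2[0]*h[0] + w2[1]*h[1] + b2);
              printf("%g XOR %g = %.3f\n", X[n][0], X[n][1], y);
          }
          return 0;
      }

    Compile with -lm; the outputs should end up near 0/1/1/0.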

  4. 3 weeks ago
    Anonymous

    >makes it run at kernel level
    What happens?

    • 3 weeks ago
      Anonymous

      Judgement day.

  5. 3 weeks ago
    Anonymous

    the final redpill is to do a forward pass, calculate the error, then pick a random neuron and change it to see if the error goes down or not. repeat until you reach the desired performance
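
    For the record, that loop looks like this (a minimal sketch; the toy error function stands in for "forward pass + error over your data", and every constant is a made-up assumption):

      #include <stdio.h>
      #include <stdlib.h>

      #define N_W 8

      /* toy stand-in for a real forward pass + loss: distance from a target vector */
      static const float target[N_W] = {0.1f, -0.3f, 0.7f, 0.2f, -0.5f, 0.9f, 0.0f, 0.4f};

      static float error(const float *w) {
          float e = 0;
          for (int i = 0; i < N_W; i++) {
              float d = w[i] - target[i];
              e += d * d;
          }
          return e;
      }

      int main(void) {
          float w[N_W] = {0};
          float best = error(w);
          srand(42);

          while (best > 1e-4f) {
              int i = rand() % N_W;                              /* pick a random "neuron" */
              float old = w[i];
              w[i] += 0.2f * ((float)rand() / RAND_MAX - 0.5f);  /* change it              */
              float e = error(w);
              if (e < best) best = e;                            /* error went down: keep  */
              else          w[i] = old;                          /* otherwise: revert      */
          }
          printf("final error: %g\n", best);
          return 0;
      }

    It does work on toy problems; it just scales hilariously badly compared to following the gradient.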

    • 3 weeks ago
      Anonymous

      you just went full retard

  6. 3 weeks ago
    Anonymous

    99% of the effort is in autograd. Good luck.
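
    The core idea does fit on a page, though. Here's a toy tape-based reverse-mode autograd, scalar add/mul only; every name and constant is my own invention for illustration:

      #include <stdio.h>

      #define MAX_TAPE 256

      typedef struct { int op, a, b; float val, grad; } Node;
      enum { OP_CONST, OP_ADD, OP_MUL };

      static Node tape[MAX_TAPE];
      static int n_nodes = 0;

      static int push(int op, int a, int b, float val) {
          tape[n_nodes] = (Node){op, a, b, val, 0.0f};
          return n_nodes++;
      }
      static int cnst(float v)     { return push(OP_CONST, -1, -1, v); }
      static int add(int a, int b) { return push(OP_ADD, a, b, tape[a].val + tape[b].val); }
      static int mul(int a, int b) { return push(OP_MUL, a, b, tape[a].val * tape[b].val); }

      /* walk the tape backwards: children were pushed before parents,
         so this is already reverse topological order */
      static void backward(int root) {
          tape[root].grad = 1.0f;
          for (int i = root; i >= 0; i--) {
              Node *n = &tape[i];
              if (n->op == OP_ADD) {
                  tape[n->a].grad += n->grad;
                  tape[n->b].grad += n->grad;
              } else if (n->op == OP_MUL) {
                  tape[n->a].grad += n->grad * tape[n->b].val;
                  tape[n->b].grad += n->grad * tape[n->a].val;
              }
          }
      }

      int main(void) {
          /* loss = ((x*w) + b)^2 */
          int x = cnst(0.5f), w = cnst(-0.8f), b = cnst(0.2f);
          int y = add(mul(x, w), b);
          int loss = mul(y, y);
          backward(loss);
          printf("loss = %f, dloss/dw = %f\n", tape[loss].val, tape[w].grad);
          return 0;
      }

    The 99% is making that fast, general and numerically sane, not the concept itself.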

  7. 3 weeks ago
    Anonymous

    Would there be a massive improvement in training efficiency if LLMs were trained in C, as opposed to Python, which is what is commonly used in the research realm?

    • 3 weeks ago
      Anonymous

      no, the training algorithms are already written in cuda. python is just a wrapper for it all

    • 3 weeks ago
      Anonymous

      Not really, the computationally costly parts are already done in C.
      If you want an easier way to speed up common code, write the algos in Julia.

      • 3 weeks ago
        Anonymous

        >Not really, the computationally costly parts are already done in C
        lies.
        C has no auto-vectorization and is too slow.
        For linear algebra, the computationally costly stuff, you would use BLAS from C. BLAS is written in Fortran, which actually has auto-vectorization.
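
        For what it's worth, calling BLAS from C usually goes through the CBLAS interface. A minimal GEMM sketch (assumes OpenBLAS or another CBLAS provider is installed; link with -lcblas or -lopenblas):

          #include <stdio.h>
          #include <cblas.h>

          int main(void) {
              /* C = 1.0 * A(2x3) * B(3x2) + 0.0 * C, row-major */
              float A[6] = {1, 2, 3, 4, 5, 6};
              float B[6] = {7, 8, 9, 10, 11, 12};
              float C[4] = {0};

              cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                          2, 2, 3,       /* M, N, K       */
                          1.0f, A, 3,    /* alpha, A, lda */
                          B, 2,          /* B, ldb        */
                          0.0f, C, 2);   /* beta, C, ldc  */

              printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);
              return 0;
          }

        Whether the hot loop underneath is Fortran, C or hand-written assembly depends on which BLAS you linked.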

        • 3 weeks ago
          Anonymous

          >BLAS is written in Fortran
          *was. It's all C++ now. Boomers who know Fortran are dying faster than we can replace them with C++ devs, fact. We will rewrite BLAS in Rust in the next few decades at this rate.

          • 3 weeks ago
            Anonymous

            Except rust is dying already.

            • 3 weeks ago
              Anonymous

              Microsoft has literally given Rust billions of dollars.

              • 3 weeks ago
                Anonymous

                Wait what, why?

              • 3 weeks ago
                Anonymous

                llama doesn't use relu
                History: ReLU wasn't used at first because it didn't model a biological neuron correctly. AlexNet showed that ReLU allowed model training to be several times more efficient, so ReLU has been used ever since.
                apparently the newest networks don't even use activation functions at all.

              • 3 weeks ago
                Anonymous

                unless they've got something really fucky, they still need an activation function of some sort to prevent linearity: stacked linear layers with no nonlinearity collapse into a single linear map, since W2(W1 x) = (W2 W1) x.
                Obvs I don't have my ear to the ground on that type of NNs, but maybe it's just a different type of ML.

              • 3 weeks ago
                Anonymous

                yeah it sounded weird when I read it. they are called state space models. they sucked until last week, when a paper by a single author improved them with a few obvious-in-hindsight tricks to be a gajillion percent better than transformers... but only at small parameter counts; no one has scaled them up yet, at least not the newer variants.

              • 3 weeks ago
                Anonymous

                holy cow, swype typing is quite inaccurate

              • 3 weeks ago
                Anonymous

                hey man, stop drinking beer, ok? that's a good boy.

              • 3 weeks ago
                Anonymous

                nta, and it wasn't billions, it was millions. The reason is that the Microsoft CTO is a Rust fanboy; also, 70% of their security vulnerabilities are caused by bugs that are not possible in Rust.

          • 3 weeks ago
            Anonymous

            there's no way you would write GEMM in C or C++. it's either Fortran or hand-tuned assembly.

    • 3 weeks ago
      Anonymous

      >he thinks people haven't optimized AI in every possible way

      >yeah my random saturday idea would help the world advance.
      >why is the world so dumb and I'm so good

      • 3 weeks ago
        Anonymous

        He's just asking a question you nagger, no need to create a fake scenario in your head

        • 3 weeks ago
          Anonymous

          asking an obviously stupid question is called baiting

    • 3 weeks ago
      Anonymous

      nagger, the heavy lifting is done in C++, the king of performance. Python is just the interface. If you redid it in C it would literally be a downgrade.

  8. 3 weeks ago
    Anonymous

    I wrote a neural network without a framework in dartlang in undergrad and php right out of college.

  9. 3 weeks ago
    Anonymous

    What cool project can you do with only a little neural network?
    Like if you do it in C using only a few gigs of ram and no GPU.

    • 3 weeks ago
      Anonymous

      bert, but for things other than language
      of course then you have to use its output vector for something

      • 3 weeks ago
        Anonymous

        >bert, but for things other than language
        like what? can you even use a language model for other than language generation stuff?

        • 3 weeks ago
          Anonymous

          anything sequential, not just time-series... hmm... how about classifying integer sequences? Algorithms for calculating the nth digit of Pi are extremely common and very diverse. Can bert auto-learn this? What about e? If you feed 10 digits of pi to bert, will it detect that the sequence is pi-like or e-like?

          if you look up recent state space model papers, those models have modes where they can be tuned from sequence-favoring behaviour (transformer-like, quadratic comparison complexity) to series-favoring behaviour (special ssm sauce, either linear or log complexity). you can imagine making a tiny bert with state space models configured to be transformer-like and then tweaking it from there.

          as an aside, berts do not normally do language generation, although recently they have become popular for that purpose. normally they are envisioned as encoders of text or sentences.

          i wonder why no one embeds berts in gpts to do coarse decision making.

          • 3 weeks ago
            Anonymous

            fuck, damnit, why does swype typing never work! okay anon i hope you can decrypt what i just wrote....

  10. 3 weeks ago
    Anonymous

    Programming could have been so good if only the right systems, paradigms and concepts were implemented earlier on.

    The speed of programming could have been 100x faster than the current pace. I am so upset that every single person has got it completely wrong. Far out, far out, we've done everything wrong. If you could see what I can see in my own programs...

    • 3 weeks ago
      Anonymous

      Please do tell, anon, what programming techniques and paradigms have you been using?

      • 3 weeks ago
        Anonymous

        rust obviously

    • 3 weeks ago
      Anonymous

      why don't you opine about x86 while you're at it

      • 3 weeks ago
        Anonymous

        >he isn't writing ISO C
        why does anyone even get mad about this? duh, you can't afford the spec and you're not gonna read it

    • 3 weeks ago
      Anonymous

      >Programming could have been so good if only the right systems, paradigms and concepts were implemented earlier on.
      This.
      The correct paradigm is (and always was) seething and dilating.
      We were young and naive, experimenting with new patterns, new languages
      But nothing really improved until finally we seethed and got a CoC in our repository.
      And dilating was what opened us up, allowing us to be receptive to these new paradigms.

  11. 3 weeks ago
    Anonymous

    what for? shit gets offloaded to the GPU anyway. host code literally doesn't matter. python is good enough.

  12. 3 weeks ago
    Anonymous

    >time to implement machine learning in c

    Except that "neural networks" are based on an untested and unproven theory of how the human brain works, and on the presumption that the brain is actually a computer and not really a kind of transceiver that enables us to interact with the aether.

    Just like people stupidly and blindly accept the farce of gravity, in much the same way they think that when we think thoughts, our brains are "computing" this...but no, that's really not what is happening at all.

    • 3 weeks ago
      Anonymous

      >our brains are "computing" this...but no, that's really not what is happening at all.
      And what do you think really happens?
      Perhaps this is where you should go: >>>/x/

    • 3 weeks ago
      Anonymous

      fuck off Penrose, nobody is buying your book

  13. 3 weeks ago
    Anonymous

    >2002 AD
    >be me
    >code NN library in C++ for fun before it was cool
    fuck y'all normie posers

  14. 3 weeks ago
    Anonymous

    One of the most used neural network frameworks is written in C: https://github.com/pjreddie/darknet
