no
I tried doing it one night after drinking, but I couldn't figure out the error calculation and backtracking, so I gave up and went to sleep
Read a book that will teach you those things, and explain the meaning of them.
>Read a book that will teach you those things, and explain the meaning of them.
I don't really have a background in math, so those books make about as much sense to me as Latin
I am an engineer though, so I do a lot of Excel and numpy/pandas-related stuff; it's just the theoretical math that I have trouble with. If someone could show me a guide or video showing 2 hidden layers being backtracked with actual formulas and numbers, I'd be able to copy that.
>I don't really have a background in math
>I am an engineer though
Pardon?
>Pardon?
Anon.... It may come as a surprise to you, but even though we EEs study Maxwell's equations, advanced calculus, and other mad math topics, we don't actually understand or use that stuff very often.
The most I use for my job is maybe high school calculus and algebra, sometimes university-level trigonometry.
Not even Fourier transforms? In any case, check out this book; I'm pretty certain it was written by an electrical engineer (going from memory). I don't think the math requirements include anything you haven't seen. Taking a quick flick through the pages, the most "advanced" thing I saw was a gradient.
>engineer
>doesnt know math
Choose one!
To start off with, the chain rule of derivatives is important: if we want to find the gradient of a weight with respect to the loss we get at the end of a forward pass, we can find it by multiplying the gradients in between, for example in the image (dL/dW = dL/dA * dA/dS * dS/dW).
This means we can step backwards, calculating each gradient as we go.
To find the gradient of the weight, we start with the gradient of the Loss with respect to the Activation, which (for MSE loss) is (2/N) * (Activation - Target), where N is the number of output units.
The gradient of the Activation with respect to the unit input (for ReLU) is: IF x > 0 : 1, ELSE 0.
The gradient of the unit input with respect to the Weight is Activation(prev), where Activation(prev) is either the output of the unit at the start of the weight, or in this case the X input.
Putting this together we can find the gradient of the Weight WRT the Loss:
dL/dW = dL/dA * dA/dS * dS/dW
dL/dW = 2*(Activation - Target) * 1 * X
dL/dW = 2*(0.25 - 0.6) * 1 * 0.5 = -0.35
Then you can adjust the weight using Weight = Weight - LearningRate * dL/dW.
Outside of trivial networks like this, dL/dA is actually SUM over j of (dL/dS_j * Weight(this -> j)), where the sum runs over the units j that take the output of this activation as input.
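Since you said you do numpy/pandas stuff anyway, here is the same worked example as a few lines of plain Python. This is only a minimal sketch: the weight value 0.5 is my assumption, inferred from the numbers above (0.5 * 0.5 = 0.25 = Activation), and the learning rate 0.1 is made up.

x, w, target, lr = 0.5, 0.5, 0.6, 0.1

# forward pass
s = w * x                       # unit input (pre-activation), 0.25
a = max(0.0, s)                 # ReLU activation, 0.25
loss = (target - a) ** 2        # MSE with a single output unit, 0.1225

# backward pass: dL/dW = dL/dA * dA/dS * dS/dW
dL_dA = 2 * (a - target)        # -0.7
dA_dS = 1.0 if s > 0 else 0.0   # ReLU gradient, here 1
dS_dW = x                       # gradient of s = w*x with respect to w, 0.5
dL_dW = dL_dA * dA_dS * dS_dW   # -0.35

# gradient descent step
w = w - lr * dL_dW              # 0.5 - 0.1 * (-0.35) = 0.535
print(loss, dL_dW, w)

One pass of this moves the weight from 0.5 to 0.535, nudging the activation toward the 0.6 target, so the signs work out.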
If you dont understand anything. :shrug:
>ReLU
Into the trash it goes
then use another activation. doesn't change anything.
It's a mongo example; I can't be assed writing the derivative of anything more complicated RN.
also, I LOVE SPARSE GRADIENTS.
I've attached a less schizo and more readable version of the equations for backprop in general: https://i.imgur.com/b4jUbHx.png
Thanks for the explanation
>backtracking
backpropagation
>backpropagation
Just use a genetic algorithm lol. It's only like 3 times slower than gradient descent backprop, but at least it does not get stuck in a local minimum all the time. If you do some architecture fuckery you can even prevent overfitting.
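For the curious, a bare-bones version of that idea looks roughly like this. It's only a sketch: the toy data, population size, mutation scale, and selection scheme are all made up for illustration, not taken from any particular paper.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                                  # toy inputs
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)        # toy binary targets

def loss(w):
    # tiny "network": one linear unit + sigmoid, MSE loss
    pred = 1.0 / (1.0 + np.exp(-(X @ w)))
    return np.mean((pred - y) ** 2)

pop_size, n_parents, sigma = 50, 10, 0.1
pop = rng.normal(size=(pop_size, 3))                          # population of weight vectors

for gen in range(200):
    fitness = np.array([loss(w) for w in pop])
    parents = pop[np.argsort(fitness)[:n_parents]]            # keep the best individuals
    idx = rng.integers(0, n_parents, size=(pop_size, 2))
    children = (parents[idx[:, 0]] + parents[idx[:, 1]]) / 2  # crossover: average two parents
    pop = children + sigma * rng.normal(size=children.shape)  # mutation: add Gaussian noise
    pop[0] = parents[0]                                       # elitism: carry the best over unchanged

print(loss(pop[0]))                                           # should be well below the initial loss

Selection plus mutation does the same job as the gradient step, just without needing any derivatives.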
>does not get stuck in a local minimum all the time
Hyperparameters issue
there's no guarantee you reach the global minimum given any set of hyperparameters
Yes and?
which means you get stuck at a local minimum all the time
>which means you get stuck at a local minimum all the time
Not reaching the global minimum with a given set of hyperparameters doesn't mean you're never reaching it with any set of hyperparameters.
then it would be a luck issue, not a hyperparameter issue
SGD is theoretically and practically superior. It turns out SGD is the best approximation algorithm for ERM learning, which is NP-hard.
Not hard, like 20 lines of code; it was done 40 years ago. And aside from some attention layer extras and moving buffers to the GPU, it's more or less still the same thing powering LLMs today.
>makes it run at kernel level
What happens?
Judgement day.
The final redpill is to do a forward pass, calculate the error, then pick a random neuron and change it to see if the error goes down or not. Repeat until you reach the desired performance.
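That's basically random hill climbing on the weights. A minimal sketch of it for a single linear unit, with toy data and a made-up 0.1 perturbation size:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = X @ np.array([1.0, -2.0, 0.5])            # toy regression targets (exactly linear)

def error(w):
    return np.mean((X @ w - y) ** 2)          # forward pass + error

w = np.zeros(3)
best = error(w)
for _ in range(200_000):
    if best < 1e-4:                           # good enough for a demo
        break
    i = rng.integers(len(w))                  # pick a random weight ("neuron")
    old = w[i]
    w[i] += 0.1 * rng.normal()                # change it a little
    new = error(w)
    if new < best:
        best = new                            # keep the change if the error went down
    else:
        w[i] = old                            # otherwise undo it

print(best, w)                                # w ends up roughly [1, -2, 0.5]

It works on toy problems; the catch is that the number of trials needed blows up as the number of weights grows.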
you just went full retard
99% of the effort is in autograd. Good luck.
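To give an idea of what that involves, the core of a toy scalar reverse-mode autograd looks roughly like this (nothing like the real machinery in torch, just the principle: record the graph on the forward pass, then walk it backwards applying the chain rule):

class Value:
    # a scalar that remembers how it was computed, so gradients can flow back through it
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def relu(self):
        out = Value(max(0.0, self.data), (self,))
        def backward():
            self.grad += (1.0 if self.data > 0 else 0.0) * out.grad
        out._backward = backward
        return out

    def backward(self):
        # topologically sort the graph, then apply the chain rule node by node, output first
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# the same toy example as above: loss = (relu(w*x) - target)^2
x, w, target = Value(0.5), Value(0.5), Value(0.6)
a = (w * x).relu()
d = a + target * Value(-1.0)     # no __sub__ defined, so subtract via * -1
loss = d * d
loss.backward()
print(loss.data, w.grad)         # ~0.1225 and ~-0.35, matching the hand calculation

The real work in a framework is doing this for tensors, thousands of ops, and GPU kernels, which is where the "99%" goes.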
Would there be massive improvement in training efficiency if training LLMs are done in C as opposed to python which is what is commonly done in the research realm?
no, the training algorithms are already written in cuda. python is just a wrapper for it all
Not really, the computationally costly parts are already done in C
If you want an easier way to speed up common code, write the algos in Julia
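You can see the point from Python itself. A rough timing sketch comparing a pure-Python triple loop against numpy's matmul, which dispatches to whatever compiled BLAS numpy was built with; the exact numbers depend on your machine and BLAS build, but the gap is orders of magnitude:

import time
import numpy as np

n = 200
A = np.random.rand(n, n)
B = np.random.rand(n, n)

# matrix multiply as a pure-Python triple loop
t0 = time.perf_counter()
C = [[sum(A[i, k] * B[k, j] for k in range(n)) for j in range(n)] for i in range(n)]
t1 = time.perf_counter()

# numpy's matmul, which calls into the compiled BLAS numpy was built against
D = A @ B
t2 = time.perf_counter()

print(f"pure python: {t1 - t0:.3f}s   BLAS via numpy: {t2 - t1:.6f}s")
print(np.allclose(C, D))    # same result, wildly different speed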
>Not really, the computationally costly parts are already done in C
lies.
C has no auto-vectorization and is too slow.
for linear algebra, the computationally costly stuff, you would use BLAS in C. BLAS is written in Fortran which actually has auto-vectorization.
>BLAS is written in Fortran
*was. It's all C++ now. Boomers who know Fortran are dying faster than we can replace them with C++. Fact. We will rewrite BLAS in Rust in the next few decades at this rate.
Except rust is dying already.
Microsoft has literally given Rust billions of dollars.
Wait what, why?
llama doesn't use ReLU
History: ReLU was not used because it didn't model a biological neuron correctly. AlexNet showed that ReLU allowed model training to be several times more efficient, so ReLU has been used ever since.
apparently the newest networks don't even use activation functions at all.
Unless they've got something really fucky, they still need an activation function of some sort to prevent linearity, otherwise the stacked layers collapse into a single linear map.
Obvs I don't have my ear to the ground on that type of NNs, but maybe it's just a different type of ML.
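The linearity point in a few lines of numpy, as a toy check with random matrices:

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))    # "layer 1" weights
W2 = rng.normal(size=(3, 8))    # "layer 2" weights
x = rng.normal(size=4)

two_layers = W2 @ (W1 @ x)      # two layers with no activation in between...
one_layer = (W2 @ W1) @ x       # ...are exactly one layer with weights W2 @ W1
print(np.allclose(two_layers, one_layer))    # True

Any elementwise nonlinearity (ReLU, sigmoid, whatever) breaks that equivalence.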
Yeah, it sounded weird when I read it. They are called state space models. They sucked until last week, when a paper by a single author improved them with a few obvious-in-hindsight tricks to be a gajillion percent better than transformers... at like just a few parameters though, no one has scaled them up yet, at least not the newer variants.
holy cow, swype typing is quite inaccurate
hey man, stop drinking beer, ok? that's a good boy.
nta, and it wasn't billions, it was millions. The reason is that the Microsoft CTO is a Rust fanboy; also, around 70% of their security vulnerabilities are caused by bugs that are not possible in Rust.
there's no way you would write GEMM in C or C++. it's either Fortran or hand-tuned assembly.
>he thinks people haven't optimized AI in every possible way
>yeah my random saturday idea would help the world advance.
>why is the world so dumb and I'm so good
He's just asking a question you nagger, no need to create a fake scenario in your head
asking an obviously stupid question is called baiting
nagger, the heavy lifting is done in C++, the king of performance. Python is just the interface. If you did it in C it would literally be a downgrade.
I wrote a neural network without a framework in dartlang in undergrad and php right out of college.
What cool project can you do with only a little neural network?
Like if you do it in C using only a few gigs of ram and no GPU.
bert, but for things other than language
of course then you have to use its output vector for something
>bert, but for things other than language
like what? can you even use a language model for anything other than language generation stuff?
Anything sequential, not necessarily time-series... hmm... how about classifying integer sequences? Algorithms for calculating the nth digit of pi are extremely common and very diverse. Can bert auto-learn this? What about e? If you feed 10 digits of pi to bert, will it detect that the sequence is pi-like or e-like?
If you look up recent state space model papers, those models have modes where they can be tuned to behave anywhere from sequence-favoring (transformer-like, quadratic comparison complexity) to series-favoring (special ssm sauce, either linear or log complexity). You can imagine making a tiny bert out of state space models configured to be transformer-like and then tweaking it from there.
As an aside, berts do not normally do language generation, but recently they have been popular for that purpose. Normally they are envisioned as encoders of text or sentences.
I wonder why no one embeds berts in gpts to do coarse decision making.
fuck, damnit, why does swype typing never work! okay anon i hope you can decrypt what i just wrote....
Programming could have been so good if only the right systems, paradigms and concepts were implemented earlier on.
The speed of programming could have been 100x faster than the current pace. I am so upset that every single person has got it completely wrong. Far out, far out, we've done everything wrong. If you could see what I can see in my own programs...
Please do tell anon, what programming techniques and paradigms have you been using?
rust obviously
why don't you opine about x86 while you're at it
>he isn't writing ISO C
why does anyone even get mad about this? duh, you can't afford the spec and you're not gonna read it
>Programming could have been so good if only the right systems, paradigms and concepts were implemented earlier on.
This.
The correct paradigm is (and always was) seething and dilating.
We were young and naive, experimenting with new patterns, new languages
But nothing really improved until finally we seethed and got a CoC in our repository.
And dilating was what opened us up, allowing us to be receptive to these new paradigms.
What for? Shit gets offloaded to the GPU anyway. Host code literally doesn't matter. Python is good enough.
>time to implement machine learning in c
Except that "neural networks" are based on an untested and unproven theory of how the human brain works, and on the presumption that the brain is actually a computer and not really a kind of transceiver that enables us to interact with the aether.
Just like people stupidly and blindly accept the farce of gravity, they much the same way think that when we think thoughts, our brains are "computing" this...but no, that's really not what is happening at all.
>our brains are "computing" this...but no, that's really not what is happening at all.
And what do you think really happens?
Perhaps this is where you should go: >>>/x/
fuck off Penrose, nobody is buying your book
>2002 AD
>be me
>code NN library in C++ for fun before it was cool
fuck y'all normie posers
One of the most used neural network frameworks is written in C: https://github.com/pjreddie/darknet