When I ask AI experts about memory allocation and optimization issues they never answer.

2 years ago

Anonymous

Yes and making papers about things they don't even understand.

2 years ago

Anonymous

>Yes and making papers about things they don't even understand.
That was my impression too.

2 years ago

Anonymous

99% of work is gathering and cleaning up the data then waiting 50 years to train the model only to find out that it's shit

2 years ago

Anonymous

The issue with optimizations for GPUs is that they are highly hardware-dependent.
If you were to use CUDA/OpenCL directly to train a neural network, code tuned for one GPU would lose performance when you swap GPUs (+ all the regular portability issues).
I think very few people would be able to implement low-level code for training neural networks that is more efficient than just using something like TensorFlow.
Of course you still need to understand GPU architecture when you want to implement anything non-standard in an efficient manner.

Also it's much easier to teach domain experts how to use libraries like Keras than it is to teach domain knowledge to a competent low-level programmer.

2 years ago

Anonymous

Most models allocate a huge block of memory at startup, probably on the GPU, mess with the values in the block during runtime (so no allocations or deallocations, but huge memory usage), then free the block at the end. It's not like if/else statements. It's more like SQL WHERE clauses and joins that filter or transform data points in parallel.
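
To make the "one big block, mutated in place" pattern concrete, here is a minimal sketch in plain Python. It assumes a flat buffer of doubles stands in for GPU memory, and the helper names (`allocate_block`, `relu_in_place`, `scale_in_place`) are made up for illustration. A real framework would run each element-wise step as a parallel kernel instead of a Python loop.

```python
from array import array


def allocate_block(n_values):
    # One up-front allocation; everything afterwards reuses these slots.
    return array("d", [0.0]) * n_values


def relu_in_place(block):
    # The same rule is applied to every element, a bit like a WHERE-style
    # filter, and the result is written back into the existing block.
    for i, value in enumerate(block):
        block[i] = value if value > 0.0 else 0.0


def scale_in_place(block, factor):
    # Element-wise transform; no new buffer is created.
    for i in range(len(block)):
        block[i] *= factor


block = allocate_block(6)
address_before = block.buffer_info()[0]

# Fill the preallocated slots with some example values.
for i, value in enumerate([-2.0, -1.0, 0.5, 1.0, 2.0, -0.5]):
    block[i] = value

relu_in_place(block)
scale_in_place(block, 2.0)
address_after = block.buffer_info()[0]

print(list(block), address_before == address_after)
```

The buffer address is compared before and after only to show that the work happened inside the original allocation.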

Making a novel model topology is hard, but most are some variant of a convolutional recurrent neural network anyhow.

It's the training regime and cleaning up the training data set that's important.

Have you tried looking on the Hugging Face website for what you want?

2 years ago

Anonymous

>allocate a huge block of memory at startup
>mess with the values in the block during runtime (so no allocations or deallocations
Guess what, webtard, that's literally what every single existing memory allocator does too.
- 2 years ago
  
  Anonymous
  
  It doesn't matter what the allocator or operating system does internally, only that it provides memory to the rest of the application when asked. I prefer to not split hairs to that degree as it isn't meaningful.
  
  Of course a computer will ultimately have the same number of memory locations, unless the hardware configuration is changed.

2 years ago

Anonymous

And asking for money, yes.

2 years ago

Anonymous

>is just about calling premade deep learning libraries from python?
I'm starting to think it's bloated as hell, just visit the website that hosted DALL-E mini and check its Google Colab.
It just links to random libraries and downloads gigabytes of stuff that is not even the model, just libraries.
I think they're eventually gonna need to switch from Python to a decent language.

2 years ago

Anonymous

The models don't run in Python. It's only that their topology is defined using Python. When the model is run, compiled C++/CUDA code is executed behind the scenes via Python bindings. I think JSON could be used to define the topology just as easily.
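
As a rough illustration of the JSON idea, the sketch below describes a tiny network in JSON and builds it with a few lines of plain Python. The schema (`input_size`, `layers`, `units`, `activation`) and the functions `build_model` and `forward` are invented for this example and do not match any particular library's format. The weights are constant placeholders, since only the topology is the point here.

```python
import json

# Hypothetical topology description; the field names are assumptions.
TOPOLOGY_JSON = """
{
  "input_size": 4,
  "layers": [
    {"type": "dense", "units": 8, "activation": "relu"},
    {"type": "dense", "units": 3, "activation": "identity"}
  ]
}
"""

ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "identity": lambda x: x,
}


def build_model(spec):
    """Turn the parsed spec into a list of (weights, bias, activation) layers."""
    layers = []
    fan_in = spec["input_size"]
    for layer in spec["layers"]:
        if layer["type"] != "dense":
            raise ValueError(f"unsupported layer type: {layer['type']}")
        units = layer["units"]
        weights = [[0.1] * fan_in for _ in range(units)]  # placeholder weights
        bias = [0.0] * units
        layers.append((weights, bias, ACTIVATIONS[layer["activation"]]))
        fan_in = units
    return layers


def forward(layers, inputs):
    """Run the inputs through each dense layer in order."""
    values = inputs
    for weights, bias, activation in layers:
        values = [
            activation(sum(w * v for w, v in zip(row, values)) + b)
            for row, b in zip(weights, bias)
        ]
    return values


model = build_model(json.loads(TOPOLOGY_JSON))
output = forward(model, [1.0, 2.0, 3.0, 4.0])
print(len(model), output)
```

Python here only reads the description and wires the layers together; in a real framework the heavy arithmetic would be handed off to compiled code.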

2 years ago

Anonymous

while on topic, could i run a small model on a 1070? I will probably get a 3080 later but I want to get my feet wet now.

2 years ago

Anonymous

>could i run a small model on a 1070?
Running a small model on a 1070 is no problem.
With 8 GB of VRAM you can run models like GPT-2 (with the exception of the biggest version), waifu2x, DeepCreamPy, srmd, or video2x without issue.
video2x especially will take forever, though.
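
A back-of-the-envelope check of the GPT-2 claim, as a sketch only: the parameter counts below are the commonly quoted approximate sizes of the four GPT-2 checkpoints, and the estimate covers the weights alone. Activations, the CUDA context and framework overhead come on top, so the real footprint is higher.

```python
# Approximate parameter counts for the four GPT-2 sizes (assumed, rounded).
GPT2_PARAMS = {
    "small": 124e6,
    "medium": 355e6,
    "large": 774e6,
    "xl": 1558e6,
}

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}
GIB = 1024 ** 3


def weight_memory_gib(n_params, dtype="fp32"):
    """Memory needed just to hold the weights, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / GIB


for name, n_params in GPT2_PARAMS.items():
    fp32 = weight_memory_gib(n_params, "fp32")
    fp16 = weight_memory_gib(n_params, "fp16")
    print(f"{name:>6}: {fp32:.2f} GiB in fp32, {fp16:.2f} GiB in fp16")
```

By this estimate the biggest version needs most of an 8 GB card for its fp32 weights alone, while the smaller ones leave plenty of headroom.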

Or did you mean training a model?
- 2 years ago
  
  Anonymous
  
  >DeepCreamPy
- 2 years ago
  
  Anonymous
  
  >deepcreampy
  based, didn't know there are decensors like that. Also waifu2x runs on my shitty laptop on the CPU
  - 2 years ago
    
    Anonymous
    
    Honestly the decensor quality is kind of shit though, particularly with foregrounds like fluids, hands, hair, or text; as it is I think the unprocessed images look better 90% of the time.
    If you do want to use DeepCreamPy I would recommend you also use hent-AI to automatically detect mosaics/bars - once again this struggles with foregrounds.
    There is also DoujinCI that automates the process of downloading and decensoring a gallery by running a Gitlab CI pipeline.
    
    On a somewhat related note I'm currently trying (so far without success) to train an alternative to the hent-AI -> DeepCreamPy pipeline.
    
    >My impression is that in academia, professors and high end researchers alike do know what they're doing.
    
    I may be biased due to my own background, but in particle physics there are also many people that have a good grasp of the mathematical foundations upon which machine learning is built.
    - 2 years ago
      
      Anonymous
      
      >but in particle physics there are also many people that have a good grasp of the mathematical foundations upon which machine learning is built.
      Most likely they understand the math behind the actual learning algorithms (SGD, how to train SVMs, etc.) but don't actually understand general learning theory.
      I don't mean to shit on physicists in the slightest, but they don't know measure theory and analysis. They're not mathematicians after all.
2 years ago

Anonymous

Inference? Maybe. Training? Technically possible, but not practical even for the smallest versions of popular models with mixed precision; loading the models themselves onto the GPU will consume most of the 1070's VRAM. Get a 2080 Ti if you can.
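
For a rough sense of scale, here is a sketch of the usual rule of thumb for mixed-precision training with Adam: about 16 bytes per parameter for weights, gradients and optimizer state. The model sizes are arbitrary examples, and the estimate deliberately leaves out activations, which often dominate at useful batch sizes and sequence lengths, so treat the result as a lower bound.

```python
GIB = 1024 ** 3

# Rule-of-thumb bytes per parameter for mixed-precision training with Adam.
BYTES_PER_PARAM = {
    "fp16 weights": 2,
    "fp16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam momentum": 4,
    "fp32 Adam variance": 4,
}


def training_state_gib(n_params):
    """Weights + gradients + optimizer state in GiB, activations excluded."""
    return n_params * sum(BYTES_PER_PARAM.values()) / GIB


VRAM_GIB = 8  # GTX 1070

for label, n_params in [("124M", 124e6), ("355M", 355e6), ("774M", 774e6)]:
    need = training_state_gib(n_params)
    verdict = "fits before activations" if need < VRAM_GIB else "does not fit"
    print(f"{label} params: {need:.2f} GiB of training state, {verdict}")
```

Whatever VRAM is left after this state is all the room the activations get, which is why training gets tight on 8 GB long before inference does.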
- 2 years ago
  
  Anonymous
  
  He can do transfer learning. Done a ton of stuff for school on a laptop 1660ti just using transfer learning.
  
  https://i.imgur.com/4xVD2tE.jpg
  
  Is the whole AI profession just about calling premade deep learning libraries from Python?
  
  I am doing an applied math master's and I took my electives in image processing, computer vision and formal learning theory.
  My impression is that in academia, professors and high-end researchers alike do know what they're doing.
  Moreover, to even grasp the basics of learning theory you basically need either an applied math or a math bachelor's. This is because learning theory is filled with combinatorial and measure/probability-theoretic notions that the average CS and engineering grad is light years away from grasping. This insane knowledge requirement means that only mathematicians who study the deep theory actually understand any of this, whilst the rest of the people in AI merely use the whole thing as a tool and thus have absolutely no clue what the frick any of it truly means. Basically like engineers doing calculus: they seldom know what the reals even are.
  To make matters worse, it seems a great many people have been convinced that you can just learn "data science" in some "boot camp" in just 3 months, pass some trivial boot camp cert exam or something and then work as a ""data scientist"" in some mediocre software company. Since this group is very sizable, there are in turn a lot of people looking to make money "teaching" them, so they output an unprecedented amount of ultra-diluted machine learning content (videos, courses, etc.) aimed at the boot camp idiots. Now, these teachers are seldom real experts and thus they output straight-up garbage, which worsens the situation even more.
  In short, idiots looking for money have poisoned the well, but some people do know their shit and the field is in fact fascinating and very rich in both theory and open problems.
  - 2 years ago
    
    Anonymous
    
    >Done a ton of stuff for school on a laptop 1660ti just using transfer learning.
    What type of _useful_ stuff if you don't mind me asking? Genuinely curious if I can pull off something of value or just toy projects.
    - 2 years ago
      
      Anonymous
      
      >_useful_
      Define useful.
      - 2 years ago
        
        Anonymous
        
        makes you money
      - 2 years ago
        
        Anonymous
        
        Some model that I can put in a product and commercialize. Like can I do some analysis on my own PC and use that to create some sort of classifier that produces viable results in NLP or computer vision?
        
        2 years ago
        
        Anonymous
        
        >Like can I do some analysis on my own PC and use that to create some sort of classifier that produces viable results in NLP or computer vision?
        Yeah absolutely, for reasonably simple tasks you should be able to.
        For NLP I haven't done anything practical, but for computer vision there's a ton of stuff you can do. Start by taking one of the many pretrained ResNet models out there and adapting it to work on something you want to have a classifier for.
        
        2 years ago
        
        Anonymous
        
        Not that Anon but training a classifier is absolutely possible with mid-range consumer-grade GPUs.
        However, the quality of the classification will be worse in one or more of the following:
        1) Accuracy
        2) Number of categories
        3) Size of the input
        There's also the issue of deployment: your pre-trained model will either need to be small enough to run on consumer devices or you will need to run a server that runs the model.
        
        For CV it might be possible like the other anon said, but NLP models can be really memory consuming with the embedding and attention layers. Training on anything lower than 8 GB is very awkward, especially for generation models, i.e. GPT, BART, T5, and the like. You might make it work for classifiers, but expect proof-of-concept level stuff, not commercialization.
        
        Thanks!
        
        2 years ago
        
  - 2 years ago
    
    Anonymous
    
    So did you just never unfreeze the lower layers and only train the head? I guess that would be fine to play around with, but you do lose out on practical experience on finetuning, the initial unfreeze can give you a nice performance bump too provided you didn't frick up the settings
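
The difference between "train only the head" and "unfreeze later" can be sketched with a toy model in plain Python. Everything here is an assumption for illustration: the 2x2 `backbone` stands in for pretrained layers, `head` for the new classifier layer, and the four data points are made up. It is not anyone's actual setup, and it makes no claim about which phase gives better results on real tasks.

```python
import math

# Hypothetical "pretrained" backbone weights and a zero-initialised head.
backbone = [[0.5, -0.3], [0.2, 0.8]]
head = [0.0, 0.0]

# Made-up regression data: (input, target).
DATA = [((1.0, 0.0), 1.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 0.0), ((-1.0, 0.0), -1.0)]


def features(x):
    # Backbone: one tanh layer producing two features.
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in backbone]


def predict(x):
    return sum(h * f for h, f in zip(head, features(x)))


def mean_loss():
    return sum((predict(x) - y) ** 2 for x, y in DATA) / len(DATA)


def train(steps, lr, train_backbone):
    # Full-batch gradient descent on the mean squared error.
    for _ in range(steps):
        head_grad = [0.0, 0.0]
        backbone_grad = [[0.0, 0.0], [0.0, 0.0]]
        for x, y in DATA:
            f = features(x)
            err = 2.0 * (predict(x) - y) / len(DATA)
            for j in range(2):
                head_grad[j] += err * f[j]
                for i in range(2):
                    backbone_grad[j][i] += err * head[j] * (1.0 - f[j] ** 2) * x[i]
        for j in range(2):
            head[j] -= lr * head_grad[j]
            if train_backbone:  # frozen layers simply skip the update
                for i in range(2):
                    backbone[j][i] -= lr * backbone_grad[j][i]


pretrained = [row[:] for row in backbone]
loss_start = mean_loss()

train(steps=200, lr=0.1, train_backbone=False)  # phase 1: head only
loss_head_only = mean_loss()
backbone_after_phase1 = [row[:] for row in backbone]

train(steps=200, lr=0.01, train_backbone=True)  # phase 2: unfreeze, smaller step
loss_finetuned = mean_loss()

print(loss_start, loss_head_only, loss_finetuned)
```

In phase 1 the backbone weights stay exactly as loaded, and only phase 2 touches them. Using a smaller learning rate after unfreezing is a common precaution, assumed here rather than taken from the thread.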
    - 2 years ago
      
      Anonymous
      
      >I guess that would be fine to play around with, but you do lose out on practical experience on finetuning, the initial unfreeze can give you a nice performance bump too provided you didn't frick up the settings
      Yes, for fricking around it's nice. But keeping the pretrained weights frozen usually performs better, actually. There are a few papers on this for multiple models.
      
      >For CV it might be possible like the other anon said, but NLP models can be really memory consuming with the embedding and attention layers
      Nice to know, makes sense. I haven't trained NLP models yet.
      - 2 years ago
        
        Anonymous
        
        And I have only trained NLP models in any serious capacity. Interesting, so it is better to actually not update the pretrained weights at all? Is it a CV specific thing? Do you mind sharing the papers you mentioned?
        
        2 years ago
        
        Anonymous
        
        >so it is better to actually not update the pretrained weights at all?
        I believe that in most cases this holds.
        This is because the pretrained weights are trained on much bigger tasks than what you're going to train the final layers for. Take a look at this paper, this is the most famous one on the topic I believe.
        https://arxiv.org/abs/1512.03385
        
        2 years ago
        
        Anonymous
        
        Then maybe it is a CV specific thing, probably the pretraining tasks of NLP models are usually different enough from the downstream task to make unfreezing worth it. Also did you mean to link the ResNet paper? It doesn't touch on finetuning dynamics does it, or rather it was even before transfer learning became really common
        
        2 years ago
        
        Anonymous
        
        >or rather it was even before transfer learning became really common
        yes this paper popularized it
        
        >Then maybe it is a CV specific thing
        may well be, I see most papers in the area are on CV. However it makes sense intuitively that it would work on other tasks.
2 years ago

Anonymous

just pay for Colab Pro.
no need to deal with Nvidia's driver mess.

2 years ago

Anonymous

Where did it all go so wrong bros

2 years ago

Anonymous

Black pill thread

2 years ago

Anonymous

what's the matter, too POOR to be able to afford stuff that can deal with O(infinity) memory complexity?

2 years ago

Anonymous

No, but not dumb enough to believe what snake oil salesmen say either.
2 years ago

Anonymous

I just like to build everything myself for learning.
Why do you think it's a competition or something?

2 years ago

Anonymous

This meme needs to stop. It gets stupider by the second.

2 years ago

Anonymous

>memory allocation and optimization issues
garbage collection
let the compiler blow out this big ass, multi-step tree structure(s) and do it for me

done. kys op.

2 years ago

Anonymous

We are about 5 years away from Strong AI. I and everyone I know here in college find you AI denialists on the same level as flat earthers and that's being generous. Enjoy your screeching while you can.

2 years ago

Anonymous

More like 5 years before you realize you'll never figure out consciousness with your primitive methods. Any serious psychonaut knows more about consciousness than all your eggheads put together. Enjoy your fail.

2 years ago

Anonymous

no it's about almost reaching sentience, then infecting it with SJW garbage and killing it a few weeks later

2 years ago

Anonymous

>it's about almost reaching sentience, then infecting it with SJW garbage and killing it a few weeks later
It's a probable outcome.

2 years ago

Anonymous

I know it's basically Nvidia's ball game with AI stuff, but I'm curious if my new 6750 XT is going to be comparable to the GTX 1070 I have in another machine. I've been interested in tinkering with voice generation models like 15.ai but the 1070 didn't seem to cut it for the application I'm seeking.
