Future of ChatGPT?

Posted on December 27, 2022 by Anonymous

>Full ChatGPT 175B parameter model requires 5 (five) A100 GPUs to load the model.
>It takes at least 20 - 40 seconds to answer simple query.
>Compute cost is about 1 cent per query assuming relatively short queries and answers.

There is no way they can monetize anything this expensive with ads. It's also impossible to scale it like Google Search: you would run out of GPUs in the world (and electricity).
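For reference, the "5 A100s" figure can be checked with back-of-the-envelope arithmetic. This is only a sketch: it assumes fp16 weights (2 bytes per parameter) and the 80 GB A100 variant, neither of which is stated in the post, and it counts weights only (no activations or KV cache). The function names are made up for illustration.

```python
import math

A100_MEMORY_GB = 80   # assumption: 80 GB A100 variant
BYTES_PER_PARAM = 2   # assumption: fp16/bf16 weights


def gpus_to_load(n_params, bytes_per_param=BYTES_PER_PARAM,
                 gpu_memory_gb=A100_MEMORY_GB):
    """Minimum GPU count needed just to hold the weights in memory."""
    weight_gb = n_params * bytes_per_param / 1e9
    return math.ceil(weight_gb / gpu_memory_gb)


def compute_cost_usd(n_queries, cost_per_query_cents=1):
    """Total compute cost at the quoted ~1 cent per query."""
    return n_queries * cost_per_query_cents / 100


# 175e9 params * 2 bytes = 350 GB; 350 / 80 = 4.375, rounded up to 5 GPUs
print(gpus_to_load(175e9))
# One million queries at 1 cent each
print(compute_cost_usd(1_000_000))
```

Under these assumptions the weights alone account for the five GPUs; any serving overhead would come on top of that.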

1 year ago

Reply

Anonymous

They are going to sell queries for money, e.g. you buy the right to perform 10k queries for a lump sum of say $100. Same model as any other API service out there.
Regarding scaling: first, realize that queries can be batched on the GPUs, which allows tighter scaling. It is most likely already implemented, and can likely be made tighter. This is only viable when many queries come in at once though, but that's the same, once again, as any other API service, so they already know how to do that and scale it correctly.
In terms of DL, as you say, this is completely unsustainable. 1T params will require 50 A100s to load, for instance, and I assure you that the gains are going to be marginal (diminishing returns). A new architecture will come out in a few years that will be able to do what GPT 175b params can do in just 10-100m params, as it's usually been the case in DL so far: bruteforcing only goes so far. But also, the more powerful those models become, the slower that progress will be. Now it's basically a race between researchers building more powerful and parameter-efficient models and how much power and capacity we need to reach singularity levels of performance.
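The batching point above can be illustrated with a toy sketch. Nothing here reflects how OpenAI actually serves requests: `fake_model_forward` is a hypothetical stand-in for one batched GPU forward pass, and the batch size of 8 is arbitrary. The point is only that 20 pending queries cost 3 passes instead of 20.

```python
from collections import deque


def fake_model_forward(prompts):
    # Hypothetical stand-in for a single batched forward pass on the GPUs.
    return [f"answer to: {p}" for p in prompts]


def serve_in_batches(pending, max_batch_size=8, forward=fake_model_forward):
    """Drain a queue of prompts, grouping them into batches per forward pass."""
    queue = deque(pending)
    answers = []
    n_passes = 0
    while queue:
        # Take up to max_batch_size prompts that are already waiting.
        take = min(max_batch_size, len(queue))
        batch = [queue.popleft() for _ in range(take)]
        answers.extend(forward(batch))
        n_passes += 1
    return answers, n_passes


answers, n_passes = serve_in_batches([f"q{i}" for i in range(20)])
print(len(answers), n_passes)
```

As the post says, this only helps when many queries are waiting at once; with a single query in the queue, each pass still serves one user.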
- 1 year ago
  
  Reply
  
  Anonymous
  
  All those parameters just to still fail at preschool...
  - 1 year ago
    
    Reply
    
    Anonymous
    
    I'm actually surprised it reached an almost correct conclusion.
    I bet there are better examples of failures out there.
  - 1 year ago
    
    Reply
    
    Anonymous
    
    >I'm actually surprised it reached an almost correct conclusion.
    >I bet there are better examples of failures out there.
    
    There was a twitter thread about just this, and now the AI gets it correct and has properly formatted and logical reasoning written out.
    - 1 year ago
      
      Reply
      
      Anonymous
      
      For this specific question?
      - 1 year ago
        
        Reply
        
        Anonymous
        
        it still flops that
        
        https://twitter.com/nisten/status/1607483294643326978?s=20&t=VV943Vtgl_79v-zKxitNRA
    - 1 year ago
      
      Reply
      
      Anonymous
      
      it still flops that
  - 1 year ago
    
    Reply
    
    Anonymous
    
    This is a good example of what I mean: I don't think GPT-4 will do any better on reasoning tasks. That is because transformers are awesome at symbol manipulation, but reasoning is more complicated than that. It's why if you ask it what 0.888... equals, it may tell you it equals 1 and use the "0.999... = 1" proof but with 0.999... replaced by 0.888... everywhere.
- 1 year ago
  
  Reply
  
  Anonymous
  
  >A new architecture will come out in a few years that will be able to do what GPT 175b params can do in just 10-100m params
  Isn't that also subject to the law of diminishing returns?
  - 1 year ago
    
    Reply
    
    Anonymous
    
    Yes, but the point is that with the same resources, it will be capable of doing, say, 10x more. Then we will wait another few years and get another breakthrough which will once again do 10x more (i.e. 100x more in total).
    - 1 year ago
      
      Reply
      
      Anonymous
      
      That looks like a geometric progression, not diminishing returns.
      But ofc, I have no idea how it would play out. I instinctively think it will play out with diminishing returns.
      - 1 year ago
        
        Reply
        
        Anonymous
        
        The diminishing returns I'm talking about apply only to the bruteforce approach. I don't believe any such limits apply to new breakthroughs at the research level. However, I also strongly believe the paradigm that will bring us to the 'next stage' will be completely different from current DL. Likely a mix of "soft and hard layers" (soft = learned, hard = rules, possibly automatically determined).
        
        1 year ago
        
        Reply
        
        Anonymous
        
        even with infinite compute I can't see how anything even approaching AGI can exist if the model is essentially still just feeding it human-produced knowledge; unless youre training it exclusively on the top 1% most competent experts in whatever it will always be a midwit-machine. Sure that might put 95% of white-collar workers out of a job but it can't ever make actual progress; if anything the increasingly moronic zoomzoom society will start to eschew learning themselves in favour of outsourcing to the AI, until there is no recent human-produced training data left and they start feeding it on its own excrement, at which point it will spiral downhill
        
        1 year ago
        
        Anonymous
        
        The idea is that with infinite resources, you could just
        1- simulate a brain perfectly and
        2- train the AI on the same tasks children do as children
        Thus simulating a full training loop that works the same as you'd do to a human, but without genetic limits on intelligence and physical limits on processing speeds, for instance.
        One level removed from this is to abstract neurons repeatedly so they do the same thing but without any of the details that are implementation-specific to carbon life. Next you can try to optimize this in silico. You can also simulate the training setup and, once again, optimize it (i.e. cut things that don't matter, etc.).
        The result I described is what we really want to arrive at from the 'other end' (i.e. from our pitiful approximations, namely deep learning and pure disembodied text symbol feeding).
        In the limit of infinite resources and time, there is nothing preventing us from getting actual AGI, since even without understanding the brain we can just use probes and reproduce what we see in the computer. But that's not actually interesting, because maybe this requires more atoms than there are in the universe to have sufficient capacity for this kind of sim. The question is exactly this: not knowing which level we need to reach for bootstrapping, is that level within the realm of what we can achieve at all within feasible resource constraints? And the connected problem is: what is the right level of approximation we need to use? As I like to say (taken from elsewhere): planes don't flap their wings. A similar level of approximation may be sufficient for AGI.
        In the meantime I'd be happy enough if we can use AI techniques to spawn new actually-working drugs and get rid of ~~*doctors*~~ and ~~*lawyers*~~ and other scum.
- 1 year ago
  
  Reply
  
  Anonymous
  
  >as it's usually been the case in DL so far: bruteforcing only goes so far
  No, it's absolutely not the case so far. What research have you been reading? Model size has done nothing but increase since the transformer. Efficient methods are a whole different direction of their own; if you want performance, you are still using bert base sized models and above, starting at 330m params.
  - 1 year ago
    
    Reply
    
    Anonymous
    
    I have a PhD in deep learning. Stop believing random twitter posts and tiktok or whatever it is you normalgay get conned by these days.
    - 1 year ago
      
      Reply
      
      Anonymous
      
      so your PhD makes whatever shit you spew correct? A PhD is nothing special in this field, I have one too moron
      and if anything far more researchers go on twitter than this place
1 year ago

Reply

Anonymous

Memoization: store a cache of previously asked questions. Most people will ask similar queries, so search for a similar cached query instead of calculating the answer anew.
PS: This is also used in image generation and labeling services.
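A minimal sketch of that idea, assuming plain string similarity is good enough to decide that two queries are "similar". The `QueryCache` class, the 0.9 threshold, and the `model` callable are all made up for illustration; a real service would more likely compare embeddings and would need some eviction policy.

```python
from difflib import SequenceMatcher


class QueryCache:
    """Toy cache that returns a stored answer for identical or near-identical queries."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumption: arbitrary similarity cutoff
        self.entries = {}           # normalized query -> answer

    @staticmethod
    def _normalize(query):
        # Lowercase and collapse whitespace so trivial variants share a key.
        return " ".join(query.lower().split())

    def lookup(self, query):
        key = self._normalize(query)
        if key in self.entries:
            return self.entries[key]
        # Linear scan for a close match; fine for a sketch, too slow at scale.
        for cached, answer in self.entries.items():
            if SequenceMatcher(None, key, cached).ratio() >= self.threshold:
                return answer
        return None

    def store(self, query, answer):
        self.entries[self._normalize(query)] = answer


def answer(query, cache, model):
    """Serve from the cache when possible; otherwise run the expensive model."""
    cached = cache.lookup(query)
    if cached is not None:
        return cached
    result = model(query)
    cache.store(query, result)
    return result
```

The trade-off is that a near match may return an answer to a slightly different question, so the threshold decides how often users get a stale or wrong reply in exchange for saved compute.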
1 year ago

Reply

Anonymous

>Full ChatGPT 175B parameter model requires 5 (five) A100 GPUs to load the model.
you chuds can't afford $150k? I thought you were all super wealthy tech bros
1 year ago

Reply

Anonymous

People would gladly pay with microtransactions, but the payment processor overhead would be too big.
Of course there will be morons willing to pay for a massively overpriced subscription.
1 year ago

Reply

Anonymous

Is it not possible to host this locally or is it too much?
Are there desktop apps?
1 year ago

Reply

Anonymous

>only 5 gpus
>only 175b
they need to release it for me to load it, complete pussy shit that could run in my closet
- 1 year ago
  
  Reply
  
  Anonymous
  
  >only 5 ~~*80gb vram*~~ gpus
  fify
1 year ago

Reply

Anonymous

They can just sell managed instances to big companies.
1 year ago

Reply

Anonymous

Sexo
1 year ago

Reply

Anonymous

Has anyone been compiling a list of the various rote answers to topics that would furrow the brow of a Californian? I meant to but I shrugged off some good ones and now I regret it because I could use them. I'm currently scouring BOT for them.
1 year ago

Reply

Anonymous

Why is she making that face
- 1 year ago
  
  Reply
  
  Anonymous
  
  GONG!
