How many more parameters and training until AI models can create movies indistinguishable from reality?
bro really thought reddit was a board on bot I'm dead
it's impossible. we're already at the limits of what AI can do. things are just gonna get faster and higher res but not better quality
Wrong
Use AI for lip syncing on 3D models and animations. I saw a game that was indistinguishable from real life, I forget what it was called.
Hey you could start with Willy Wonka remake why not.
Oh, I forgot: make the AI make the 3D models too. The AI-generated 3D models right now are not that great.
>How many more parameters and training
to give a sense of scale, let's consider the following:
the context window of a modern LLM is maybe on the order of 100k tokens
that means that a good document that it can understand during training is probably that length too
100k tokens is roughly 400k bytes, and roughly the size of a decent image for training an image generator too
a video, for comparison, might be 1000x bigger, with some balance between bitrate and duration
to sense check this, we currently have video gen models that can produce fairly decent ~5 second clips, but a movie is more like ~5000 seconds
so to answer your question, we need about 3 OOMs more of scaling
according to Epoch, scaling compute that much will take 5 more years, and scaling parameter count that much will take about 6 more years
https://epochai.org/trends
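the back-of-envelope math above can be sketched in Python. every constant here is a rough assumption lifted straight from this post (order-of-magnitude guesses, not measured values):

```python
import math

# rough assumptions from the post above, not measured values
context_tokens = 100_000      # context window of a modern LLM
bytes_per_token = 4           # so a good training document is ~400k bytes
doc_bytes = context_tokens * bytes_per_token

clip_seconds = 5              # decent clips from current video-gen models
movie_seconds = 5000          # a feature film, roughly

# orders of magnitude of scaling still needed to go from clip to movie
ooms_needed = math.log10(movie_seconds / clip_seconds)

print(doc_bytes)    # 400000
print(ooms_needed)  # 3.0
```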
>100k tokens
Huh? I thought it was much less.
let me guess, you need more?
https://dev.to/maximsaplin/gpt-4-128k-context-it-is-not-big-enough-1h02
That article says 128k tokens is enough for 1684 tweets. Assuming 280 characters per tweet, that's 471,520 characters; at six characters per average word (including the trailing space), that's 78,586 words. Pretty impressive to be honest. Though the article says hallucinations are much more likely once you cross 50% of context usage. Anyway, it's not ready to make a movie yet, but in a few years, maybe (it needs a lot more context than just the film script on its own to make a good movie). But when it gets to that point, it could make a whole feature film pretty much instantly.
>Assume six characters per average word, including trailing space
fwiw, your comment was 600 characters and 105 words, so six characters per word is a very good estimate
One massive limitation AI has that's not often discussed here is the quality of the training data.
More effort and resources are poured into collecting, labeling, and processing the data than into anything to do with the AI models themselves.
If we want to make coherent movies with AI, someone needs to label thousands of movies scene by scene. And how would you even label stuff like that?
labelling is a lot easier than generation, and GPT-4 Vision is getting close to doing that at a human level
with a mixture of synthetic data and a corpus the size of youtube or TV stations' back catalogs, there should be enough training data to last another 5 years no problem
>quality
AI-generated labeling is not of any quality. Models are only as good as the training data.
Consider current anime models. They're restricted largely to Danbooru tags, which causes most pics to be very rigid in composition and such.
If you want to do anything interesting you gotta dabble with LoRAs, style-overfitted models, and ControlNets.
>AI-generated labeling is not of any quality.
do you have a benchmark that compares current labelling systems to the best human labellers?
it's hard to take your claim seriously when you make such an extreme statement as "not of any quality"
My point is that all labeling is shit, which puts a hard limit on the quality of AI generated media.
i'm not convinced
an AI model can find statistical patterns that represent categories like "dog", for example, and all it needs is one label to know how to convert that region in visual latent space into the equivalent region in textual latent space (the vicinity of "dog", "puppy", "canine", etc.)
in fact, we've seen that text-only models can learn to convert between two different languages without any supervised training data at all, just by looking at the "shapes" of the connections within each language
i don't see why a multi-modal model couldn't end up learning patterns in the same way, which is also probably what human babies do when they are learning to understand the world and language at the same time
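the "shapes of the connections" idea can be demoed with a toy alignment between two latent spaces. this is a minimal sketch under strong assumptions: both "languages" are point clouds sharing the same geometry up to a rotation, and we use classic orthogonal Procrustes with a tiny seed dictionary (real unsupervised translation methods bootstrap even that seed from structure alone):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy "language A": 50 concepts embedded in a 3-D latent space
A = rng.normal(size=(50, 3))

# "language B": same concepts, same geometry, just rotated
Q_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
B = A @ Q_true

# seed dictionary: only 3 known pairs (the lone "dog"-style labels)
seed = [0, 1, 2]

# orthogonal Procrustes: best rotation mapping A's seeds onto B's seeds
U, _, Vt = np.linalg.svd(A[seed].T @ B[seed])
Q_est = U @ Vt

# translate every other concept by nearest neighbour in B's space
mapped = A @ Q_est
dists = np.linalg.norm(mapped[:, None, :] - B[None, :, :], axis=-1)
pred = dists.argmin(axis=1)

accuracy = (pred == np.arange(len(A))).mean()
print(accuracy)  # 1.0 on this noise-free toy
```

obviously real latent spaces are noisy and not exactly rotations of each other, but the point stands: shared structure plus a handful of anchors is enough to line the spaces up.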
And all you can generate is generic stock-photo-like dogs in uninteresting poses and compositions. And if you try to prompt something more specific, it most likely won't listen because those concepts were not labelled in the training data.
Your AI can only do what the training data allows it to.
>Your AI can only do what the training data allows it to.
sure, but that's not a problem with the labelling, it's a problem with the training data and parameter count being too small for the space of all possible images that you might want to generate
i fully expect there to be emergent abilities unlocked at bigger model sizes as the model starts to grok higher level concepts like occlusion and 3D curvature
>but that's not a problem with the labelling, it's a problem with the training data
What do you mean by labeling, if not the training data?
>What do you mean by labeling, if not the training data?
the training data needed to recognize "hand" as a category of object is much smaller than the training data needed to generate realistic hands
my claim is that the latter is the bottleneck
so there is already enough data/parameters in the labelling part of the network to recognize hands correctly, but more data/parameters will improve image generation considerably
>And how would you even label stuff like that?
With another AI, of course.
Do you want an AI without those limitations?
Would you pay me decently to build it?
>someone needs to label thousands of movies scene by scene. And how would you even label stuff like that?
Audio description of scenes for the blind. Description of noise from the subtitles for the deaf.
>create movies indistinguishable from reality
I don't think an LLM could do that. An inherent limitation is the amount of context it can handle (the size of the prompt). An LLM that could ingest an entire movie script as context would be fairly improbable.
I just want to wear GLASSES that will display the following things to me when I talk to a girl:
1. Answers I can give, or questions I can ask, to manipulate her into getting interested in me.
2. How can I make her laugh so much?
3. Tease her.
4. Say things to her that will not put me in the friendzone.
Please someone make an AI to do this.
like a jarvis
There's nothing that will stop the awkwardness though. You can't fix timing. You gotta be more confident bro.
>bro really thought reddit was a board on BOT I'm dead
newbie genAlpha, you must be 18+ to post on this site.