How many more parameters and training until AI models can create movies indistinguishable from reality?

  1. 4 months ago
    Anonymous

    7

    • 4 months ago
      Anonymous

      [...]

      • 4 months ago
        Anonymous

        bro really thought reddit was a board on bot I'm dead

  2. 4 months ago
    Anonymous

    it's impossible. we're already at the limits of what AI can do. things are just gonna get faster and higher res but not better quality

    • 4 months ago
      Anonymous

      Wrong

    • 4 months ago
      Anonymous

      Use AI for lip syncing on 3D models and animations. I saw a game that was indistinguishable from real life; I forgot what it was called.
      Hey, you could start with a Willy Wonka remake, why not.

      • 4 months ago
        Anonymous

        Oh, I forgot: make the AI make the 3D models too. The AI-generated 3D models right now are not that great.

  3. 4 months ago
    Anonymous

    >How many more parameters and training
    to give a sense of scale, let's consider the following:
    the context window of a modern LLM is maybe on the order of 100k tokens
    that means that a good document that it can understand during training is probably that length too
    100k tokens is roughly 400k bytes, and roughly the size of a decent image for training an image generator too
    a video, for comparison, might be 1000x bigger, with some balance between bitrate and duration
    to sense check this, we currently have video gen models that can produce fairly decent ~5 second clips, but a movie is more like ~5000 seconds
    so to answer your question, we need about 3 OOMs more of scaling
    according to Epoch, scaling compute that much will take 5 more years, and scaling parameter count that much will take about 6 more years
    https://epochai.org/trends
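
    for anyone who wants to poke at that arithmetic, here it is as a quick script (bytes-per-token and the years-per-OOM rate are rough assumptions eyeballed from the Epoch trends, not exact figures):

        # back-of-envelope check of the scaling estimate above
        # every constant is an order-of-magnitude assumption
        import math

        doc_tokens = 100_000               # assumed context window of a modern LLM
        bytes_per_token = 4                # ~4 bytes of text per token
        doc_bytes = doc_tokens * bytes_per_token  # ~400k bytes, ~one training image

        clip_seconds = 5                   # decent video-gen clip today
        movie_seconds = 5_000              # a movie, ~83 minutes

        scale = movie_seconds / clip_seconds   # 1000x more data per sample
        ooms = math.log10(scale)               # ~3 orders of magnitude
        years_per_oom = 1.7                    # assumed rate, loosely per Epoch
        print(f"{scale:.0f}x = {ooms:.0f} OOMs -> ~{ooms * years_per_oom:.0f} years")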

    • 4 months ago
      Anonymous

      >100k tokens
      Huh? I thought it was much less.

      • 4 months ago
        Anonymous

        let me guess, you need more?
        https://dev.to/maximsaplin/gpt-4-128k-context-it-is-not-big-enough-1h02

        • 4 months ago
          Anonymous

          That article says 128k tokens is enough for 1684 tweets. Assuming 280 characters per tweet, that's 471,520 characters. Assuming six characters per average word, including the trailing space, that's 78,586 words. Pretty impressive, to be honest. Though the article says hallucinations become much more likely once you cross 50% of context usage. Anyway, it's not ready to make a movie yet, but in a few years, maybe (it needs a lot more context than just the film script on its own to make a good movie). But when it gets to that point, it could make a whole feature film pretty much instantly.
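
          Quick check of that arithmetic in code, using the tweet count quoted from the article and the per-word assumption above:

              tweets = 1684            # figure quoted in the linked article
              chars = tweets * 280     # 471,520 characters
              words = chars // 6       # ~6 chars per word incl. trailing space
              print(chars, words)      # 471520 78586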

          • 4 months ago
            Anonymous

            >Assume six characters per average word, including trailing space
            fwiw, your comment was 600 characters and 105 words, so six characters per word is a very good estimate

  4. 4 months ago
    Anonymous

    One massive limitation of AI that's not often discussed here is the quality of the training data.
    More effort and resources are poured into collecting, labeling, and processing the data than into anything to do with the AI models themselves.
    If we want to make coherent movies with AI, someone needs to label thousands of movies scene by scene. And how would you even label stuff like that?

    • 4 months ago
      Anonymous

      labelling is a lot easier than generation, and GPT-4 Vision is getting close to doing that at a human level
      with a mixture of synthetic data and a corpus the size of youtube or TV stations' back catalogs, there should be enough training data to last another 5 years no problem

      • 4 months ago
        Anonymous

        >And how would you even label stuff like that?
        With another AI, of course.

        >quality
        AI-generated labeling is not of any quality. Models are only as good as the training data.
        Consider current anime models. They're restricted largely to Danbooru tags, which causes most pics to be very rigid in composition and such.
        If you want to do anything interesting, you gotta dabble with LoRAs, style-overfitted models, and ControlNets.

        • 4 months ago
          Anonymous

          >AI-generated labeling is not of any quality.
          do you have a benchmark that compares current labelling systems to the best human labellers?
          it's hard to take your claim seriously when you make such an extreme statement as "not of any quality"

          • 4 months ago
            Anonymous

            My point is that all labeling is shit, which puts a hard limit on the quality of AI generated media.

            • 4 months ago
              Anonymous

              i'm not convinced
              an AI model can find statistical patterns that represent categories like "dog", for example, and all it needs is one label to know how to convert that region in visual latent space into the equivalent region in textual latent space (the vicinity of "dog", "puppy", "canine", etc.)
              in fact, we've seen that text-only models can learn to convert between two different languages without any supervised training data at all, just by looking at the "shapes" of the connections within each language
              i don't see why a multi-modal model couldn't end up learning patterns in the same way, which is also probably what human babies do when they are learning to understand the world and language at the same time
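
              here's a toy version of that one-label argument, purely illustrative (fake 2D "latent spaces", no real encoder involved):

                  import numpy as np

                  rng = np.random.default_rng(0)

                  # fake visual latent space: clusters found without any labels
                  clusters = {
                      0: rng.normal(( 1.0,  1.0), 0.1, size=(100, 2)),  # (secretly dogs)
                      1: rng.normal((-1.0,  1.0), 0.1, size=(100, 2)),  # (secretly cats)
                      2: rng.normal(( 0.0, -1.0), 0.1, size=(100, 2)),  # (secretly cars)
                  }
                  centroids = {k: v.mean(axis=0) for k, v in clusters.items()}

                  # one human label: a single image known to be a dog
                  labelled_dog = np.array([1.05, 0.95])

                  # the nearest centroid maps that entire region to "dog"
                  hit = min(centroids, key=lambda k: np.linalg.norm(centroids[k] - labelled_dog))
                  print(f"cluster {hit} means 'dog'; all 100 of its images inherit the label")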

              • 4 months ago
                Anonymous

                And all you can generate is generic stock-photo-like dogs in uninteresting poses and compositions. And if you try to prompt something more specific, it most likely won't listen, because those concepts were not labelled in the training data.
                Your AI can only do what the training data allows it to.

              • 4 months ago
                Anonymous

                >Your AI can only do what the training data allows it to.
                sure, but that's not a problem with the labelling, it's a problem with the training data and parameter count being too small for the space of all possible images that you might want to generate
                i fully expect there to be emergent abilities unlocked at bigger model sizes as the model starts to grok higher level concepts like occlusion and 3D curvature

              • 4 months ago
                Anonymous

                >but that's not a problem with the labelling, it's a problem with the training data
                What do you mean by labeling, if not the training data?

              • 4 months ago
                Anonymous

                >What do you mean by labeling, if not the training data?
                the training data needed to recognize "hand" as a category of object is much smaller than the training data needed to generate realistic hands
                my claim is that the latter is the bottleneck
                so there is already enough data/parameters in the labelling part of the network to recognize hands correctly, but more data/parameters will improve image generation considerably

    • 4 months ago
      Anonymous

      >And how would you even label stuff like that?
      With another AI, of course.

    • 4 months ago
      Anonymous

      Do you want an AI without those limitations?
      Would you pay me decently to build it?

    • 4 months ago
      Anonymous

      >someone needs to label thousands of movies scene by scene. And how would you even label stuff like that?
      Audio descriptions of scenes for the blind. Descriptions of sounds from the subtitles for the deaf.
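
      A sketch of harvesting those as labels, assuming standard .srt subtitle files where SDH sound cues sit in square brackets (the filename is made up):

          import re

          def sound_labels_from_srt(path):
              """Collect bracketed SDH cues like [door slams] from an .srt file."""
              with open(path, encoding="utf-8") as f:
                  return re.findall(r"\[([^\]]+)\]", f.read())

          # sound_labels_from_srt("movie.en.sdh.srt")  # hypothetical file
          # -> ['door slams', 'ominous music', ...]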

  5. 4 months ago
    Anonymous

    >create movies indistinguishable from reality
    I don't think an LLM could do that. An inherent limitation is the amount of context it can handle (the size of the prompt). An LLM that could ingest an entire movie script as context seems fairly improbable.

  6. 4 months ago
    Anonymous

    I just want to wear GLASSES that will display the following things to me when I talk to a girl:

    1. Answers I can give, or questions I can ask, to manipulate her into getting interested in me.

    2. Ways to make her laugh a lot.

    3. Ways to tease her.

    4. Things to say that won't put me in the friendzone.

    Please, someone make an AI that does this.

    like a Jarvis

    • 4 months ago
      Anonymous

      There's nothing that will stop the awkwardness, though. You can't fix timing. You gotta be more confident, bro.

  7. 4 months ago
    Anonymous

    >bro really thought reddit was a board on BOT I'm dead
    newbie genAlpha, you must be 18+ to post on this site.
