What causes diffusion models to be much better than previous methods of AI like GANs and style transfer?

  1. 2 months ago
    Anonymous

    Diffusion models are a type of generative model that can be used for image generation. These models have become popular in recent years because they are able to generate high-quality images that are often more realistic and detailed than those generated by other methods, such as GANs and style transfer.

    One reason for the success of diffusion models is that they are based on a process called Markov chain Monte Carlo (MCMC), which allows them to explore the space of possible images in a more efficient and controlled way. This means that they are able to generate images that are more diverse and capture a wider range of features and variations than other methods.

    Another reason for the success of diffusion models is that they are able to learn and generate images in a hierarchical manner. This means that they can generate images at different scales, starting with low-level details and gradually adding more and more complex features. This allows them to generate images that have a high degree of realism and detail, which is difficult to achieve with other methods.

    In addition, diffusion models often use a technique called annealed importance sampling (AIS) to sample from the distribution of possible images. This allows them to generate images that are more diverse and capture a wider range of features and variations than other methods.

    Overall, the combination of these factors makes diffusion models a powerful tool for image generation, and they have become a popular choice for many researchers and practitioners in the field of AI.

    • 2 months ago
      Anonymous

      Thanks, GPT3.

    • 2 months ago
      Anonymous

      >GANs
      Were notoriously hard to train correctly, and on top of that they suffered heavily from mode seeking and were still subject to mode collapse
      >VAEs
      Severely subject to mode collapse, and nearly impossible to balance reconstruction capacity against the latent distribution term. Unlike GANs, you couldn't tweak the hyperparameters enough to get them to work; it was all or nothing.
      >Diffusions
      Are basically a flattened version of very deep hierarchical VAEs (in the scoring model formulation, which is the more popular one). They have a nice analytical form that allows training in one shot: effectively you select which level of the VAE hierarchy you want, and calculate the pre and post of the transformation in one step.
      Style transfer is not a model but rather an application. Style transfer was done with GANs or with VAEs or autoregressive models like PixelCNN or PixelRNN.

      Also, VAE variants like soft-infoVAE work about as well as diffusions in theory, but again suffer from finicky to balance hyperparameters (though the advantage is that they're crazy faster at inference time). VQ-VAEs can also work about as well, but I haven't seen this to be true in my experiments so either it's also very hard to balance or it only works in limited datasets (or not at all).
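      The "one shot" training above comes from the forward process having a closed form, q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I), so you can jump straight to any level t of the hierarchy without simulating the whole chain. A minimal numpy sketch (toy shapes and a toy linear schedule, not any specific paper's hyperparameters):

```python
import numpy as np

# Sketch of the one-shot training trick: sample a noisy x_t at an arbitrary
# level t directly from x_0 via the closed form of the forward diffusion,
# x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps.
def diffuse_to_level(x0, t, alpha_bars, rng):
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps
    return x_t, eps                       # (noisy input, regression target)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 100)      # toy linear beta schedule
alpha_bars = np.cumprod(1.0 - betas)      # abar_t = prod_{s<=t} (1 - beta_s)
x0 = rng.standard_normal((8, 8))          # stand-in for a training image
x_t, eps = diffuse_to_level(x0, 50, alpha_bars, rng)
```

      The denoising network is then trained to recover eps from (x_t, t), one random level per minibatch element.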

      Literally everything here is wrong. Guess GPT still has a ways to go huh.
      Basic wrongness:
      - There is no MCMC in diffusions (the closest is that in the scoring model formalism you can use Langevin dynamics, and thus an MCMC, to produce samples. Note how that has nothing to do with exploration, and that diffusions sure as hell aren't "based on" this).
      - The MCMC does not explore anything, let alone in a controlled way or efficiently (MCMC is a slow method, which was the downfall of so-called graphical models that used e.g. contrastive divergence -- an MCMC algorithm -- for optimization). The exploration is performed by adding noise in a separate process (i.e. the diffusion, hence the name)

      • 2 months ago
        Anonymous

        - Diffusion models do not generate images in a hierarchical manner, at different scales, starting with low-level details, or anything like that. Instead, they iteratively denoise an image (the starting image is Gaussian noise, which the model iteratively moves toward an actual full image). The DRAW model by DeepMind did generate starting from low details to fine details, and while impressive at the time, it was pretty shit compared to what we have now.
        - Diffusions absolutely in no way use any kind of importance sampling, but such methods have been proposed. AIS is a variance *reduction* method, so it would make samples LESS, not more, diverse.
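        The iterative denoising loop is simple to sketch. A toy numpy version (the denoiser here is a hypothetical placeholder; a real one would be a trained network predicting the noise eps_theta(x_t, t)):

```python
import numpy as np

# Hypothetical stand-in for a trained denoiser eps_theta(x_t, t).
# It just predicts zero noise so the loop runs end to end.
def toy_denoiser(x_t, t):
    return np.zeros_like(x_t)

def ddpm_sample(shape, timesteps=50, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, timesteps)   # toy linear schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)               # x_T ~ N(0, I): pure noise
    for t in reversed(range(timesteps)):
        eps = toy_denoiser(x, t)
        # mean of the reverse step given the predicted noise
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                                # no noise on the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

sample = ddpm_sample((8, 8))
```

        Note there's no coarse-to-fine hierarchy anywhere: every step operates at full resolution.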

        I also recommend this blog:
        https://yang-song.net/blog/2021/score/
        which explains the score-based view of diffusion models quite well.

      • 2 months ago
        Anonymous

        Thanks, GPT3.

        • 2 months ago
          Anonymous

          Happy to help, hu- I mean, fellow BOT user.

    • 2 months ago
      Anonymous

      >thesis
      >one reason...
      >another reason...
      >in addition...
      >overall...
      >perfect punctuation and capitalization on BOT
      this reply was unironically written by a generative model

      • 2 months ago
        Anonymous

        Anon... it's pretty obvious, why did you even bother making that comment?

    • 2 months ago
      Anonymous

      god bless god ai thanks

  2. 2 months ago
    Anonymous

    Other things of note:
    >so, why is it so good?
    It just is; it's a breakthrough like there are every so often. GANs were invented in what, 2014? It took until Wasserstein GANs for them to work. Diffusions date back to the 90s, but in the deep learning context they're from 2015. It took until the DDPM (denoising diffusion probabilistic model) paper in 2020 for them to pick up steam.
    It's about as much of a breakthrough as LSTMs were for sequence learning.

  3. 2 months ago
    Anonymous

    Can you use Diffusion models everywhere you can use GANs and VAEs? There's some papers I want to try implementing, but with diffusion models because GANs & VAEs sound awful to train.

    • 2 months ago
      Anonymous

      What do you mean by 'awful' exactly?
      The problem with diffusions is that they're very tricky to implement right, mostly in the sense that a single typo in one of the 5 main equations will fuck everything up, and there's no real way to debug where the problem is. Another issue is that careful noise schedule design is required for good results, but for color image data the cosine schedule with Gaussian noise is just fine, so thankfully there's not much to think about there.
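      For reference, the cosine schedule (from the "Improved DDPM" paper by Nichol & Dhariwal) is just a squared cosine on alpha_bar, with betas derived from consecutive ratios. A small numpy sketch:

```python
import numpy as np

# Cosine noise schedule: alpha_bar(t) follows a squared cosine, so noise is
# added more gently near the start and end of the chain than with a linear
# beta schedule.
def cosine_alpha_bar(timesteps, s=0.008):
    t = np.linspace(0.0, 1.0, timesteps + 1)
    f = np.cos((t + s) / (1.0 + s) * np.pi / 2.0) ** 2
    return f / f[0]                      # normalize so alpha_bar(0) = 1

def cosine_betas(timesteps, max_beta=0.999):
    ab = cosine_alpha_bar(timesteps)
    betas = 1.0 - ab[1:] / ab[:-1]       # beta_t from consecutive alpha_bars
    return np.clip(betas, 0.0, max_beta) # clip to avoid a degenerate last step

betas = cosine_betas(1000)
```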
      Diffusions can be used anywhere you'd like to use GANs or VAEs for image generation, and possibly video generation, but not in other contexts:
      - VAEs basically give you inbuilt statistics on the latents since it forces the latents toward N(0, 1) (in practice, because nobody uses shaped or learned gauss params and gauss is the only one that works right for latents), so you can do things like outlier detection easier.
      - You can readily interpolate in VAE's latent space, not so in DDPMs (but that is so in DDIMs, so you can look into that).
      - GANs let you train on two disparate, unlabelled datasets and get style transfer "for free". Diffusions can't do that, however they can do conditioning "fully offline" (you train on a big dataset, then you have a classifier that tells you your variates, and you can easily combine the two to get guided generation with no finetuning at all).
      - I find in practice that VAEs work better/are much easier to train than diffusions for non-color-image data
      - Diffusions basically don't work if the denoising model is not a U-net. I'm not sure why, but this can be a concern depending on your wants. VAEs are much free-er.
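      On the interpolation point: in either a VAE's latent space or DDIM's noise space, spherical interpolation (slerp) is the usual trick, since linear interpolation passes through low-norm latents that N(0, I) almost never produces. A quick numpy sketch:

```python
import numpy as np

# Spherical interpolation (slerp) between two latent codes z0 and z1.
# Keeps interpolants at a "typical" norm for a Gaussian latent, unlike lerp.
def slerp(z0, z1, t):
    z0n = z0 / np.linalg.norm(z0)
    z1n = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))
    if np.isclose(omega, 0.0):           # (anti)parallel: fall back to lerp
        return (1.0 - t) * z0 + t * z1
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(0)
z_a = rng.standard_normal(16)            # two stand-in latent codes
z_b = rng.standard_normal(16)
mid = slerp(z_a, z_b, 0.5)
```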

      • 2 months ago
        Anonymous

        >- Diffusions basically don't work if the denoising model is not a U-net. I'm not sure why, but this can be a concern depending on your wants. VAEs are much free-er.
        so do they work on non-image data? idk Unet

        • 2 months ago
          Anonymous

          Basically they don't work on anything that's not specifically color image data. I highly suspect it's because of the denoising model coupled with the noise schedule, but I haven't found anything that works. I went through all the usual -- d3pm, vq-ddpm, LSGM and LDM using discrete decoders -- but I couldn't get any of them to work (or at least work acceptably) on non-trivial data (to be fair it's also what some of those papers show, though they don't "admit" it, d3pm comes to mind in that regard).

          UNet is currently the most popular model for image tasks like img2img, style transfer, etc. It's really just convolutional ladder networks, rebranded (unfortunately a common theme in modern ML "research"). This means it's a standard convnet heading toward a small bottleneck, then a standard deconvolution network, but there are connections laterally between the conv and deconv layers. It's very specific to 2D data with a stronger relationship structure around neighboring datapoints (like 2D images).
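          A structural sketch of that shape, with average-pooling and nearest-neighbor upsampling standing in for the learned conv/deconv layers (a real U-net learns all of these, this just shows the skeleton):

```python
import numpy as np

def downsample(x):
    # stand-in for a strided conv block: halve each spatial dimension
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # stand-in for a learned deconv block: nearest-neighbor 2x upsample
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def toy_unet(x, depth=3):
    skips = []
    for _ in range(depth):               # encoder path down to the bottleneck
        skips.append(x)
        x = downsample(x)
    for skip in reversed(skips):         # decoder path back up
        x = upsample(x) + skip           # lateral skip connection per scale
    return x

out = toy_unet(np.ones((16, 16)))
```

          The lateral connections are the point: each decoder scale gets the matching encoder features, which is what makes it a ladder network rather than a plain autoencoder.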

          • 2 months ago
            Anonymous

            interesting thanks

      • 2 months ago
        Anonymous

        Interesting. I'm looking into terrain synthesis, so it's greyscale image data conditioned on a latent from another model that represents large-scale structure. I'll give VAEs another look, then. Neural terrain synthesis literature is really poor so far as I've seen, so it's been an exercise in creatively applying more general image synthesis lit. Thanks for the writeup, scholar-anon.

        • 2 months ago
          Anonymous

          Give both a try and see how it goes. VAEs are simple enough to just do yourself, and you can use the stability LDM code as a base for the diffusion so you don't have to fiddle too much with the equation details (it's pytorch_lightning, which means you have to expect spaghetti, but it's easy enough to follow and modify for your own use).
          Good luck anon.

          • 2 months ago
            Anonymous

            Yeah, I was hoping to copy/paste their code. Might still be able to if I duplicate the intensity across all color channels.
            >GL
            Thanks, I'm really hoping to "solve" terrain generation with this work. Fucking tired of perlin noise.
