UNDERSTANDING DIFFUSION MODELS: HOW AI REALLY CREATES IMAGES FROM NOISE

CONCEPT BEHIND THIS

Imagine you take a clear image and add Gaussian noise to it.

( But what is Gaussian noise? 

Imagine you take a clean, sharp photo.

Now, imagine adding a tiny bit of random fuzz — like grain, tiny dots, or static — all over it. That’s noise.

Now, Gaussian noise is just a special kind of noise where:

  • Most of the fuzz is small and not very strong.

  • A few spots have slightly more intense fuzz. )  
The left side shows a clear, coherent image, while the right side shows the same image with Gaussian noise.

We continue this process till the point we get an image where only tiny dots and grain (also called static or noise) are visible, and the image becomes unrecognizable.
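This repeated noising can be sketched in a few lines of Python with NumPy, using a toy 8×8 array as a stand-in for a real photo (the step count and noise strength are made-up values, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.uniform(0, 1, size=(8, 8))   # toy stand-in for a real photo

x = image.copy()
for step in range(200):
    # Each step adds a small amount of Gaussian fuzz to every pixel.
    x = x + rng.normal(loc=0.0, scale=0.1, size=x.shape)

# After many steps the accumulated fuzz swamps the original signal:
# the correlation between the noisy array and the original drops
# toward zero, which is the "unrecognizable" end state.
corr = np.corrcoef(image.ravel(), x.ravel())[0, 1]
print(corr)
```

After enough steps the noise dominates completely, which is exactly the static-only image described above.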
What if we could reverse this process to start from the pure static and noise form and make our way back to the clear and coherent image? 
 
Left shows the pure noise and static image, while right shows the clear and coherent image generated from the pure noise image.

That, friends, is how diffusion models like OpenAI's DALL·E generate images.



DIFFUSION MODELS


Diffusion models are becoming very popular in AI.
They are especially good at creating images from scratch — like making a photo that looks real, even though it was never taken.
They’re now catching up to and even beating older AI models like GANs (Generative Adversarial Networks) in some tasks.

For example:

  • They create better-looking images (higher visual quality).

  • They can do smart tasks like:

    • Turning text into pictures (e.g., "a cat wearing sunglasses on the moon").

    • Fixing missing parts of images (called inpainting).

    • Changing parts of an image while keeping the rest the same (image manipulation).



EXAMPLE ON HOW THE PROCESS WORKS
We start from an image we name X₀. Now let us define our forward diffusion process (clear image to static image) as one where noise is added to the original image step by step until, after T steps, we reach the static image Xₜ. Our AI model will then be tasked with converting the image from Xₜ (static image) back to X₀ (clear image) through the process of reverse diffusion.


You can think of the forward process as adding ever more static to a picture until the image is unrecognizable.
But there's a neat trick: this adding of noise follows a process, and every step depends only on the step right before it. It is like a chain.
Fancy name: this is known as a Markov chain. Don't worry, it just means "one step at a time".


Markov-Chain diagram
(Imagine watching the weather change daily — Sunny, Cloudy, or Rainy. This Markov chain shows how tomorrow’s weather depends only on today’s, not the day before. For example, from Sunny, there’s an 80% chance it stays Sunny, 10% Cloudy, 10% Rainy. Similarly, in diffusion models, noise is added step by step — each step depending only on the previous one. Like weather transitions, the model de-noises one small step at a time, not all at once.)
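The weather analogy can be simulated in a few lines. Note that only the Sunny row of probabilities comes from the example above; the Cloudy and Rainy rows are invented here purely for illustration:

```python
import numpy as np

states = ["Sunny", "Cloudy", "Rainy"]
# Transition probabilities: row = today, column = tomorrow.
# From Sunny: 80% Sunny, 10% Cloudy, 10% Rainy (as in the example);
# the other two rows are made-up numbers for this sketch.
P = np.array([
    [0.8, 0.1, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])

rng = np.random.default_rng(42)
today = 0                      # start Sunny
forecast = [states[today]]
for _ in range(6):
    # Tomorrow depends only on today, never on earlier days: that is
    # the Markov property.
    today = rng.choice(3, p=P[today])
    forecast.append(states[today])
print(forecast)
```

Diffusion works the same way: each noisier image Xₜ is produced from Xₜ₋₁ alone.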

Therefore: X₀ is a clean image. Add some noise, you get X₁. Add more noise, you get X₂. We continue adding noise until we get Xₜ, which is pure noise. In other words, by the end, the image becomes complete randomness, like the static of a TV; we can no longer tell what the original image was.
Now starts the reverse process. Think of doing the opposite: the AI starts with pure noise and we ask it, "Hey, can you figure out what this might have been, and bring it back, step by step?"
The AI model then tries to sequentially remove only a little bit of noise at a time. If it's really smart, it can keep doing this until its final image is crisp and clear.

X₀ → X₁ → X₂ → .. → Xₜ (noise image)

Similarly
Xₜ (noise) → Xₜ₋₁ → Xₜ₋₂ → ... → X₀ (clear image)
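In practice, the forward chain uses a noise schedule, and a handy closed form lets us jump straight to any Xₜ from X₀. Here is a sketch with a common DDPM-style linear schedule; the specific numbers (1000 steps, betas from 1e-4 to 0.02) are one standard choice, not the only one:

```python
import numpy as np

# Closed-form forward diffusion:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # how much noise each step adds
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)       # fraction of the signal left after t steps

rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, size=(8, 8))  # toy "image"

def noisy_at(t):
    """Sample Xₜ directly from X₀ using the closed form."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

# Early steps keep almost all of the image; by the final step,
# alpha_bar is nearly zero, so Xₜ is essentially pure noise.
print(alpha_bar[10], alpha_bar[T - 1])
```

The shrinking `alpha_bar` is the precise version of "the image slowly dissolves into static."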

You might ask, “Why not just jump from noise to image in one shot?”

Because small steps are easier to undo.
Think of it like unmixing paint — if you mix just a little color in, you can probably figure out how to get it back. But if you dump in all the paint at once, you’re stuck.

So the forward process adds tiny amounts of fuzz each time.
And that makes the reverse process much easier for the model to learn.

What is the Model Really Learning?

At a given step, the model's job is to guess what noise was added, and remove it. It learns a pattern like:

"At step 497, the noise usually looks like this, so I'll subtract that."

Notice that the model isn't recreating the original image pixel by pixel. Instead, it learns to slowly de-noise at every step.
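A toy version of that training objective: add known noise to an image, ask the model to guess it, and score the guess with mean squared error. The `fake_model` below is a hypothetical placeholder, as is the noise level chosen for step 497; a real diffusion model would be a neural network (typically a U-Net) trained to make this guess well:

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_model(x_t, t):
    # Hypothetical stand-in for a trained network: it lazily guesses
    # "no noise at all", which is a bad guess.
    return np.zeros_like(x_t)

x0 = rng.uniform(-1, 1, size=(8, 8))
t = 497
alpha_bar_t = 0.3                 # assumed cumulative signal level at step 497
eps = rng.normal(size=x0.shape)   # the noise we actually added
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1 - alpha_bar_t) * eps

# Training loss: how far is the model's noise guess from the true noise?
loss = np.mean((fake_model(x_t, t) - eps) ** 2)
print(loss)
```

Training drives this loss down across many images and many random steps, which is how the model learns what "the noise at step 497" tends to look like.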



What Happens When It’s Generating New Images?

When we want to make a brand-new image, here’s what happens:

  1. Start with random noise.

  2. Use the trained model to remove a bit of noise.

  3. Repeat many times (usually ~1000 steps).

  4. Boom — a clear, new, never-before-seen image appears.
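The four steps above can be sketched as a loop. The update rule here is a simplified caricature of DDPM-style sampling rather than the exact published formula, and `predict_noise` is a hypothetical stand-in for a trained network:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

rng = np.random.default_rng(0)

def predict_noise(x, t):
    # Hypothetical trained model; here just a placeholder returning zeros.
    return np.zeros_like(x)

x = rng.normal(size=(8, 8))        # step 1: start with pure random noise
for t in reversed(range(T)):       # steps 2-3: remove a little noise, ~1000 times
    eps_hat = predict_noise(x, t)
    x = (x - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
    if t > 0:
        # A dash of fresh noise keeps the samples diverse.
        x = x + np.sqrt(betas[t]) * rng.normal(size=x.shape)
# step 4: x is the generated image (meaningless here, since the model is fake)
print(x.shape)
```

With a real trained network in place of `predict_noise`, this loop is what turns static into a brand-new image.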



CONCLUSION
Diffusion models are considered among the most powerful AI image-generation tools available today. They begin with pure noise and, over a series of steps, remove that noise until the model produces beautiful, realistic images, often guided by text-based prompts like "a cat playing guitar on Mars."
Unlike previous image generation models like GANs, diffusion models provide:
  • Higher image quality
  • More control
  • Added capability for editing, inpainting, and text-to-image
Although the concept may sound complicated under the hood, the idea is really simple: take a photo, turn it into noise, and teach the AI to un-noise it.

PS: if you liked the blog, do follow us on Instagram by scanning the QR code given below or going to @algorythmvault

