1. Introduction

Let's start by understanding the main goal of diffusion models: generating new data by modeling how the information in an image gradually dissolves into noise, and learning how to reverse that process.

We'll explore how to make the most of our training data, which consists of different sprites representing distinct elements.

Finally, we'll delve into training the diffusion models themselves using this data, refining their ability to turn noise back into recognizable sprites.

1.1 What is the goal of diffusion models?

You have a substantial amount of training data, including sprite images like the ones displayed here, portraying video game characters. The goal is to generate additional sprites that aren't yet part of this dataset. A neural network can do this by following the diffusion model methodology.


This collection constitutes your training dataset.

1.2 Making Images Useful for the Neural Network


The neural network needs to learn sprites in detail, from fine features like hair color to coarser aspects like body shape. To help with this, we use a technique called "noising": adding varying amounts of noise to the images. Training on images at many different noise levels helps the network learn both the fine and coarse features of sprites.

<aside> 💡 Think about dropping ink into a glass of water. At first, you can pinpoint where the ink landed. However, with time, the ink spreads out in the water until it vanishes. This notion guides our approach here. Just like starting with "Bob the Sprite," as we introduce noise, the sprite gradually fades until it's hard to tell which one it originally was. This principle helps the neural network understand the full spectrum of sprite details.

</aside>
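To make the idea concrete, here is a minimal sketch of noising in NumPy. It uses a simple linear blend between the image and Gaussian noise; real diffusion models use a carefully designed variance schedule instead, and `add_noise` and the toy sprite here are illustrative inventions, not the course's actual code:

```python
import numpy as np

def add_noise(image, noise_level, rng=None):
    """Blend an image with Gaussian noise; noise_level runs from 0 (clean) to 1 (pure noise)."""
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal(image.shape)
    return (1.0 - noise_level) * image + noise_level * noise

# A toy 16x16 "sprite": a bright square on a dark background.
sprite = np.zeros((16, 16))
sprite[4:12, 4:12] = 1.0

# From "clearly Bob" down to "no idea which sprite this was".
for level in (0.0, 0.25, 0.5, 1.0):
    noisy = add_noise(sprite, level)
    corr = np.corrcoef(sprite.ravel(), noisy.ravel())[0, 1]
    print(f"noise level {level:.2f} -> correlation with original {corr:.2f}")
```

As the noise level rises, the noisy image becomes less and less correlated with the original, mirroring the ink-in-water analogy above.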

1.3 What Should the Neural Network Think?

  1. Clear Sprite (Bob the Sprite): The network confidently confirms, "Yes, that's Bob the Sprite," maintaining the original.
  2. Slight Noise (Possibly Bob): Recognizing Bob with minor noise, it suggests changes to match, thinking, "Possibly Bob, with noise. Adjust details."
  3. General Outline (Possibly a Sprite Person): Identifying a sprite person, it proposes general features for Bob, Fred, or Nancy, adapting to uncertainty.

  4. Extreme Noise (Almost Unrecognizable): Amid heavy distortion, it strives to outline sprite-like traits, thinking, "Enhance into rough sprite outline."

Now, let's explore how the process of gradually adding noise unfolds over time.


Now, let's delve into training the neural network. The aim is to teach it how to transform different noisy images back into recognizable sprites. This process involves removing the added noise, progressing from complete noise to a semblance of a person, then to a sprite resembling Fred.
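As a sketch of what a single training example might look like: noise a sprite, ask the model to predict the noise that was added, and score it with mean squared error. Everything here (`training_step`, the linear blend, the do-nothing stand-in model) is a simplified illustration under assumed names, not the exact objective used later:

```python
import numpy as np

def training_step(sprite, noise_level, predict_noise, rng):
    """One simplified training example: noise a sprite, then measure
    how well the model recovers the noise that was mixed in."""
    noise = rng.standard_normal(sprite.shape)
    noisy = (1.0 - noise_level) * sprite + noise_level * noise
    predicted = predict_noise(noisy, noise_level)
    return np.mean((predicted - noise) ** 2)  # mean squared error on the noise

# A model that always predicts "no noise" is heavily penalized at high noise levels.
rng = np.random.default_rng(0)
sprite = rng.random((16, 16))
loss = training_step(sprite, 0.9, lambda x, t: np.zeros_like(x), rng)
print(f"loss for a do-nothing model: {loss:.2f}")
```

Minimizing this loss over many sprites and noise levels is what teaches the network to undo the noising process step by step.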


The "No Idea" noise level is crucial: it follows a normal distribution, meaning each pixel is sampled independently from a bell-shaped curve, known as a Gaussian distribution. This property is pivotal for effective training.
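A "No Idea" image is nothing more than pixels drawn independently from a standard normal distribution; a quick sketch (the image shape is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
# The "No Idea" level: every pixel sampled from the bell curve N(0, 1).
pure_noise = rng.standard_normal((16, 16, 3))
print(f"mean ~ {pure_noise.mean():.2f}, std ~ {pure_noise.std():.2f}")
```

The sample mean sits near 0 and the standard deviation near 1, as expected for a standard Gaussian.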

2. Sampling

Before discussing training methods, let's explore how we utilize the neural network during inference. Here's the process:

  1. You input the noise sample into the trained neural network. This network has learned the essence of sprites.
  2. The neural network predicts noise instead of the sprite. This prediction is subtracted from the noise sample.
  3. This subtraction yields a slightly more sprite-like result. Note that because the network predicts the noise rather than the sprite itself, a single step does not remove all of the noise.
  4. To achieve high-quality samples, multiple iterations are necessary. After around 500 iterations, a significantly sprite-like outcome is attainable.
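The four steps above can be sketched as a loop. The `fake_noise_predictor` below is a stand-in for the trained network, and the plain subtraction omits the per-step scaling and re-noising that real samplers use; it illustrates the flow of inference, not a working sampler:

```python
import numpy as np

def fake_noise_predictor(x, t):
    """Stand-in for the trained network: 'predicts' a fraction of the
    current sample as noise. Purely illustrative."""
    return 0.1 * x

def sample(num_steps=500, shape=(16, 16, 3), rng=None):
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(shape)                    # 1. start from pure noise
    for t in range(num_steps, 0, -1):
        predicted_noise = fake_noise_predictor(x, t)  # 2. predict the noise
        x = x - predicted_noise                       # 3. subtract it
    return x                                          # 4. result after many iterations

result = sample()
print(result.shape)
```

After many iterations the sample moves far from the initial noise, which is the point: quality emerges from repeated small denoising steps, not from a single subtraction.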


2.1 Imports