Let's start with the main goal of a diffusion model: generating new images that resemble the ones it was trained on.
Next, we'll look at how to make the training data, a collection of video game sprites, useful to the model.
Finally, we'll train the diffusion model itself on this data so that it learns to generate new sprites.
You have a large amount of training data: sprite images like the ones shown here, depicting video game characters. The goal is to generate additional sprites that are not already in this dataset. A neural network can do this by following the diffusion model process.
This collection constitutes your training dataset.
The neural network needs to learn what a sprite is at every level of detail, from fine details like hair color to the general outline like body shape. To help with this, we can use a technique called "noising": adding different levels of noise to the training images. Seeing the same sprite at many noise levels helps the network learn both the fine details and the general outline.
<aside> 💡 Think about dropping ink into a glass of water. At first, you can pinpoint exactly where the ink landed. Over time, the ink diffuses into the water until you can no longer tell where it started. The same idea guides our approach here. You start with "Bob the Sprite," and as you add noise, the sprite gradually fades until it's hard to tell which sprite it originally was. Training across this whole range helps the neural network understand the full spectrum of sprite details.
</aside>
Now, let's explore how the process of gradually adding noise unfolds over time.
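The gradual noising can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the 16×16 RGB array is a random stand-in for a real sprite, `add_noise` is a hypothetical helper, and the square-root blend is one simple way to mix image and noise, not necessarily the schedule used in actual training.

```python
import numpy as np

def add_noise(image, level, rng):
    """Blend an image with Gaussian noise.

    level=0.0 returns the image unchanged; level=1.0 returns pure noise.
    (Hypothetical helper; the blend rule is an assumption for illustration.)
    """
    noise = rng.standard_normal(image.shape)
    return np.sqrt(1.0 - level) * image + np.sqrt(level) * noise

rng = np.random.default_rng(0)
# Random stand-in for a real sprite, with pixel values scaled to [-1, 1]
sprite = rng.uniform(-1.0, 1.0, size=(16, 16, 3))

levels = [0.0, 0.25, 0.5, 0.75, 1.0]
noisy_versions = [add_noise(sprite, level, rng) for level in levels]

correlations = []
for level, noisy in zip(levels, noisy_versions):
    # How much of the original sprite is still visible at this noise level
    corr = np.corrcoef(sprite.ravel(), noisy.ravel())[0, 1]
    correlations.append(corr)
    print(f"level={level:.2f}  correlation with original={corr:.2f}")
```

The printed correlation falls as the noise level rises, which is the numeric counterpart of the sprite fading into noise.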
Next, consider what the neural network is trained to do. The aim is to teach it to turn noisy images back into recognizable sprites. It does this by removing the added noise step by step: from complete noise, to something with the rough outline of a person, to a sprite resembling Fred.
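To make the idea of "removing the added noise" concrete, here is a rough sketch of how one training example could be built. It assumes a common formulation in which the network predicts the noise that was added, and that prediction is compared to the true noise with a mean squared error. The names `make_training_pair` and `placeholder_model` are hypothetical, the sprite is a random stand-in, and the placeholder simply predicts zeros in place of a real network.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_training_pair(clean, level, rng):
    """Return (noisy input, noise target) for one clean image at one noise level."""
    noise = rng.standard_normal(clean.shape)
    noisy = np.sqrt(1.0 - level) * clean + np.sqrt(level) * noise
    return noisy, noise

def placeholder_model(noisy, level):
    """Hypothetical stand-in for the neural network: always predicts zero noise."""
    return np.zeros_like(noisy)

# Random stand-in for a real sprite, with pixel values scaled to [-1, 1]
clean_sprite = rng.uniform(-1.0, 1.0, size=(16, 16, 3))
# Each training example gets its own randomly chosen noise level
level = rng.uniform(0.05, 0.95)

noisy, target_noise = make_training_pair(clean_sprite, level, rng)
predicted_noise = placeholder_model(noisy, level)

# Mean squared error between predicted and true noise; training would minimize this
loss = np.mean((predicted_noise - target_noise) ** 2)

# If the noise were predicted perfectly, the clean sprite could be recovered exactly
recovered = (noisy - np.sqrt(level) * target_noise) / np.sqrt(1.0 - level)
print(f"loss={loss:.3f}  max recovery error={np.abs(recovered - clean_sprite).max():.2e}")
```

The last step shows why predicting the noise is enough: subtracting a correct noise estimate from the noisy image gives back the sprite.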
The "No Idea" noise level is especially important. At this level the image follows a normal distribution: each pixel is sampled from a bell-shaped curve, also known as a Gaussian distribution. This property is pivotal for effective training.
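As a small illustration, an image at this noise level can be produced by drawing every pixel value from a standard normal distribution. The 16×16 RGB shape is an assumption chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Every pixel value is drawn from a Gaussian with mean 0 and variance 1
pure_noise = rng.standard_normal(size=(16, 16, 3))

# With enough pixels, the sample mean is close to 0 and the standard deviation close to 1
print(f"mean={pure_noise.mean():.3f}  std={pure_noise.std():.3f}")
```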
Before discussing training methods, let's look at how the neural network is used during inference. Here's the process: