What is a Generative Adversarial Network? An Easy Guide
Artificial intelligence (AI) is going through a bit of a Cambrian explosion. What seemed like science fiction only a short time ago has barreled toward reality over the last few years. And interest is only growing thanks to the release of tools like ChatGPT. But just like the Cambrian explosion, what's coming out of the AI evolutionary leap isn't just one thing. There are actually many fronts where we're advancing AI software all at the same time.
In this post, we're going to talk about one of those types of AI: generative adversarial networks, an effective AI model used for a variety of task types.
Discriminative vs. Generative
Just like we classify animal fossils into domains, kingdoms, and phyla, we classify AI networks, too. At the highest level, we classify AI networks as "discriminative" and "generative." A generative neural network is an AI that creates something new. This differs from a discriminative network, which classifies something that already exists into particular buckets. Kind of like we're doing right now, by bucketing generative adversarial networks (GANs) into appropriate classifications.
So, if you were in a situation where you wanted to use textual tags to create a new visual image, like with Midjourney, you'd use a generative network. However, if you had a giant pile of data that you needed to classify and tag, you'd use a discriminative model.
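To make the split concrete, here is a minimal sketch in Python. The pet weights, the `classify` and `generate` helpers, and the nearest-mean rule are all invented for illustration. Real discriminative and generative networks are far larger, but the direction of travel is the same: one maps data to a label, the other maps a label to new data.

```python
import random
import statistics

# Hypothetical training data: pet weights in kilograms.
cats = [3.9, 4.2, 4.5, 4.1]
dogs = [20.5, 25.0, 30.2, 22.8]


def classify(weight_kg):
    """Discriminative direction: existing data in, bucket out."""
    cat_mean = statistics.mean(cats)
    dog_mean = statistics.mean(dogs)
    if abs(weight_kg - cat_mean) < abs(weight_kg - dog_mean):
        return "cat"
    return "dog"


def generate(label, rng):
    """Generative direction: bucket in, brand-new data point out."""
    data = cats if label == "cat" else dogs
    return rng.gauss(statistics.mean(data), statistics.stdev(data))


print(classify(5.0))                      # a label for an existing value
print(generate("dog", random.Random(0)))  # a new, made-up dog weight
```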
Where Does the Generative Adversarial Network Fall?
The name implies that a GAN is generative, but the second word matters, too. A GAN is a clever application of both discriminative and generative models. Training starts from an initial set of real training data. The generative model attempts to produce new instances that could pass for members of that set. The discriminative model then receives a mix of generated content and training set content, and attempts to classify each item as either generated or part of the training set.
This is the adversarial part of the name. The generative model keeps trying to create better and better content until it can reliably "fool" the discriminative model, while the discriminative model keeps getting better at spotting fakes. The fundamental goal of a GAN is to train a generative model that generates high-quality content, but it uses both generative and discriminative models to get there.
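The back-and-forth described above can be sketched in a few dozen lines of plain Python. This is a toy under loud assumptions, not how production GANs are built: the "content" is a single number, the real data is drawn from a bell curve centered on 4.0, the generator is the line `a*z + b`, and the discriminator is a one-input logistic model. The function name `train_toy_gan`, the learning rate, and the step count are arbitrary choices for illustration.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_toy_gan(steps=500, batch=32, lr=0.01, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: g(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w*x + c), "probability x is real"
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: lower -log D(real) - log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w += -(1.0 - d) * x
            grad_c += -(1.0 - d)
        for x in fake:
            d = sigmoid(w * x + c)
            grad_w += d * x
            grad_c += d
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # Generator step: lower -log D(g(z)), i.e. try to "fool" the discriminator.
        grad_a = grad_b = 0.0
        for z in noise:
            d = sigmoid(w * (a * z + b) + c)
            grad_x = -(1.0 - d) * w
            grad_a += grad_x * z
            grad_b += grad_x
        a -= lr * grad_a / batch
        b -= lr * grad_b / batch
    return {"a": a, "b": b, "w": w, "c": c}


params = train_toy_gan()
print(params)  # "b" is the center of the generated values
```

Each loop iteration is one round of the contest: the discriminator is nudged toward telling real from generated, then the generator is nudged toward outputs the discriminator scores as real. Nothing here guarantees the two settle down; even toy GANs can oscillate instead of converging.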
Supervised vs. Unsupervised
After our domain classification of discriminative versus generative, our next classification bucket is whether a model is "supervised" or "unsupervised." This classification can be a little cloudier. While the supervised/unsupervised split is instructive, many types of AI models use a hybrid approach. A supervised model, as the name implies, generally involves human supervision during the training phase, usually in the form of labeled data. Training a supervised model involves multiple rounds of pairing inputs with their desired outputs.
Unsupervised models, as the name implies, operate with less or no human input. The goal is for the model to take an original training set and undergo multiple rounds of training on its own, producing better results after each round.
Which Type Is a Generative Adversarial Network?
A GAN is generally classed as an unsupervised AI model. The "real" and "generated" labels that the discriminative model learns from come out of the training loop itself rather than from a person, and the back-and-forth style of the adversarial training cycles is designed to continuously improve the generated content. As a developer working with a GAN, you'll need to provide a high-quality initial training set (labeled, if you want to steer the output by tag or class) and the computing power to run multiple iterations. But once you've provided the requisite resources, you can run the GAN with minimal human interaction.
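One way to see why so little human input is needed: the labels the discriminative model trains on can be attached automatically, because the training loop always knows which items came from the training set and which it just generated. A minimal sketch, with a hypothetical helper name and made-up numbers standing in for content:

```python
def label_for_discriminator(real_samples, generated_samples):
    """Attach 1 to training-set items and 0 to generated items.

    No human annotation is involved: the label is simply a record
    of where each item came from.
    """
    labelled = [(x, 1) for x in real_samples]
    labelled += [(x, 0) for x in generated_samples]
    return labelled


batch = label_for_discriminator([0.2, 0.4], [0.9])
print(batch)  # [(0.2, 1), (0.4, 1), (0.9, 0)]
```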
Generative Adversarial Network vs. Convolutional Neural Network
A common question from developers new to GANs is how they relate to a convolutional neural network (CNN). A CNN is a type of network that's very often used as a discriminative model, classifying data (especially images) based on existing labeled content, which is why CNNs show up so often in computer image recognition architectures. Many GANs use a CNN as the discriminative model that challenges the generated content in each training cycle.
CNNs are sometimes used as the generative step in the GAN architecture, as well. Generating content with a CNN usually requires additional logic beyond a traditional classifying CNN, such as upsampling (transposed convolution) layers or a decoder borrowed from a variational autoencoder. With that addition, you can use convolutional networks for both the generative and discriminative steps of the GAN training process.
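For readers who haven't met the "convolutional" part before, here is a rough sketch of the core operation in plain Python. The `conv2d` helper, the 4x4 "image," and the hand-picked edge filter are assumptions for illustration; a real CNN stacks many such filters and learns their values during training.

```python
def conv2d(image, kernel):
    """Slide a small kernel over a 2-D grid and sum the overlaps
    (no padding, stride 1), as CNN layers do."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A tiny image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A filter that responds where brightness jumps from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
print(conv2d(image, vertical_edge))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output lights up only in the middle column, where the edge is. A discriminative CNN builds its real-versus-generated decision out of many such feature detectors.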
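A classifying CNN shrinks its input down to a label, so a convolutional generator has to run in the other direction and grow a small input into a larger output. One common building block for that is the transposed convolution. The post doesn't name a specific layer, so treat this as an assumption; the sketch below is a one-dimensional toy with a hypothetical helper name and hand-picked numbers.

```python
def transposed_conv1d(inputs, kernel, stride=2):
    """Upsample a short sequence: each input value "stamps" a scaled
    copy of the kernel into the output, `stride` positions apart."""
    out_len = (len(inputs) - 1) * stride + len(kernel)
    out = [0] * out_len
    for i, value in enumerate(inputs):
        for j, weight in enumerate(kernel):
            out[i * stride + j] += value * weight
    return out


# Two input values become five output values.
print(transposed_conv1d([1, 2], [1, 2, 1]))  # [1, 2, 3, 4, 2]
```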
Frequently Asked Questions
Now that we have a pretty good idea of what a GAN is and what it does, let's address a handful of frequently asked questions about GANs and why you might choose one for your project.
What Are Some Common Usages of GANs?
One of the major benefits of a GAN is that it can generate realistic examples of a wide variety of content. Given the right kind of training data, you can use a GAN for any of the following kinds of tasks:
- Generating images
- Generating audio content (like music)
- Converting black-and-white images to color
- Converting hand-drawn sketches to photorealistic representations
- Predicting future frames of video based on previous frames
- Creating deepfakes
Why Would You Choose a Generative Adversarial Network?
We've talked a lot about how GANs are classified, how they work, and what they can do. But we haven't answered a simple question: why would you choose to use a GAN? One reason is that you're dealing with a problem where you don't have a large set of labeled data to use for training. The nature of a GAN is that you can work with a more limited amount of training data, then let the GAN generate more data that is fed back into the training routine. Many machine learning systems require that you train them on a considerable amount of pre-labeled data. A GAN relaxes that requirement.
GANs are also particularly strong at working with and generating images. Whether that's creating new images from text prompts or updating an input image with generated content, GANs are exciting because they provide visually impressive results.
Why Avoid Using a Generative Adversarial Network?
If you're thinking about adopting a GAN, there are a few drawbacks that you should still consider. For starters, GANs aren't a good fit if you're trying to train your model on the cheap. GANs are particularly expensive to train, due to the multi-step training cycle and the need to undergo multiple rounds of training. So, if you're trying to adhere to a strict budget, adopting a GAN might not be a good choice.
Additionally, GANs sometimes undergo what's known as mode collapse, where the generator's output covers only a limited subset of the variety in the training data, and not the full range you're hoping for.
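A rough way to picture mode collapse is to ask how many of the distinct "kinds" of content in the training data ever show up in the generated output. The sketch below assumes you already know where those kinds (modes) sit, which is rarely true outside toy problems; the `mode_coverage` helper, the three centers, and the sample values are all hypothetical.

```python
def mode_coverage(samples, mode_centers, radius):
    """Fraction of known modes that have at least one sample nearby."""
    covered = set()
    for s in samples:
        nearest = min(
            range(len(mode_centers)),
            key=lambda i: abs(s - mode_centers[i]),
        )
        if abs(s - mode_centers[nearest]) <= radius:
            covered.add(nearest)
    return len(covered) / len(mode_centers)


centers = [0.0, 5.0, 10.0]          # three kinds of content in the training data
healthy = [0.1, 4.8, 10.2, 5.1]     # generated samples that hit every kind
collapsed = [4.9, 5.0, 5.1, 5.2]    # generated samples stuck on one kind

print(mode_coverage(healthy, centers, radius=1.0))    # 1.0
print(mode_coverage(collapsed, centers, radius=1.0))  # one mode out of three
```

A collapsed generator can still fool the discriminator with every sample it makes, which is part of why the problem is easy to miss if you only watch the training loss.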
GANs Provide Great Utility When They Fit
To loop back around to our evolutionary metaphor, a GAN isn't a crab or a turtle. It's not an animal that you'll see still filling the same role years from now. Instead, GANs are evolving. Researchers are still figuring out what GANs can do, and how best to design and train them. If you're thinking about adopting machine learning for your workflows, researching GANs makes a lot of sense, but they might not be the right tool for your job. If they don't seem like a fit, avoid trying to jam them into a job where they don't belong. But if you're doing work that fits a GAN, especially image manipulation or creation, then adopting and training a GAN will likely provide big benefits for your workflows.