Autumn 2022 CSE490g1 Final Project: Generating with GANs

By Andy Danforth

Abstract

This project was meant to learn about and experiment with generative adversarial networks (GANs) while also producing some cool imagery. I knew early on in the quarter that I would want to work with image generation since the demos we saw seemed super interesting. I actually started working on this project before we discussed GANs in lecture, so my first steps were simply learning what they were and how they worked. I eventually settled on working with a Deep Convolutional GAN (DCGAN), which was lucky since we just so happened to discuss these in lecture. Once I knew how GANs worked and had selected my model, I explored how it would perform on two different domains of images: something straightforward like cat faces, and something more abstract like abstract art. I originally hypothesized that the generated results from the straightforward dataset would be much better since the images it learns from are much more homogeneous. This was mostly the case; however, I ran into issues with my abstract art dataset being too small, which I address further down.

Problem Statement

I originally wanted to build a Pix2Pix model and run it on each frame of a video, but I didn't think I could create anything of decent quality in the little time I had, so I instead worked with GANs since they are similar and a bit easier to digest. I wouldn't say there is anything specific I'm trying to solve with this project; there are much smarter people with a lot more resources who can likely make near-perfect generated images in any domain. I simply wanted to experiment with it myself, and I hope to create some cool images purely with GANs!

Methodology

I mostly used a structure for a DCGAN that I found online (linked in the sources). This structure was also nearly identical to the one shown in class; a sketch of it is below. I changed the structure of the model in some preliminary experiments but couldn't find a better-performing variant in my short time frame. Once I had this model, I found a few datasets on Kaggle which I could use for training. From there, I adjusted some hyperparameters, ran the model, and tracked the losses and output, which for most datasets produced good results. I had a lot of issues with the more complicated datasets, which I discuss in the results section.
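For reference, here is a minimal sketch of that structure, following the PyTorch DCGAN tutorial's defaults (nz=100, ngf=ndf=64, 3-channel 64x64 images); the exact values I ran with may have differed slightly:

```python
import torch.nn as nn

nz, ngf, ndf, nc = 100, 64, 64, 3  # latent dim, feature map sizes, image channels

generator = nn.Sequential(
    # latent vector z (N, nz, 1, 1) -> 4x4 feature maps
    nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
    nn.BatchNorm2d(ngf * 8),
    nn.ReLU(True),
    # 4x4 -> 8x8
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf * 4),
    nn.ReLU(True),
    # 8x8 -> 16x16
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf * 2),
    nn.ReLU(True),
    # 16x16 -> 32x32
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf),
    nn.ReLU(True),
    # 32x32 -> 64x64 RGB image in [-1, 1]
    nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
    nn.Tanh(),
)

discriminator = nn.Sequential(
    # 64x64 RGB image -> 32x32
    nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
    nn.LeakyReLU(0.2, inplace=True),
    # 32x32 -> 16x16
    nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ndf * 2),
    nn.LeakyReLU(0.2, inplace=True),
    # 16x16 -> 8x8
    nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ndf * 4),
    nn.LeakyReLU(0.2, inplace=True),
    # 8x8 -> 4x4
    nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ndf * 8),
    nn.LeakyReLU(0.2, inplace=True),
    # 4x4 -> single real/fake probability
    nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
    nn.Sigmoid(),
)
```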

GAN Stuff

My goal of creating "cool" images is pretty subjective so I had a hard time quantifying my progress. Mostly, I would stop training once I thought the results looked interesting enough to me, or I would restart and change my model if I saw no progress being made. Of course I could also use the loss after each epoch for a more objective measure of progress, but I found that a lower loss didn’t completely correlate with a better output image.

The first dataset I trained with was a set of around 16,000 cat faces. These images are all 64x64 and pretty well cropped. The original DCGAN design worked well with human faces, so I figured I'd try it out with feline ones. Here are a few example images:

cat 1
cat 2
cat 3

First Attempt

On my first attempt, I did not change any of the hyperparameters for the model, and simply switched the dataloader to load in the cat faces. After the first 5 epochs, these were the faces my generator was creating:

Generated cats after 5 epochs

Vaguely cat-like! The eyes all appear in the same spot and all of the head shapes are about the same. Only the colors really vary.

I let training run for another 10 epochs and found that the time between epochs was getting a lot shorter. I also noticed that the cat faces were not changing as much as they were at the start. For reference, here are the results before and after the first epoch:

Before 1 epoch

After 1 epoch

Notice how much more drastic the change is after the first epoch compared to the change from the 1st to the 5th epoch. I hypothesize that since less is changing in the images, there must also be much less backpropagation work happening each epoch, which would explain the large decrease in time between epochs.

Then I ran the GAN for another 40 epochs. Here are the results after a total of 60 training epochs:

after 60 epochs

The generated images are a lot sharper and more diverse. However, there are still a few images with clear artifacts, like the cat in the first row and sixth column.

And here are the discriminator and generator loss curves across these epochs:

loss 60 epochs

I was using Adam as my optimizer, so the stochastic nature of minibatch training would explain the seemingly random spikes in loss.
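For context, this is the standard optimizer setup from the PyTorch tutorial, reusing the generator and discriminator from the sketch above; I believe my hyperparameters matched these defaults, but treat them as an assumption:

```python
import torch.optim as optim

# DCGAN-paper defaults: lr=0.0002 and beta1=0.5 for both networks.
lr, beta1 = 0.0002, 0.5
optimizerD = optim.Adam(discriminator.parameters(), lr=lr, betas=(beta1, 0.999))
optimizerG = optim.Adam(generator.parameters(), lr=lr, betas=(beta1, 0.999))
```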

Adding Annealing

I ran my model for more epochs and my results didn't seem to be getting much better, so I decided to anneal the learning rate manually. I simply decreased the discriminator's and generator's optimizer learning rates by a factor of 10.
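A minimal sketch of that annealing step, assuming the optimizerD and optimizerG objects from the earlier snippet:

```python
# Manual "annealing": drop both learning rates by a factor of 10.
# PyTorch exposes the current lr through each optimizer's param_groups.
for opt in (optimizerD, optimizerG):
    for group in opt.param_groups:
        group["lr"] /= 10
```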

Here is the resulting loss curve after 10 annealed epochs:

Anneal loss

Since we decreased the learning rate by a large factor, the variance in our loss decreased, as shown by the much tighter curves. However, I noticed the generator's loss was actually trending UPWARDS? We'll look at that more later. In the meantime, here are the cat results after 10 and 20 annealed epochs:

10 annealed epochs

20 annealed epochs

As you can see, after adding annealing there was a drastic change from the non-annealed model, which I think increased the quality of the cats produced, but then very little changed after the fact. There ARE differences between these images; they're not just duplicates :).

Can you guess which of these images was generated and which was real?

(Answer: the cat on the left is fake!!!!)

Abstract Art - First Dataset

I felt pretty happy with the cat model, so I tried to think of a new type of image I could generate. I settled on abstract art since I felt it would be an entirely different kind of image: the cat faces all had very similar poses, shapes, and colors, while no two abstract art pieces were the same. I found a dataset of abstract art pieces on Kaggle, but it only had around 2.7k images and they were all very different sizes. Here are a few examples:

abs art 1 1
abs art 1 2
abs art 1 3

As you can see, they are all different shapes, sizes, and resolutions, with different colors, etc. The dataloader automatically resizes and crops images to a consistent shape (roughly as sketched below), but because this dataset is so small, I was afraid my results would suffer... They did.
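For reference, the resize-and-crop step looks roughly like the standard PyTorch pipeline; image_size=64 here is an assumption carried over from the cat dataset:

```python
import torchvision.transforms as T

image_size = 64
transform = T.Compose([
    T.Resize(image_size),       # shrink the shortest side to 64 px
    T.CenterCrop(image_size),   # crop to a square, discarding the edges
    T.ToTensor(),
    # Map pixel values to [-1, 1] to match the generator's Tanh output.
    T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
```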

Here are some wonderful results after many epochs over the data!

Discriminator Overfit Examples

Absolute nonsense! And no matter how much I trained, the results never got any better. This can be explained by the loss graph:

Discriminator Overfit Loss

The generator loss was actually increasing by a lot while the discriminator was doing a great job. This was because the discriminator was learning much faster than the generator. Because the discriminator was essentially overfit, the generator could never learn what was good or bad: no matter what it made, the discriminator could easily tell it was fake! This is likely what was happening with the annealing loss curve for the cat dataset too; the discriminator was learning faster and driving the generator's loss higher and higher despite the resulting images not changing much. Note: I did and wrote all of this before the GANs lecture -_-
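For illustration, here is a rough diagnostic along those lines, reusing the models from the Methodology sketch; real_batch and noise are stand-ins for a batch of real images and latent vectors from the training loop:

```python
import torch

real_batch = torch.randn(16, 3, 64, 64)  # placeholder for a real image batch
noise = torch.randn(16, 100, 1, 1)       # latent vectors for the generator

with torch.no_grad():
    D_x = discriminator(real_batch).mean().item()          # avg score on real images
    D_G_z = discriminator(generator(noise)).mean().item()  # avg score on fakes

# If D_x pins near 1.0 and D_G_z pins near 0.0, the discriminator has
# essentially "won" and the generator gets almost no useful signal.
if D_x > 0.99 and D_G_z < 0.01:
    print("Discriminator looks overfit; generator loss will keep climbing.")
```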

In order to fix this, I found that people usually "throw off" the discriminator somehow in order to give the generator a chance to learn.

Adding Noise

The first thing I attempted was to add noise to the images in order to make it harder for the discriminator to tell whether an image was real or fake. I used random Gaussian noise as follows (a short code sketch follows the example images):

Original Image

Small Amounts of Noise

Large Amounts of Noise
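Here is a minimal sketch of how that kind of instance noise can be added, assuming image tensors scaled to [-1, 1]; the sigma values are made up for illustration:

```python
import torch

def add_noise(images, sigma=0.05):
    """Add Gaussian noise to an image batch assumed to lie in [-1, 1]."""
    return (images + sigma * torch.randn_like(images)).clamp(-1, 1)

# Both real and generated batches get noised before the discriminator
# sees them, so neither side carries a "clean" tell:
# real_noisy = add_noise(real_batch, sigma=0.05)  # "small" noise
# fake_noisy = add_noise(fake_batch, sigma=0.5)   # "large" noise
```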

The added noise created the following loss graphs:

Small Noise

Large Noise

Using small amounts of noise actually had the effect of making the generator perform better than the discriminator! And large noise didn't seem to do anything. The results for both were still gibberish. Sadly, I did not get to experiment with this enough in my limited time frame, so these results might not be as good as they could have been. For the sake of time, I scrapped the noise idea and went with another.

One Sided Label Smoothing

From what I understand about one-sided label smoothing, it essentially penalizes the discriminator for being too confident in its predictions. So if it is more than, say, 90% confident that an image is real, we punish the discriminator. This hopefully gives the generator a chance to learn.
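A small sketch of what this looks like in the loss setup; the 0.9 target is the commonly used value, and the exact constant I used is an assumption here:

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()
batch_size = 16

# One-sided smoothing: real images get a 0.9 target instead of 1.0, so the
# discriminator is penalized for pushing its confidence all the way to 1.
# Fake targets stay at 0.0 (the "one-sided" part).
real_labels = torch.full((batch_size,), 0.9)
fake_labels = torch.zeros(batch_size)

# In the discriminator step, with output = discriminator(real_batch).view(-1):
# lossD_real = criterion(output, real_labels)
```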

I ran my model for 20 epochs with smoothing and the results were a lot better!

Loss Graph

Examples

The loss of the generator wasn't going upwards, and actually appears to be trending down! The generated examples aren't anything special (the generator seems to like 3 specific outputs), but at least it isn't complete garbage like before!

However, I realized that no matter what I did to assist the model, it couldn't make up for the poor dataset being used. "Garbage in, garbage out" as they say. So I decided to scrap working with this dataset.

Abstract Art - Second Dataset

I was lucky enough to find another dataset for abstract art images, and this one looked a lot more promising. All of its images were 512x512 so I wouldn't have to worry about random crops or resizing losing information, and there were also over 8,000 images! Here are a few examples:

abs art 2 1
abs art 2 2
abs art 2 3

I didn't notice this until writing it up, but this dataset is also all (or mostly) paintings, whereas the first dataset had a lot of photos of structures and other things. This would also help my generator because it wouldn't be torn between trying to generate an abstract painting right after generating a picture of a building.

I decided to keep the one-sided label smoothing for this model since I imagined discriminator overfitting could still be an issue here with all of the different art pieces, but I did not include any noise since I did not get it working properly previously. And after 10 epochs, the results were already promising:

Loss Graph

Examples

The loss graph was much better already compared to the previous dataset - the generator loss has a clear downward trend, and the discriminator doesn't seem to be overfitting too hard! The examples are also a lot more diverse. Of course there isn't anything majorly interesting here yet, but there is a LOT more diversity in its generated pieces when compared to the last dataset. Note how different randomly generated inputs result in many different shapes rather than just 3.

So I ran it for another 50 epochs and examined the loss (the generated results were lost :/)

And finally, for another 50 epochs (funnily enough, the loss graphs were not saved this time):

And wow! The results are amazing! They look pretty blotchy and overexposed, but maybe it's just a new style?

Annealing Again

Now that I had a (maybe?) working model, I wanted to see if I could recreate the same results I got with annealing on the cat dataset. I used the same annealing process and ran for another 20 or so epochs:

The variance in the losses dropped significantly, and again, a small upward trend appeared in the generator loss curve. Then the generated results:

Right After the First Annealing Epoch

At The End of 25 Annealing Epochs

As you can see, there is once again a large jump between the end of non-annealed training and annealed training, but then after many annealed epochs the results don't change drastically. They do change more significantly than in the cat example, though.

Conclusion

Overall, I think I had pretty good results. The cat model was a great success and the second abstract model produced "interesting" art pieces at least.

I had a lot of fun with this project! It was probably a little scatterbrained, but I learned a lot from thrashing around trying to get my models to work! I learned firsthand how important a dataset is for the goals of your model: I'm sure the first abstract art dataset is amazing in some regards, but the larger, more uniformly formatted second dataset was so much more useful. The model structure I used was the one presented in class and in the PyTorch article, and any attempts I made to change it performed the same or worse. Due to my lack of time, I chose not to change it and instead explored other parts of the GAN process.

If I had more time, I would have liked to explore more fixes for discriminator overfitting, as well as experiment with other generator and discriminator structures. I only ever annealed both the generator and the discriminator together; it would have been interesting to see how the results varied by annealing only one or the other. I would also want to figure out how to add noise properly. There are a lot of little experiments I could have added on. I also could have created a better website - haha

In the future, I'll choose a more specific goal or project idea. I feel like I had no real goal for this project, or any motivation beyond my own interest, which made quantifying my results a little harder. But again, I learned a lot from this project and generated a bunch of cool images, so I'm happy!

Sources

Datasets

Other Materials

Code for website and project