Implementing a CycleGAN for Image Translation

1. Import Libraries

import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_addons as tfa  # provides InstanceNormalization (pip install tensorflow-addons)
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt

We begin by importing TensorFlow and Keras to build and train the models, TensorFlow Datasets to load the data, TensorFlow Addons for its InstanceNormalization layer, NumPy for array handling, and Matplotlib to visualize the generated images.

2. Load and Preprocess the Data

dataset, metadata = tfds.load('cycle_gan/horse2zebra', with_info=True, as_supervised=True)
train_horses, train_zebras = dataset['trainA'], dataset['trainB']

def preprocess_image(image, label):
    image = tf.image.resize(image, [286, 286])
    image = tf.image.random_crop(image, size=[256, 256, 3])
    image = tf.image.random_flip_left_right(image)
    image = (tf.cast(image, tf.float32) / 127.5) - 1  # scale pixel values to [-1, 1]
    return image

train_horses = train_horses.map(preprocess_image).batch(1)
train_zebras = train_zebras.map(preprocess_image).batch(1)

We use TensorFlow Datasets to load the horse2zebra dataset. Each image is resized to 286x286, randomly cropped back to 256x256, and randomly flipped horizontally to augment the training data, then normalized to the [-1, 1] range to match the tanh output of the generators.
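
As a quick sanity check, we can pull one preprocessed horse image from the pipeline and display it. This is a minimal sketch; the denormalization simply inverts the [-1, 1] scaling applied above.

sample_horse = next(iter(train_horses))
plt.imshow((sample_horse[0] * 0.5 + 0.5).numpy())  # map [-1, 1] back to [0, 1] for imshow
plt.axis('off')
plt.show()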

3. Define the Generators

def build_generator():
    inputs = layers.Input(shape=[256, 256, 3])

    # Downsampling path: 256x256 -> 128x128 -> 64x64
    down1 = layers.Conv2D(64, 4, strides=2, padding='same')(inputs)
    down1 = layers.LeakyReLU()(down1)

    down2 = layers.Conv2D(128, 4, strides=2, padding='same')(down1)
    down2 = tfa.layers.InstanceNormalization()(down2)
    down2 = layers.LeakyReLU()(down2)

    # Upsampling path: 64x64 -> 128x128 -> 256x256
    up1 = layers.Conv2DTranspose(128, 4, strides=2, padding='same')(down2)
    up1 = tfa.layers.InstanceNormalization()(up1)
    up1 = layers.ReLU()(up1)

    up2 = layers.Conv2DTranspose(64, 4, strides=2, padding='same')(up1)
    up2 = tfa.layers.InstanceNormalization()(up2)
    up2 = layers.ReLU()(up2)

    # tanh keeps outputs in [-1, 1], matching the input normalization
    outputs = layers.Conv2D(3, 7, padding='same', activation='tanh')(up2)
    return models.Model(inputs, outputs)

The generator is a small encoder-decoder: convolutional layers downsample the input and transposed convolutional layers upsample it back to full resolution. Instance normalization and ReLU activations help stabilize training, and a final tanh activation keeps outputs in the [-1, 1] range. This is a simplified architecture; the original CycleGAN paper uses a ResNet-style generator with residual blocks between the downsampling and upsampling stages.
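
For reference, a single residual block of the kind used in the paper's generator could look like the sketch below. It is illustrative only and not wired into the model built above; resblock is a hypothetical helper name.

def resblock(x, filters=256):
    # Two 3x3 convolutions with instance normalization; the block's input
    # is added back to its output (a residual skip connection).
    y = layers.Conv2D(filters, 3, padding='same')(x)
    y = tfa.layers.InstanceNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = tfa.layers.InstanceNormalization()(y)
    return layers.add([x, y])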

4. Define the Discriminators

def build_discriminator():
    inputs = layers.Input(shape=[256, 256, 3])
    down1 = layers.Conv2D(64, 4, strides=2, padding='same')(inputs)
    down1 = layers.LeakyReLU()(down1)

    down2 = layers.Conv2D(128, 4, strides=2, padding='same')(down1)
    down2 = tfa.layers.InstanceNormalization()(down2)
    down2 = layers.LeakyReLU()(down2)

    # One score per spatial patch rather than a single scalar (PatchGAN-style)
    outputs = layers.Conv2D(1, 4, padding='same')(down2)
    return models.Model(inputs, outputs)

The discriminator is a PatchGAN-style convolutional network: instead of producing a single real/fake score, its final convolution outputs a grid of scores, each judging a patch of the input image. LeakyReLU activations and instance normalization help keep its gradients well behaved during training.
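
We can confirm the patch-grid output with a quick shape check (a minimal sketch; the two stride-2 convolutions reduce 256x256 inputs to a 64x64 grid):

d = build_discriminator()
print(d.output_shape)  # expected: (None, 64, 64, 1), one score per patch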

5. Compile the Models

generator_g = build_generator()
generator_f = build_generator()
discriminator_x = build_discriminator()
discriminator_y = build_discriminator()

generator_g.compile(optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5), loss='mse')
generator_f.compile(optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5), loss='mse')
discriminator_x.compile(optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5), loss='mse')
discriminator_y.compile(optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5), loss='mse')

All four networks use the Adam optimizer with a learning rate of 2e-4 and beta_1 = 0.5, the settings used in the original CycleGAN paper. Because the losses are computed manually in the training loop below, the 'mse' loss passed to compile is never used directly; compiling here mainly attaches an optimizer to each model so it can be reached later as model.optimizer.
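
Since CycleGAN training is slow, it is worth saving weights periodically. A minimal optional sketch using tf.train.Checkpoint; the ./checkpoints path and max_to_keep value are arbitrary choices.

ckpt = tf.train.Checkpoint(generator_g=generator_g, generator_f=generator_f,
                           discriminator_x=discriminator_x, discriminator_y=discriminator_y)
ckpt_manager = tf.train.CheckpointManager(ckpt, './checkpoints', max_to_keep=3)
# Call ckpt_manager.save() at the end of each epoch to keep the latest weights.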

6. Training the CycleGAN

def train_cyclegan(epochs, lambda_cycle=10.0):
    for epoch in range(epochs):
        for image_x, image_y in tf.data.Dataset.zip((train_horses, train_zebras)):
            # A persistent tape lets us take gradients for all four models
            with tf.GradientTape(persistent=True) as tape:
                fake_y = generator_g(image_x, training=True)
                fake_x = generator_f(image_y, training=True)

                # Translate back to the original domain for cycle consistency
                cycle_x = generator_f(fake_y, training=True)
                cycle_y = generator_g(fake_x, training=True)

                disc_real_x = discriminator_x(image_x, training=True)
                disc_real_y = discriminator_y(image_y, training=True)
                disc_fake_x = discriminator_x(fake_x, training=True)
                disc_fake_y = discriminator_y(fake_y, training=True)

                # Least-squares adversarial losses: generators push fake scores toward 1
                gen_g_loss = tf.reduce_mean(tf.square(disc_fake_y - 1.0))
                gen_f_loss = tf.reduce_mean(tf.square(disc_fake_x - 1.0))

                # L1 cycle-consistency loss, weighted by lambda_cycle (10 in the paper)
                cycle_loss = lambda_cycle * (tf.reduce_mean(tf.abs(image_x - cycle_x)) + tf.reduce_mean(tf.abs(image_y - cycle_y)))
                total_gen_g_loss = gen_g_loss + cycle_loss
                total_gen_f_loss = gen_f_loss + cycle_loss

                # Discriminators push real scores toward 1 and fake scores toward 0
                disc_x_loss = (tf.reduce_mean(tf.square(disc_real_x - 1.0)) + tf.reduce_mean(tf.square(disc_fake_x))) / 2.0
                disc_y_loss = (tf.reduce_mean(tf.square(disc_real_y - 1.0)) + tf.reduce_mean(tf.square(disc_fake_y))) / 2.0

            gradients_g = tape.gradient(total_gen_g_loss, generator_g.trainable_variables)
            gradients_f = tape.gradient(total_gen_f_loss, generator_f.trainable_variables)
            gradients_disc_x = tape.gradient(disc_x_loss, discriminator_x.trainable_variables)
            gradients_disc_y = tape.gradient(disc_y_loss, discriminator_y.trainable_variables)
            del tape  # release the persistent tape's resources

            generator_g.optimizer.apply_gradients(zip(gradients_g, generator_g.trainable_variables))
            generator_f.optimizer.apply_gradients(zip(gradients_f, generator_f.trainable_variables))
            discriminator_x.optimizer.apply_gradients(zip(gradients_disc_x, discriminator_x.trainable_variables))
            discriminator_y.optimizer.apply_gradients(zip(gradients_disc_y, discriminator_y.trainable_variables))

        print(f'Epoch: {epoch}, Generator G Loss: {total_gen_g_loss.numpy()}, Generator F Loss: {total_gen_f_loss.numpy()}')

train_cyclegan(epochs=100)

The train_cyclegan function drives the training loop. On each step it translates images in both directions, computes least-squares adversarial losses for the generators, an L1 cycle-consistency loss that encourages each translation to be invertible, and the corresponding discriminator losses, then applies gradient updates to all four networks. The generators learn to fool the discriminators, while the discriminators learn to distinguish real images from generated ones; the cycle-consistency term keeps the translations faithful to the input.
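
Once training has run for a while, generator_g can be applied directly to translate a horse image into a zebra-styled one. A minimal sketch, reusing the denormalization from the preprocessing step:

sample = next(iter(train_horses))
translated = generator_g(sample, training=False)
plt.subplot(1, 2, 1); plt.title('Input'); plt.imshow((sample[0] * 0.5 + 0.5).numpy()); plt.axis('off')
plt.subplot(1, 2, 2); plt.title('Translated'); plt.imshow((translated[0] * 0.5 + 0.5).numpy()); plt.axis('off')
plt.show()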

Following these steps gives you a working CycleGAN for unpaired image-to-image translation. Because the method requires no paired examples, it applies to many computer vision and image processing tasks where aligned datasets are unavailable.

