Generative AI - Practical Coding: Implementing VAEs

A Variational Autoencoder (VAE) is a generative model that learns to encode data into a lower-dimensional latent space and to reconstruct it from that space. Here's how to build a simple VAE from scratch with TensorFlow and Keras.

1. Import Libraries

import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt

We begin by importing TensorFlow and Keras to build and train the VAE. NumPy is used for data manipulation, and Matplotlib for visualizing the reconstructed and generated images.

2. Define the Encoder

class Sampling(layers.Layer):
    """Samples z from the latent distribution via the reparameterization trick."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        # Draw epsilon from a standard normal, then shift and scale it.
        epsilon = tf.random.normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

def build_encoder(latent_dim):
    inputs = layers.Input(shape=(28, 28, 1))
    # Two strided convolutions downsample 28x28 -> 14x14 -> 7x7 while extracting features.
    x = layers.Conv2D(32, 3, activation='relu', strides=2, padding='same')(inputs)
    x = layers.Conv2D(64, 3, activation='relu', strides=2, padding='same')(x)
    x = layers.Flatten()(x)
    x = layers.Dense(16, activation='relu')(x)
    # Two dense heads parameterize the approximate posterior q(z|x).
    z_mean = layers.Dense(latent_dim, name='z_mean')(x)
    z_log_var = layers.Dense(latent_dim, name='z_log_var')(x)
    z = Sampling()([z_mean, z_log_var])
    return models.Model(inputs, [z_mean, z_log_var, z], name='encoder')

Sampling Layer: This layer implements the reparameterization trick. It draws a sample z from the latent distribution by combining the mean (z_mean) and the log-variance (z_log_var) with random noise: z = z_mean + exp(0.5 * z_log_var) * epsilon. Because the randomness is isolated in epsilon, the network remains differentiable during training.
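
To see why this matters, here is a minimal sketch (an illustrative addition, not part of the original tutorial) showing that gradients flow through the sampled z back to z_mean and z_log_var:

z_mean = tf.Variable([[0.0, 0.0]])
z_log_var = tf.Variable([[0.0, 0.0]])
with tf.GradientTape() as tape:
    z = Sampling()([z_mean, z_log_var])
    loss = tf.reduce_sum(tf.square(z))
# Both gradients are defined (not None) because sampling is reparameterized.
grads = tape.gradient(loss, [z_mean, z_log_var])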

Encoder Model: The encoder compresses the input images into a lower-dimensional latent space. The convolutional layers extract spatial features, while the dense layers produce the mean and log-variance that parameterize the latent distribution.
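
As a quick sanity check (illustrative, not from the tutorial), you can build a throwaway encoder and inspect the shapes it produces for a dummy batch:

test_encoder = build_encoder(latent_dim=2)     # throwaway instance for the check
dummy = tf.zeros((4, 28, 28, 1))               # a batch of 4 blank images
z_mean, z_log_var, z = test_encoder(dummy)
print(z_mean.shape, z_log_var.shape, z.shape)  # all (4, 2)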

3. Define the Decoder

def build_decoder(latent_dim):
    latent_inputs = layers.Input(shape=(latent_dim,))
    # Project the latent vector up to a 7x7x64 feature map.
    x = layers.Dense(7 * 7 * 64, activation='relu')(latent_inputs)
    x = layers.Reshape((7, 7, 64))(x)
    # Transposed convolutions upsample 7x7 -> 14x14 -> 28x28.
    x = layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)
    # Sigmoid keeps pixel values in [0, 1].
    outputs = layers.Conv2DTranspose(1, 3, padding='same', activation='sigmoid')(x)
    return models.Model(latent_inputs, outputs, name='decoder')

The decoder reconstructs images from the latent space. Transposed convolutional layers upsample the latent vector back to the original 28x28 image size, and the sigmoid activation in the final layer constrains pixel values to the range [0, 1].
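
Since the decoder maps latent vectors to images, decoding a random point from the prior should already produce a validly shaped (if untrained) image. A short illustrative check:

test_decoder = build_decoder(latent_dim=2)  # throwaway instance for the check
z = tf.random.normal(shape=(1, 2))          # a random point from the latent prior
img = test_decoder(z)
print(img.shape)                            # (1, 28, 28, 1), values in [0, 1]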

4. Define the VAE Model

class VAE(tf.keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder

    def call(self, inputs):
        z_mean, z_log_var, z = self.encoder(inputs)
        reconstructed = self.decoder(z)
        # KL divergence between q(z|x) and the standard normal prior N(0, I).
        kl_loss = -0.5 * tf.reduce_mean(
            z_log_var - tf.square(z_mean) - tf.exp(z_log_var) + 1)
        # add_loss folds the KL term into the total loss alongside the
        # reconstruction loss passed to compile().
        self.add_loss(kl_loss)
        return reconstructed

latent_dim = 2
encoder = build_encoder(latent_dim)
decoder = build_decoder(latent_dim)
vae = VAE(encoder, decoder)

This class combines the encoder and decoder into a single model. Its call method computes the KL divergence loss, which measures how far the learned latent distribution is from a standard normal distribution. The total loss minimized during training is the sum of this KL term and the reconstruction loss.
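
For reference, many VAE implementations compute the KL term per sample by summing over the latent dimensions and then averaging over the batch, rather than averaging over everything as above. A standalone sketch of that variant, using dummy statistics (an assumption for illustration, not what the model above does):

# Dummy posterior statistics (batch=4, latent_dim=2).
z_mean = tf.zeros((4, 2))
z_log_var = tf.zeros((4, 2))
# Sum KL over latent dimensions, then average over the batch.
kl_per_sample = -0.5 * tf.reduce_sum(
    1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1)
kl_loss = tf.reduce_mean(kl_per_sample)
print(float(kl_loss))  # 0.0, since the dummy posterior equals the prior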

5. Compile and Train the VAE

# Adam optimizer; MSE measures pixel-wise reconstruction error.
vae.compile(optimizer='adam', loss=tf.keras.losses.MeanSquaredError())
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
# Add a channel dimension and scale pixel values to [0, 1].
x_train = np.expand_dims(x_train, -1).astype('float32') / 255.0
x_test = np.expand_dims(x_test, -1).astype('float32') / 255.0
# The VAE is trained to reproduce its own inputs.
vae.fit(x_train, x_train, epochs=20, batch_size=64, validation_data=(x_test, x_test))

Compile: The VAE is compiled with the Adam optimizer and a mean squared error loss, which measures the pixel-wise difference between the original and reconstructed images. (Binary cross-entropy is another common choice for MNIST.)

Training Data: The MNIST dataset is loaded and normalized to [0, 1]. The model is then trained for 20 epochs with a batch size of 64, with the test set used for validation.
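
The tutorial imports Matplotlib for visualization, so here is a minimal follow-up sketch (an illustrative addition, assuming the trained vae and decoder above) that compares originals with reconstructions and samples new digits from the prior:

# Reconstruct a few test digits and show them against the originals.
reconstructed = vae.predict(x_test[:8])
fig, axes = plt.subplots(2, 8, figsize=(12, 3))
for i in range(8):
    axes[0, i].imshow(x_test[i].squeeze(), cmap='gray')         # original
    axes[1, i].imshow(reconstructed[i].squeeze(), cmap='gray')  # reconstruction
    axes[0, i].axis('off')
    axes[1, i].axis('off')
plt.show()

# Generate brand-new digits by decoding random points from the prior.
z = np.random.normal(size=(8, latent_dim)).astype('float32')
generated = decoder.predict(z)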
