Generative AI - Implementing Neural Style Transfer

Image Generation and Style Transfer

Implementing Neural Style Transfer

Neural style transfer renders one image (the content image) in the visual style of another (the style image) while preserving the content image's subject and layout. In this tutorial, we use PyTorch to build a simple neural style transfer implementation.

Step 1: Setup and Import Necessary Libraries

To begin, we install and import the required libraries.

# Install necessary libraries (the leading "!" runs pip from a notebook cell)
!pip install torch torchvision pillow matplotlib

# Import libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms, models
from PIL import Image
import matplotlib.pyplot as plt

pip installs PyTorch, torchvision, Pillow, and matplotlib.

We import PyTorch along with its neural network (torch.nn) and optimizer (torch.optim) modules.

From torchvision, we import transforms for image preprocessing and models for the pre-trained network.

We import Image from PIL to open image files.

We import matplotlib to display images.

Step 2: Load and Preprocess Images

The content and style images must be loaded and preprocessed before they are passed to the network.

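# Function to load and preprocess images
def load_image(image_path, max_size=400, shape=None):
    # Force three channels so grayscale or RGBA files also work
    image = Image.open(image_path).convert('RGB')

    # Cap the resize target at max_size to keep optimization fast
    if max(image.size) > max_size:
        size = max_size
    else:
        size = max(image.size)

    # An explicit (height, width) overrides the computed size
    if shape:
        size = shape

    # With an int, Resize scales the shorter side to that length;
    # with a (height, width) pair, it resizes to exactly that shape.
    in_transform = transforms.Compose([
        transforms.Resize(size),
        transforms.ToTensor(),
        # ImageNet mean and standard deviation, as expected by VGG19
        transforms.Normalize((0.485, 0.456, 0.406),
                            (0.229, 0.224, 0.225))])

    # Keep the three color channels and add a batch dimension: (1, 3, H, W)
    image = in_transform(image)[:3, :, :].unsqueeze(0)

    return image

# Load content and style images (the style image is resized to match the content image)
content_image = load_image('path_to_content_image.jpg')
style_image = load_image('path_to_style_image.jpg', shape=content_image.shape[-2:])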

We define the function load_image to load an image and prepare it for the network.

The function resizes the image, converts it to a tensor, and normalizes it with the ImageNet mean and standard deviation.

We call load_image to load the content image and the style image.
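To make the normalization step concrete, here is a minimal sketch of what Normalize computes per channel and how it is undone later when displaying the result. The helper names normalize and unnormalize and the tiny gray test image are assumptions made for illustration; they are not part of the tutorial's code.

```python
import torch

# ImageNet channel statistics, the same values passed to transforms.Normalize.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def normalize(image):
    """Per-channel (x - mean) / std on a (3, H, W) tensor with values in [0, 1]."""
    return (image - IMAGENET_MEAN) / IMAGENET_STD


def unnormalize(image):
    """Inverse of normalize: x * std + mean."""
    return image * IMAGENET_STD + IMAGENET_MEAN


# A hypothetical 2x2 mid-gray image standing in for a real photo.
gray = torch.full((3, 2, 2), 0.5)
normalized = normalize(gray)
batched = normalized.unsqueeze(0)   # add the batch dimension, as load_image does
restored = unnormalize(normalized)  # round trip back to the original pixel values
```

The batch dimension matters because VGG19 expects input of shape (batch, channels, height, width), even for a single image.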

Step 3: Define the Model

To extract features from the images, we use a pre-trained VGG19 model.

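# Load pre-trained VGG19 model (convolutional feature extractor only)
# On torchvision versions older than 0.13, use models.vgg19(pretrained=True) instead.
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features

# Freeze parameters
for param in vgg.parameters():
    param.requires_grad_(False)

# Move the model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
vgg.to(device)

# Define the content and style layers.
# vgg is an nn.Sequential, so its layers are keyed by index strings ('0', '1', ...),
# not by names such as 'conv4_2'. The indices below are the matching positions in VGG19.
content_layers = ['21']                       # conv4_2
style_layers = ['0', '5', '10', '19', '28']   # conv1_1, conv2_1, conv3_1, conv4_1, conv5_1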

We load the pre-trained VGG19 model and freeze its parameters so that they are not updated during optimization.

We choose the layers from which to extract the style and content features.

If a GPU is available, we move the model to it for faster computation.
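Because the VGG19 feature extractor is an nn.Sequential, its layers are addressed by index strings rather than by names such as conv4_2. The sketch below shows one way to derive VGG-style names from the indices. The helper name_conv_layers and the small toy stack are assumptions made for illustration, so that the example runs without downloading weights.

```python
import torch.nn as nn


def name_conv_layers(sequential):
    """Map index strings of an nn.Sequential to VGG-style names like 'conv2_1'.

    The block number increases after every pooling layer; the second number
    counts the convolutions inside the current block.
    """
    names = {}
    block, conv = 1, 0
    for index, layer in sequential.named_children():
        if isinstance(layer, nn.Conv2d):
            conv += 1
            names[index] = f"conv{block}_{conv}"
        elif isinstance(layer, nn.MaxPool2d):
            block += 1
            conv = 0
    return names


# A hypothetical two-block stack laid out like the start of a VGG feature extractor.
toy = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
)
layer_names = name_conv_layers(toy)
```

Calling the same helper on the real vgg object should pair index '21' with conv4_2 and indices '0', '5', '10', '19', and '28' with the first convolution of each block.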

Step 4: Extract Features

We need to extract features from the content image and the style image.


# Function to get features
def get_features(image, model, layers):
    features = {}
    x = image
    # Pass the image through the model one layer at a time
    for name, layer in model.named_children():
        x = layer(x)
        # Keep the output of every layer whose key was requested
        if name in layers:
            features[name] = x
    return features

# Get content and style features
content_features = get_features(content_image.to(device), vgg, content_layers)
style_features = get_features(style_image.to(device), vgg, style_layers)



We define a function called get_features that collects the outputs of the model layers we specify.

We extract features from the content image and the style image.
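The layer-by-layer extraction can be tried on a small stand-in network to see what it returns. This is a sketch under assumptions: the toy network and the random input are hypothetical, and get_features is restated here only so the block is self-contained.

```python
import torch
import torch.nn as nn


def get_features(image, model, layers):
    """Run image through model, keeping outputs of the children listed in layers."""
    features = {}
    x = image
    for name, layer in model.named_children():
        x = layer(x)
        if name in layers:
            features[name] = x
    return features


torch.manual_seed(0)
toy = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
)
image = torch.rand(1, 3, 32, 32)  # random stand-in for a preprocessed image

with torch.no_grad():
    feats = get_features(image, toy, layers=['0', '3'])
    # Keys that do not match any child name are silently ignored.
    missing = get_features(image, toy, layers=['conv1_1'])

shapes = {name: tuple(f.shape) for name, f in feats.items()}
```

Deeper layers have more channels and a smaller spatial size, and a key that matches no layer yields an empty dictionary rather than an error, so it is worth checking that the returned dictionary contains every requested layer.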

Step 5: Calculate the Style Loss

The style loss is computed from Gram matrices, which measure how strongly the feature channels of a layer are correlated with one another.

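# Function to calculate Gram matrix
def gram_matrix(tensor):
    # tensor has shape (1, d, h, w): batch of one, d channels, h x w spatial grid
    _, d, h, w = tensor.size()
    # Flatten each channel into a row vector of length h * w
    tensor = tensor.view(d, h * w)
    # Inner product between every pair of channels, giving a (d, d) matrix
    gram = torch.mm(tensor, tensor.t())
    return gram

# Get style Gram matrices
style_grams = {layer: gram_matrix(style_features[layer]) for layer in style_features}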

We define a function called gram_matrix that takes a feature tensor and returns its Gram matrix.

We compute a Gram matrix for the features of each style layer.
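A small worked example may help show what the Gram matrix captures. The two-channel feature map below is chosen by hand for illustration, and gram_matrix is restated so the block runs on its own.

```python
import torch


def gram_matrix(tensor):
    """Gram matrix of a (1, d, h, w) feature map: channel-by-channel inner products."""
    _, d, h, w = tensor.size()
    flat = tensor.view(d, h * w)
    return flat @ flat.t()


# Two channels of a 2x2 feature map, chosen by hand.
features = torch.tensor([[[[1.0, 2.0],
                           [3.0, 4.0]],
                          [[0.0, 1.0],
                           [0.0, 1.0]]]])
gram = gram_matrix(features)
# After flattening, row 0 is (1, 2, 3, 4) and row 1 is (0, 1, 0, 1), so:
#   gram[0, 0] = 1 + 4 + 9 + 16 = 30
#   gram[0, 1] = gram[1, 0] = 2 + 4 = 6
#   gram[1, 1] = 1 + 1 = 2

# Mirroring the feature map left to right moves every value to a new position
# but leaves the Gram matrix unchanged.
mirrored_gram = gram_matrix(features.flip(-1))
```

The second result illustrates why Gram matrices suit style: they record which channels fire together, not where in the image they fire.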

Step 6: Perform Style Transfer

We set up the optimizer and the loss function, then run the style transfer.
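The objective minimized in this step can be written out as follows. The notation is an assumption introduced here for readability: T is the target image, C the content image, S the style image, F^l(X) the features of image X at layer l flattened to d_l rows, G^l(X) the corresponding Gram matrix, and N_l the per-layer normalization constant that the code computes from the feature-map shape.

```latex
\begin{aligned}
\mathcal{L}_{\text{content}} &= \operatorname{mean}\!\left[\bigl(F^{c}(T) - F^{c}(C)\bigr)^{2}\right] \\
G^{l}(X) &= F^{l}(X)\, F^{l}(X)^{\top} \\
\mathcal{L}_{\text{style}} &= \sum_{l \,\in\, \text{style layers}} \frac{1}{N_{l}}
    \operatorname{mean}\!\left[\bigl(G^{l}(T) - G^{l}(S)\bigr)^{2}\right] \\
\mathcal{L}_{\text{total}} &= \alpha\, \mathcal{L}_{\text{content}} + \beta\, \mathcal{L}_{\text{style}}
\end{aligned}
```

Here c is the single content layer, and alpha and beta correspond to the content_weight and style_weight arguments, which default to 1 and 1e6.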

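# Create a target image and set it as a parameter to optimize.
# Move it to the device first, then enable gradients, so that target stays a leaf tensor
# (the optimizer rejects non-leaf tensors, which .to(device) would produce on a GPU).
target = content_image.clone().to(device).requires_grad_(True)

# Define the optimizer
optimizer = optim.Adam([target], lr=0.003)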

# Define the style transfer function
def style_transfer(model, content_features, style_grams, target, content_weight=1, style_weight=1e6, steps=2000):
    for step in range(steps):
        target_features = get_features(target, model, content_layers + style_layers)
        
        content_loss = torch.mean((target_features[content_layers[0]] - content_features[content_layers[0]])**2)
        
        style_loss = 0
        for layer in style_layers:
            target_feature = target_features[layer]
            target_gram = gram_matrix(target_feature)
            style_gram = style_grams[layer]
            layer_style_loss = torch.mean((target_gram - style_gram)**2)
            # Normalize by the number of elements in the feature map (channels * height * width)
            _, d, h, w = target_feature.shape
            style_loss += layer_style_loss / (d * h * w)
        
        total_loss = content_weight * content_loss + style_weight * style_loss
        
        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()
        
        if step % 500 == 0:
            print(f"Step {step}, Total loss: {total_loss.item()}")
    
    return target

# Perform style transfer
output = style_transfer(vgg, content_features, style_grams, target)

We create a target image that starts as a copy of the content image.

We set up an Adam optimizer that updates the pixels of the target image.

We define the style_transfer function, which repeatedly adjusts the target image to minimize the weighted sum of the content loss and the style loss.

We call style_transfer to produce the output image.
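The optimization loop can be tried at toy scale without VGG19. In the sketch below, which is an illustration under assumptions rather than the tutorial's method, the "features" are simply the pixels of small random tensors, and the learning rate and step count are arbitrary. It also shows why the order of requires_grad_ and a copying conversion matters for the optimizer.

```python
import torch
import torch.optim as optim

torch.manual_seed(0)


def gram_matrix(tensor):
    _, d, h, w = tensor.size()
    flat = tensor.view(d, h * w)
    return flat @ flat.t()


# Random stand-ins: the "features" here are the pixels themselves (no VGG).
content = torch.rand(1, 3, 8, 8)
style_gram = gram_matrix(torch.rand(1, 3, 8, 8))

# clone() followed by requires_grad_() gives a leaf tensor the optimizer accepts.
target = content.clone().requires_grad_(True)

# Any copying conversion applied after requires_grad_() returns a non-leaf tensor,
# which is what .to(device) does when the device is a GPU.
non_leaf = content.clone().requires_grad_(True).to(torch.float64)

optimizer = optim.Adam([target], lr=0.005)

losses = []
for step in range(300):
    content_loss = torch.mean((target - content) ** 2)
    style_loss = torch.mean((gram_matrix(target) - style_gram) ** 2)
    total_loss = content_loss + style_loss

    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    losses.append(total_loss.item())
```

Only target is updated; the model and the reference features stay fixed, which is the defining trait of this optimization-based form of style transfer.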

Step 7: Display the Result

Finally, we display the generated image.

# Function to unnormalize and display an image
def imshow(tensor, title=None):
    image = tensor.to("cpu").clone().detach()
    image = image.numpy().squeeze()
    image = image.transpose(1, 2, 0)
    image = image * (0.229, 0.224, 0.225) + (0.485, 0.456, 0.406)
    image = image.clip(0, 1)
    plt.imshow(image)
    if title:
        plt.title(title)
    plt.show()

# Display the output image
imshow(output, title="Output Image")

We define a function called imshow that undoes the normalization and displays the image.

We call imshow to display the output image.
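To keep the result rather than only display it, the same un-normalization can produce an 8-bit array suitable for saving. The helper tensor_to_uint8 and the all-white test tensor are assumptions made for illustration.

```python
import numpy as np
import torch

MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])


def tensor_to_uint8(tensor):
    """Convert a normalized (1, 3, H, W) tensor to an (H, W, 3) uint8 array."""
    image = tensor.detach().to("cpu").squeeze(0).numpy()
    image = image.transpose(1, 2, 0)          # channels-first -> channels-last
    image = (image * STD + MEAN).clip(0, 1)   # undo Normalize, clamp to the valid range
    return (image * 255).round().astype(np.uint8)


# A normalized tensor whose pixels all un-normalize to pure white (1.0).
white = ((torch.ones(1, 3, 4, 4) - torch.tensor(MEAN).view(1, 3, 1, 1))
         / torch.tensor(STD).view(1, 3, 1, 1)).float()
pixels = tensor_to_uint8(white)
```

The resulting array can be passed to Image.fromarray from PIL and written to disk with its save method, for example under a file name of your choice such as output.png.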



Copyrights © 2024 letsupdateskills All rights reserved