Simplify Deep Learning with Trainer API and Hugging Face


Machine learning can seem intimidating, especially when you’re just starting out. Between managing datasets, defining models, handling training loops, and evaluating performance, there’s a lot to keep track of. But what if I told you there’s a way to simplify this process? Enter Hugging Face’s Trainer API—a powerful tool that abstracts away much of the boilerplate code, making it easier to focus on the core of your machine learning tasks.

In this blog, we’ll explore how to use the Trainer API by comparing it with a traditional PyTorch implementation. We’ll use a simple linear regression example to demonstrate how Hugging Face can make your life easier. The walkthrough is structured side by side so that each step of the two approaches can be compared directly.


The Task: Linear Regression

We’ll implement a basic linear regression model to predict a continuous value y based on a single feature x. The relationship is defined as y = 2x + 1 (with some added noise). We’ll compare the PyTorch implementation with Hugging Face’s Trainer API step by step. This will help us understand the benefits of using the Trainer API.

Step 1: Importing Libraries

Python
# =============================================
# Approach 1: Traditional PyTorch Implementation
# =============================================

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# =============================================
# Approach 2: HuggingFace Implementation
# =============================================

import torch
import torch.nn as nn
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments
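
# Note: the Trainer comes from the transformers library. If you are following
# along, a typical one-time setup (run in a shell, not in Python) is:
#   pip install torch transformers
# Recent versions of transformers may also require accelerate for the Trainer:
#   pip install accelerate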

Step 2: Create Synthetic Data

Now we can create some synthetic data for fitting the linear regression model.

Python
# =============================================
# Approach 1: Traditional PyTorch Implementation
# =============================================

# Generate synthetic data
torch.manual_seed(42)
x = torch.linspace(0, 10, 100).reshape(-1, 1)
y = 2 * x + 1 + torch.randn(x.shape) * 2  # y = 2x + 1 + noise

# Create dataset and dataloader
dataset = TensorDataset(x, y)
dataloader = DataLoader(dataset, batch_size=10, shuffle=True)

# =============================================
# Approach 2: HuggingFace Implementation
# =============================================

# Create a custom dataset class
class SyntheticDataset(Dataset):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return {"input_ids": self.x[idx], "labels": self.y[idx]}

# Generate synthetic data
x = torch.linspace(0, 10, 100).reshape(-1, 1)
y = 2 * x + 1 + torch.randn(x.shape) * 2  # y = 2x + 1 + noise

dataset = SyntheticDataset(x, y)

Generating the synthetic data is almost identical in both approaches; the difference is that the Trainer API passes each batch to the model as keyword arguments, so the dataset returns dictionaries with the keys “input_ids” and “labels”.
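
As a quick sanity check, indexing the dataset returns one of these dictionaries (the exact label value varies with the random noise):

Python
# Each item is a dict whose keys become keyword arguments to the model
print(dataset[0])
# {'input_ids': tensor([0.]), 'labels': tensor([...])}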

Step 3: Define the Model

The core model is the same single linear layer in both cases, but the Trainer-compatible version must accept keyword arguments and compute its own loss, as defined below.

Python
# =============================================
# Approach 1: Traditional PyTorch Implementation
# =============================================

class LinearRegressionModel(nn.Module):
    def __init__(self):
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(1, 1)  # 1 input feature, 1 output

    def forward(self, x):
        return self.linear(x)

model = LinearRegressionModel()

# =============================================
# Approach 2: HuggingFace Implementation
# =============================================

class LinearRegressionModel(nn.Module):
    def __init__(self):
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(1, 1)
        self.criterion = nn.MSELoss()  # Define loss function inside the model

    def forward(self, input_ids=None, labels=None, **kwargs):
        # The Trainer passes each batch to the model as keyword arguments
        outputs = self.linear(input_ids)
        loss = None
        if labels is not None:  # Compute loss only during training
            loss = self.criterion(outputs, labels)
        return {"loss": loss, "logits": outputs} if loss is not None else {"logits": outputs}

model = LinearRegressionModel()
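
Before handing the model to the Trainer, a quick direct call confirms the interface behaves as expected; this check is purely illustrative:

Python
# Sanity check: with labels, the forward pass returns both a loss and logits
out = model(input_ids=x[:4], labels=y[:4])
print(out["loss"].item(), out["logits"].shape)  # scalar loss, torch.Size([4, 1])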

Step 4: Define Loss and Optimizer

Python
# =============================================
# Approach 1: Traditional PyTorch Implementation
# =============================================

criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# =============================================
# Approach 2: HuggingFace Implementation
# =============================================

# The loss function is defined inside the model (see Step 3), and the optimizer is created and managed internally by the Trainer API.
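
By default, the Trainer builds an AdamW optimizer from the learning rate in TrainingArguments. If you want to mirror the SGD optimizer used in the PyTorch version, the Trainer accepts an optimizers argument, a tuple of (optimizer, lr_scheduler). A minimal sketch, reusing the model, training_args, and dataset objects created in Step 5 below:

Python
from torch.optim import SGD

# Optional: supply a specific optimizer instead of the Trainer's default AdamW.
# The second tuple element is an optional LR scheduler (None = use the default).
optimizer = SGD(model.parameters(), lr=0.01)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    optimizers=(optimizer, None),
)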

Step 5: Training Loop

Python
# =============================================
# Approach 1: Traditional PyTorch Implementation
# =============================================

num_epochs = 100
for epoch in range(num_epochs):
    for batch_x, batch_y in dataloader:
        # Forward pass
        outputs = model(batch_x)
        loss = criterion(outputs, batch_y)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")

# =============================================
# Approach 2: HuggingFace Implementation
# =============================================

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=100,
    per_device_train_batch_size=10,
    logging_dir="./logs",
    logging_steps=10,
)

# Define the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)

# Train the model
trainer.train()
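
The single trainer.train() call runs the whole loop: batching, forward and backward passes, optimizer steps, logging every logging_steps steps, and checkpointing to output_dir. One convenience worth knowing is that training can be resumed from the last saved checkpoint:

Python
# Resume from the most recent checkpoint in output_dir instead of starting over
trainer.train(resume_from_checkpoint=True)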

Step 6: Evaluate the Model

Python
# =============================================
# Approach 1: Traditional PyTorch Implementation
# =============================================

with torch.no_grad():
    predicted = model(x)
    print("Predicted values:", predicted.flatten())
    
# =============================================
# Approach 2: HuggingFace Implementation
# =============================================

with torch.no_grad():
    outputs = model(input_ids=x)
    print("Predicted values:", outputs["logits"].flatten())
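
Alternatively, the Trainer can run batched inference itself: trainer.predict returns a PredictionOutput object whose predictions field holds the model outputs as a NumPy array:

Python
# Let the Trainer handle batching and device placement during inference
predictions = trainer.predict(dataset)
print("Predicted values:", predictions.predictions.flatten())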

Full Code and Implementation

The code below provides a complete side-by-side comparison of training a model using a standard, end-to-end PyTorch approach versus leveraging the Hugging Face Trainer API. The traditional PyTorch code exemplifies the manual process: explicitly managing the training loop, zeroing gradients, performing forward and backward passes, and updating model parameters.

In contrast, the Hugging Face Trainer abstracts away these low-level details. By defining TrainingArguments and instantiating a Trainer object with the model, dataset, and training configuration, the trainer.train() call handles the entire training process, including optimizations, logging, and optional metric computations, significantly reducing boilerplate code and simplifying the training workflow.

Python Script Comparison
# =============================================
# Approach 1: Traditional PyTorch Implementation
# Steps : 
# 1. Define a simple linear regression model as a PyTorch module
# 2. Set up the loss function (Mean Squared Error) and optimizer (Stochastic Gradient Descent)
# 3. Implement a training loop that runs for 100 epochs:
# 	- Compute predictions
# 	- Calculate loss
# 	- Backpropagate gradients
# 	- Update model parameters
# 4. Evaluate the model on the synthetic data
# =============================================

# Import Libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Generate synthetic data
torch.manual_seed(42)
x = torch.linspace(0, 10, 100).reshape(-1, 1)
y = 2 * x + 1 + torch.randn(x.shape) * 2  # y = 2x + 1 + noise

# Create dataset and dataloader
dataset = TensorDataset(x, y)
dataloader = DataLoader(dataset, batch_size=10, shuffle=True)

# Define the model 
class LinearRegressionModel(nn.Module):
    def __init__(self):
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(1, 1)  # 1 input feature, 1 output

    def forward(self, x):
        return self.linear(x)

# Instantiate the model
model = LinearRegressionModel()

# Define loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training Loop in PyTorch
num_epochs = 100
for epoch in range(num_epochs):
    for batch_x, batch_y in dataloader:
        # Forward pass
        outputs = model(batch_x)
        loss = criterion(outputs, batch_y)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")

# Model Evaluation
with torch.no_grad():
    predicted = model(x)
    print("Predicted values:", predicted.flatten())

# =============================================
# Approach 2: Hugging Face Trainer API Implementation
# Steps:
# 1. Create a custom dataset class that provides data in the format expected by the Trainer
# 2. Define a custom model compatible with the Trainer API
#    - Note how the forward method returns a dictionary with "loss" and "logits" keys
# 3. Set up TrainingArguments to configure the training process
# 4. Initialize the Trainer with the model, arguments, and dataset
# 5. Call trainer.train() to start training
# 6. Evaluate the model
# =============================================

# Import libraries
import torch
import torch.nn as nn
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments

# Create a custom dataset class
class SyntheticDataset(Dataset):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return {"input_ids": self.x[idx], "labels": self.y[idx]}

# Generate synthetic data
x = torch.linspace(0, 10, 100).reshape(-1, 1)
y = 2 * x + 1 + torch.randn(x.shape) * 2  # y = 2x + 1 + noise

dataset = SyntheticDataset(x, y)

# Define a custom model compatible with the Trainer API
class LinearRegressionHF(nn.Module):
    def __init__(self, input_dim):
        super(LinearRegressionHF, self).__init__()
        self.linear = nn.Linear(input_dim, 1)
    
    def forward(self, input_ids=None, labels=None, **kwargs):
        # The Trainer API expects specific parameter names
        outputs = self.linear(input_ids)
        
        # Calculate loss if labels are provided
        loss = None
        if labels is not None:
            loss_fct = nn.MSELoss()
            loss = loss_fct(outputs, labels)
        
        # Return loss and outputs in a dictionary format
        return {"loss": loss, "logits": outputs} if loss is not None else {"logits": outputs}

model = LinearRegressionHF(input_dim=1)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=100,
    per_device_train_batch_size=10,
    logging_dir="./logs",
    logging_steps=10,
)

# Define the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)

# Train the model
trainer.train()

# Model Evaluation
with torch.no_grad():
    outputs = model(input_ids=x)
    print("Predicted values:", outputs["logits"].flatten())
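
Since the data was generated from y = 2x + 1, a quick look at the trained weights should recover values close to the true slope and intercept:

Python
# The learned parameters should land near the true values (weight ≈ 2, bias ≈ 1)
print("weight:", model.linear.weight.item(), "bias:", model.linear.bias.item())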


Key Benefits of the Trainer API

Looking at our examples, several benefits of the Trainer API become clear:

  1. Code Reduction: The Trainer API implementation eliminates the need to write the training loop manually. We don’t need to worry about forward passes, backward passes, or gradient updates.
  2. Built-in Features: The Trainer API automatically provides:
    • Checkpointing (saving and loading models)
    • Logging training progress
    • Early stopping (see the sketch after this list)
    • Training on multiple GPUs
  3. Configuration Over Code: Instead of writing code to control training behavior, we just configure the TrainingArguments object.
  4. Consistent Interface: The same interface works for linear regression, neural networks, or transformer models, making it easier to experiment with different architectures.
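
As a concrete example of these built-in features, early stopping takes only a callback plus a few extra TrainingArguments. A minimal sketch; note that it reuses the training set as the eval set purely for illustration (use a held-out split in practice):

Python
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

# Early stopping requires periodic evaluation and keeping the best checkpoint
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=100,
    per_device_train_batch_size=10,
    eval_strategy="epoch",         # "evaluation_strategy" in older transformers versions
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="loss",  # track eval loss to pick the best checkpoint
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    eval_dataset=dataset,          # illustration only; use a real validation split
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
)
trainer.train()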

Conclusion

The Hugging Face Trainer API significantly simplifies the process of training machine learning models by abstracting away many of the repetitive and complex aspects of the training loop. As demonstrated with our linear regression example, it allows beginners to focus on understanding the core concepts without getting bogged down in implementation details.

For beginners, this means you can start with simpler models like linear regression and gradually move to more complex models like transformers without having to learn a new training paradigm each time. The Trainer API provides a consistent interface that grows with you as you develop your machine learning skills.

Whether you’re just starting out in machine learning or looking to streamline your workflow, the Hugging Face Trainer API is a powerful tool that can help you get results faster and with less code.

References

  1. https://huggingface.co/docs/transformers/en/main_classes/trainer
  2. https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py
  3. Fine-tuning a pretrained model: https://huggingface.co/docs/transformers/en/training
