Simplify Deep Learning with Trainer API and Hugging Face


Machine learning can seem intimidating, especially when you’re just starting out. Between managing datasets, defining models, handling training loops, and evaluating performance, there’s a lot to keep track of. But what if I told you there’s a way to simplify this process? Enter Hugging Face’s Trainer API—a powerful tool that abstracts away much of the boilerplate code, making it easier to focus on the core of your machine learning tasks.

In this post, we’ll explore how to use the Trainer API by comparing it with a traditional PyTorch implementation. A simple linear regression example shows how Hugging Face can make your life easier. Each step is presented side by side, so the two approaches can be compared directly.


The Task: Linear Regression

We’ll implement a basic linear regression model to predict a continuous value y based on a single feature x. The relationship is defined as y = 2x + 1 (with some added noise). We’ll compare the PyTorch implementation with Hugging Face’s Trainer API step by step. This will help us understand the benefits of using the Trainer API.
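Written out, the data-generating process and the model that both implementations fit look like this. The noise term matches the Step 2 code (`torch.randn` scaled by 2 gives a standard deviation of 2). $\mathcal{L}$ is the mean squared error that `nn.MSELoss` computes over each mini-batch of size $B$:

$$
y_i = 2x_i + 1 + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0,\, 2^2), \qquad x_i \in [0, 10]
$$

$$
\hat{y}_i = w\,x_i + b, \qquad \mathcal{L}(w, b) = \frac{1}{B} \sum_{i=1}^{B} \left(\hat{y}_i - y_i\right)^2
$$

Training should drive $w$ toward 2 and $b$ toward 1. Because of the noise, it will not reach those values exactly.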

Step 1: Importing Libraries

Python
# =============================================
# Approach 1: Traditional PyTorch Implementation
# =============================================

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# =============================================
# Approach 2: HuggingFace Implementation
# =============================================

import torch
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments

Step 2: Create Synthetic Data

Next, we create some synthetic data for fitting the linear regression model.

Python
# =============================================
# Approach 1: Traditional PyTorch Implementation
# =============================================

# Generate synthetic data
torch.manual_seed(42)
x = torch.linspace(0, 10, 100).reshape(-1, 1)
y = 2 * x + 1 + torch.randn(x.shape) * 2  # y = 2x + 1 + noise

# Create dataset and dataloader
dataset = TensorDataset(x, y)
dataloader = DataLoader(dataset, batch_size=10, shuffle=True)

# =============================================
# Approach 2: HuggingFace Implementation
# =============================================

# Create a custom dataset class that returns each sample as a dictionary
class SyntheticDataset(Dataset):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # Keys must match the argument names of the model's forward() method
        return {"input_ids": self.x[idx], "labels": self.y[idx]}

# Generate synthetic data (same seed as the PyTorch version, so both fit the same points)
torch.manual_seed(42)
x = torch.linspace(0, 10, 100).reshape(-1, 1)
y = 2 * x + 1 + torch.randn(x.shape) * 2  # y = 2x + 1 + noise

dataset = SyntheticDataset(x, y)

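The data generation is the same as in the PyTorch version. What changes is the dataset class: each sample is returned as a dictionary, because Trainer passes the keys of every batch to the model’s forward method as keyword arguments. The keys therefore have to match the names of forward’s parameters. By default, Trainer drops any key that does not match; this behaviour is controlled by remove_unused_columns. The names “input_ids” and “labels” follow the convention used by Transformers models.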
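To see what the model actually receives, you can batch a few samples with transformers’ `default_data_collator`. This is the collator Trainer uses when no tokenizer or custom `data_collator` is given. The sketch below rebuilds this step’s dataset class, with the noise omitted because only shapes matter here. It shows that each key is stacked into a tensor of shape (batch_size, 1). Trainer then calls the model with these entries as keyword arguments, roughly `model(**batch)`.

```python
import torch
from torch.utils.data import Dataset
from transformers import default_data_collator


class SyntheticDataset(Dataset):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return {"input_ids": self.x[idx], "labels": self.y[idx]}


x = torch.linspace(0, 10, 100).reshape(-1, 1)
y = 2 * x + 1  # noise omitted: only the batch structure matters here
dataset = SyntheticDataset(x, y)

# Collate four samples the way Trainer does when no custom collator is given.
batch = default_data_collator([dataset[i] for i in range(4)])

for key, value in batch.items():
    print(key, tuple(value.shape), value.dtype)
# input_ids (4, 1) torch.float32
# labels (4, 1) torch.float32
```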

Step 3: Define the Model

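Both versions use the same single linear layer. The Hugging Face version differs in two ways:

- Its forward method receives the dataset keys (input_ids and labels) as keyword arguments.
- It computes the loss itself whenever labels are passed, because Trainer reads the training loss from the model’s output.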

Python
# =============================================
# Approach 1: Traditional PyTorch Implementation
# =============================================

class LinearRegressionModel(nn.Module):
    def __init__(self):
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(1, 1)  # 1 input feature, 1 output

    def forward(self, x):
        return self.linear(x)

model = LinearRegressionModel()

# =============================================
# Approach 2: HuggingFace Implementation
# =============================================

class LinearRegressionModel(torch.nn.Module):
    def __init__(self):
        super(LinearRegressionModel, self).__init__()
        self.linear = torch.nn.Linear(1, 1)
        self.criterion = torch.nn.MSELoss()  # Define loss function inside the model

    # Trainer calls model(**batch), so the argument names must match the dataset keys
    def forward(self, input_ids, labels=None):
        outputs = self.linear(input_ids.float())
        loss = None
        if labels is not None:  # Labels are passed during training, so compute the loss
            loss = self.criterion(outputs, labels)
        # Trainer reads the loss from the "loss" key; without labels, return predictions only
        return {"loss": loss, "logits": outputs} if loss is not None else outputs

model = LinearRegressionModel()

Step 4: Define Loss and Optimiser

Python
# =============================================
# Approach 1: Traditional PyTorch Implementation
# =============================================

criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# =============================================
# Approach 2: HuggingFace Implementation
# =============================================

# No separate step: the loss is computed inside the model's forward method, and
# Trainer creates the optimizer (AdamW by default) and a learning-rate scheduler
# internally, configured through TrainingArguments.

Step 5: Training Loop

Python
# =============================================
# Approach 1: Traditional PyTorch Implementation
# =============================================

num_epochs = 100
for epoch in range(num_epochs):
    for batch_x, batch_y in dataloader:
        # Forward pass
        outputs = model(batch_x)
        loss = criterion(outputs, batch_y)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")

# =============================================
# Approach 2: HuggingFace Implementation
# =============================================

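# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=100,
    per_device_train_batch_size=10,
    learning_rate=0.01,  # same value as the PyTorch version; the default (5e-5) barely moves the weights here
    logging_dir="./logs",
    logging_steps=10,  # log the training loss every 10 optimizer steps (once per epoch here)
)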

# Define the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)

# Train the model
trainer.train()
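After `trainer.train()` returns, the logged values are also available in Python through `trainer.state.log_history`, a list of dictionaries with one entry per logging event. Periodic training entries carry a `"loss"` key together with the `"step"` at which they were logged. The end-of-training summary uses `"train_loss"` instead. The helper below is a hypothetical convenience function, not part of the library, that collects the per-step losses so you can print or plot them.

```python
def training_losses(log_history):
    """Return (step, loss) pairs from a Trainer's state.log_history list.

    Only periodic training-log entries have a "loss" key; evaluation entries
    use "eval_loss" and the end-of-training summary uses "train_loss",
    so those entries are skipped.
    """
    return [(entry["step"], entry["loss"]) for entry in log_history if "loss" in entry]


# Usage after training:
#     for step, loss in training_losses(trainer.state.log_history):
#         print(f"step {step}: loss {loss:.4f}")
```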

Step 6: Evaluate the Model

Python
# =============================================
# Approach 1: Traditional PyTorch Implementation
# =============================================

with torch.no_grad():
    predicted = model(x)
    print("Predicted values:", predicted.flatten())
    
# =============================================
# Approach 2: HuggingFace Implementation
# =============================================

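model.eval()
device = next(model.parameters()).device  # Trainer may have moved the model to a GPU
with torch.no_grad():
    predicted = model(input_ids=x.to(device))  # no labels, so forward returns the predictions tensor
    print("Predicted values:", predicted.flatten())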
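Printing predictions shows what the model outputs, but not how well it learned the line. Because the problem is tiny, the best possible straight-line fit can also be computed in closed form with ordinary least squares. You can then compare it with the trained parameters, `model.linear.weight` and `model.linear.bias`. The sketch below regenerates the Step 2 data with `torch.manual_seed(42)` and uses `torch.linalg.lstsq`. It is a sanity check, not part of either training approach.

```python
import torch

torch.manual_seed(42)
x = torch.linspace(0, 10, 100).reshape(-1, 1)
y = 2 * x + 1 + torch.randn(x.shape) * 2  # same data recipe as Step 2

# Design matrix [x, 1]: the column of ones carries the bias term.
A = torch.cat([x, torch.ones_like(x)], dim=1)
solution = torch.linalg.lstsq(A, y).solution  # shape (2, 1): [w, b]
w_ls, b_ls = solution[0].item(), solution[1].item()
print(f"Least-squares fit: w = {w_ls:.3f}, b = {b_ls:.3f}")
```

If training converged, `model.linear.weight.item()` and `model.linear.bias.item()` should land close to these least-squares values. They will typically not match exactly 2 and 1, because of the noise.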

Full Code and Implementation

The steps above compare the two approaches side by side. The traditional PyTorch code exemplifies the manual process: explicitly managing the training loop, zeroing gradients, performing forward and backward passes, and updating model parameters.

In contrast, the Hugging Face Trainer abstracts away these low-level details. Once TrainingArguments and a Trainer object holding the model, dataset, and training configuration are defined, a single trainer.train() call runs the entire training process. That includes the optimization steps, logging, checkpointing, and, if an evaluation dataset and metric function are provided, evaluation. This significantly reduces boilerplate code and simplifies the training workflow.

Key Benefits of the Trainer API

Looking at our examples, several benefits of the Trainer API become clear:

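  1. Code Reduction: The Trainer API implementation eliminates the need to write the training loop manually. We still define the model’s forward pass, but we don’t zero gradients, call backward(), or step the optimizer ourselves.
  2. Built-in Features: The Trainer API provides the following, either automatically or through a few arguments:
    • Checkpointing (saving checkpoints, by default every 500 steps, and resuming from them)
    • Logging training progress
    • Early stopping (via an EarlyStoppingCallback, together with periodic evaluation)
    • Training on multiple GPUs (all visible GPUs are used automatically, and distributed training works when the script is launched with torchrun or accelerate)
  3. Configuration Over Code: Instead of writing code to control training behavior, we just configure the TrainingArguments object.
  4. Consistent Interface: The same interface works for a one-layer regression model and for large transformer models, as long as the model takes its inputs as keyword arguments and returns a loss. This makes it easier to experiment with different architectures.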

Conclusion

The Hugging Face Trainer API significantly simplifies the process of training machine learning models by abstracting away many of the repetitive and complex aspects of the training loop. As demonstrated with our linear regression example, it allows beginners to focus on understanding the core concepts without getting bogged down in implementation details.

For beginners, this means you can start with simpler models like linear regression and gradually move to more complex models like transformers without having to learn a new training paradigm each time. The Trainer API provides a consistent interface that grows with you as you develop your machine learning skills.

Whether you’re just starting out in machine learning or looking to streamline your workflow, the Hugging Face Trainer API is a powerful tool that can help you get results faster and with less code.

