How to use GPUs with PyTorch (2024)

The Role of GPUs in Deep Learning

GPUs, or Graphics Processing Units, are important pieces of hardware originally designed for rendering computer graphics, primarily for games and movies. However, in recent years, GPUs have gained recognition for significantly enhancing the speed of computational processes involving neural networks.

GPUs now play a pivotal role in the artificial intelligence revolution, predominantly driving rapid advancements in deep learning, computer vision, and large language models, among others.

In this article, we will delve into the utilization of GPUs to expedite neural network training using PyTorch, one of the most widely used deep learning libraries.


Note: An NVIDIA GPU-equipped machine is required to follow the instructions in this article.

Introduction to GPUs with PyTorch

PyTorch is an open-source, simple, and powerful machine-learning framework based on Python. It is used to develop and train neural networks through tensor computations and automatic differentiation, both of which can be accelerated on Graphics Processing Units.

PyTorch uses CUDA to configure and leverage NVIDIA GPUs. CUDA is a GPU computing toolkit developed by NVIDIA, designed to speed up compute-intensive operations by parallelizing them across the many cores of a GPU. PyTorch exposes CUDA support through the torch.cuda package.

Utilising GPUs in PyTorch via the CUDA Package

The CUDA library in PyTorch is instrumental in detecting, activating, and harnessing the power of GPUs. Let's delve into some functionalities using PyTorch.

Verifying GPU Availability

Before using the GPUs, we can check if they are configured and ready to use. The following code returns a boolean indicating whether a GPU is configured and available for use on the machine.

import torch
print(torch.cuda.is_available())

The number of GPUs present on the machine and the device in use can be identified as follows:
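
A minimal sketch of that check, assuming the standard torch.cuda.device_count(), torch.cuda.current_device(), and torch.cuda.get_device_name() calls. The guard around the last two is an addition for illustration, so the snippet also runs on a machine without a GPU:

```python
import torch

# Number of CUDA devices PyTorch can see (0 on a CPU-only machine)
gpu_count = torch.cuda.device_count()
print(gpu_count)

# The index and name of the device in use are only meaningful when CUDA is available
if torch.cuda.is_available():
    current = torch.cuda.current_device()
    print(current)
    print(torch.cuda.get_device_name(current))
```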


This output indicates that there is a single GPU available, and it is identified by the device number 0.

Initialize the Device

The active device can be selected once and stored in a variable for later use, such as loading models and tensors onto it. This step matters when a GPU is available, because PyTorch places tensors on the CPU by default and uses a GPU only when asked to.

The torch.device function can be used to select the device.

>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> device

With the device variable, we can now create and move tensors into it.

Creating and Moving Tensors to the GPU

The models and datasets are represented as PyTorch tensors, which must be initialized on, or transferred to, the GPU prior to training the model. This can be accomplished in several ways, as outlined below:

  1. Creating Tensors Directly on the GPU

Tensors can be created directly on the desired device, such as the GPU, by specifying the device parameter. By default, tensors are created on the CPU. You can check where a tensor is stored through its device attribute.

x = torch.tensor([1, 2, 3])
print(x)
print("Device:", x.device)

tensor([1, 2, 3])
Device: cpu

Now, let's generate the tensors directly on the device.

y = torch.tensor([4, 5, 6], device=device)
print(y)
print("Device:", y.device)

tensor([4, 5, 6], device='cuda:0')
Device: cuda:0

Lastly, the device number where the tensors are stored can be retrieved using the get_device() method.
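
A short sketch of that call, using the same x and y tensors as in the listings above. The variable names cpu_index and dev_index are chosen here for illustration only:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor([1, 2, 3])                 # created on the CPU by default
y = torch.tensor([4, 5, 6], device=device)  # created on the selected device

cpu_index = x.get_device()  # -1 for a CPU tensor
dev_index = y.get_device()  # GPU number on a CUDA machine, -1 on a CPU-only one
print(cpu_index)
print(dev_index)
```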


In the output above, -1 represents the CPU, while 0 represents GPU number 0.

  2. Transferring Tensors Using the to() Method

Tensors can be transferred from the CPU to the device using the to() method, which is supported by PyTorch tensors.


x = torch.tensor([1, 2, 3])
x = x.to(device)
print("Device:", x.device)
print(x.get_device())

Device: cuda:0
0

When multiple GPUs are available, tensors can be transferred to specific GPUs by passing the device number as a parameter.

For instance, cuda:0 is for the first GPU, cuda:1 for the second GPU, and so on.

# Transfer to the first GPU
x = torch.tensor([8, 9, 10])
x = x.to("cuda:0")
print(x.device)

Attempting to transfer to a GPU that is not available or to an incorrect GPU number will result in a CUDA error.
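
One way to avoid that error is to validate the index before transferring. The sketch below is only an illustration: pick_device is a hypothetical helper, not part of PyTorch, and it falls back to the CPU when the requested GPU does not exist.

```python
import torch

def pick_device(index: int) -> torch.device:
    """Hypothetical helper: return cuda:<index> if that GPU exists, else the CPU."""
    if torch.cuda.is_available() and 0 <= index < torch.cuda.device_count():
        return torch.device(f"cuda:{index}")
    return torch.device("cpu")

x = torch.tensor([8, 9, 10])
x = x.to(pick_device(0))          # first GPU if present, otherwise stays on the CPU
fallback = pick_device(10**6)     # index far out of range, so this is the CPU
print(x.device, fallback)
```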

  3. Transferring Tensors Using the cuda() Method

Below is an example of creating a sample tensor and transferring it to the GPU using the cuda() method, which is supported by PyTorch tensors.

# Create a random tensor of shape (100, 30)
tensor = torch.rand((100, 30))
tensor = tensor.cuda()
print(tensor.device)

cuda:0

Now let's explore techniques for spreading tensors across multiple GPUs through parallelization, one of the key features behind the high computational speeds of GPUs.

Multi-GPU Distributed Training

Distributed training deploys both the model and the dataset across multiple GPUs, which can dramatically accelerate training through parallelization. We will cover some of the distributed training classes offered by PyTorch in the following sections.

DataParallel

DataParallel is an effective way to conduct multi-GPU training of models on a single machine. It achieves data parallelism at the module level: the model is replicated on each designated device, and each input batch is split into chunks along the batch dimension, with every replica processing one chunk.

Let's create and initialise a basic LinearRegression model class prior to wrapping it within the DataParallel class.

import torch.nn as nn

class LinearRegression(nn.Module):
    def __init__(self, input_size, output_size):
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        return self.linear(x)

# Initialize the model
model = LinearRegression(2, 5)
print(model)

LinearRegression(
  (linear): Linear(in_features=2, out_features=5, bias=True)
)

Now, let's wrap the model to execute data parallelization across multiple GPUs. This can be achieved by utilizing the nn.DataParallel class and passing the model along with the device list as parameters.

model = nn.DataParallel(model, device_ids=[0])
print(model)

DataParallel(
  (module): LinearRegression(
    (linear): Linear(in_features=2, out_features=5, bias=True)
  )
)

In the above code, we have passed the model along with the list of device ids as parameters. Now we can load the model onto the device and train it as required.

# Move the model and inputs to GPU
model = model.to(device)
input_data = input_data.to(device)
# Continue with Training Loop
# ...
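
To make the training step concrete, here is a minimal end-to-end sketch of a single DataParallel update. The random input_data and targets, the MSE loss, and the SGD settings are all assumptions made for illustration and are not taken from the article. Leaving device_ids unset lets DataParallel use every visible GPU; on a CPU-only machine it simply calls the wrapped module.

```python
import torch
import torch.nn as nn

class LinearRegression(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        return self.linear(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Wrap the model, then move it to the selected device
model = nn.DataParallel(LinearRegression(2, 5)).to(device)

# Placeholder data (assumed for illustration): 8 samples, 2 features, 5 targets
input_data = torch.randn(8, 2).to(device)
targets = torch.randn(8, 5).to(device)

criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One training step: forward pass, loss, backward pass, parameter update
optimizer.zero_grad()
outputs = model(input_data)   # the batch is split across GPUs and gathered again
loss = criterion(outputs, targets)
loss.backward()
optimizer.step()

print(outputs.shape)
```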

DistributedDataParallel (DDP)

The DistributedDataParallel class from PyTorch supports multi-GPU training across multiple machines. It is recommended over the DataParallel class even on a single machine, because it runs one process per GPU and is typically faster than the DataParallel wrapper.

The DistributedDataParallel module operates on the principle of data parallelism. Here, the model and data are duplicated across multiple processes, and each process conducts training on a data subset.
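
The per-process data subset is usually produced by a sampler. The sketch below uses torch.utils.data.distributed.DistributedSampler with an explicit num_replicas and rank, so it runs in a single process without initializing a process group. The toy dataset and the world size of 2 are assumptions for illustration.

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset with 8 samples (assumed for illustration)
dataset = TensorDataset(torch.arange(8).float().unsqueeze(1))

world_size = 2  # assumed number of training processes
shards = []
for rank in range(world_size):
    # Each rank sees an interleaved, non-overlapping slice of the indices
    sampler = DistributedSampler(
        dataset, num_replicas=world_size, rank=rank, shuffle=False
    )
    shards.append(list(sampler))

# rank 0 receives indices 0, 2, 4, 6 and rank 1 receives 1, 3, 5, 7
print(shards)
```

In a real DDP job each process would build only its own sampler and pass it to a DataLoader through the sampler argument.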

Setting up the DistributedDataParallel class entails initializing the distributed environment and subsequently wrapping the model with the DDP object.

# Imports
import os
import torch
import torch.distributed as dist
import torch.nn as nn

# Define the Linear Regression model
class LinearRegression(nn.Module):
    def __init__(self, input_size, output_size):
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        return self.linear(x)

# Initialize the distributed environment.
# This expects the environment variables set by a launcher such as torchrun.
dist.init_process_group(backend='nccl')

# Pin this process to its own GPU
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Initialize the model on that GPU and wrap it with DDP
model = LinearRegression(2, 5).to(local_rank)
model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])

# Proceed to load and train the model
# ...

This provides a basic wrapper to load the model for multi-GPU training across multiple nodes.


Conclusion

In this article, we've explored various methods to leverage NVIDIA GPUs through CUDA in PyTorch. These strategies help us harness the power of GPUs, which can accelerate model training by roughly a factor of ten compared to CPUs in deep learning applications. This reduction in training time benefits a broad array of compute-intensive tasks.



How to use your GPU in PyTorch?

After configuring GPU in PyTorch, you can easily move your data and models to GPU using the to('cuda') method. After moving a tensor to the GPU, the operations can be carried out just like they would with CPU tensors. PyTorch automatically utilizes the GPU for these operations, leading to quicker computation times.

How do I make PyTorch use all GPUs?

Data Parallelism
  1. Step 1: Import PyTorch and Define the Model. ...
  2. Step 2: Next, initialize the model and define the loss function and optimizer. ...
  3. Step 3: Create the data loader and move the model to GPUs. ...
  4. Step 4: Train the model. ...

Does PyTorch support my GPU?

Depending on your system and compute requirements, your experience with PyTorch on Windows may vary in terms of processing time. It is recommended, but not required, that your Windows system has an NVIDIA GPU in order to harness the full power of PyTorch's CUDA support.

Does PyTorch use GPU by default?

No. The default device is initially cpu.

How do I know if my GPU is working with PyTorch?

Checking if PyTorch is Using the GPU

Call the torch.cuda.is_available() function. If it reports that a GPU is available, set the device variable to "cuda", indicating that we want to use the GPU. If a GPU is not available, set device to "cpu", indicating that we want to use the CPU.

Is CUDA necessary for PyTorch?

A locally installed CUDA toolkit is only used if you build PyTorch from source or build a custom CUDA extension. You won't need it to execute PyTorch workloads, as the binaries (pip wheels and conda binaries) install all needed requirements.

Is PyTorch better than TensorFlow?

PyTorch is ideal for research and small-scale projects prioritizing flexibility, experimentation and quick editing capabilities for models. TensorFlow is ideal for large-scale projects and production environments that require high-performance and scalable models.

Does PyTorch run on CPU or GPU?

WML CE includes GPU-enabled and CPU-only variants of PyTorch, and some companion packages. The GPU-enabled variant pulls in CUDA and other NVIDIA components during install. It has larger installation size and includes support for advanced features that require GPU, such as DDL, LMS, and NVIDIA's Apex.

How to use GPU for deep learning?

How to Setup GPU for Deep Learning (Windows 11)
  1. Step 1: Install Anaconda for building an Environment. ...
  2. Step 2: Installing GPU Drivers. ...
  3. Step 3: Install CUDA. ...
  4. Step 4: Downloading cuDNN and Setup the Path variables. ...
  5. Step 5: Download Visual Studio Community 2019 (Optional) ...
  6. Step 6: Setting Up the Environment for your project.

Is PyTorch GPU accelerated?

PyTorch is a popular open-source deep learning framework that allows developers to build and train neural networks. One of the key advantages of PyTorch is its ability to leverage the power of GPUs to accelerate computations, making it an excellent choice for training deep neural networks.

How to send data to GPU PyTorch?

Steps to Load PyTorch DataLoader into GPU
  1. Step 1: Define the Dataset and DataLoader. The first step is to define the dataset and DataLoader. ...
  2. Step 2: Define the Model. The next step is to define the model. ...
  3. Step 3: Define the Loss Function and Optimizer. ...
  4. Step 4: Load Data into GPU.
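
The four steps above can be sketched as follows. The random dataset, the single nn.Linear layer, the MSE loss, and the SGD settings are placeholders assumed for illustration. The key point is that a DataLoader yields CPU batches, so each batch is moved to the device inside the loop.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Step 1: define the dataset and DataLoader (random placeholder data)
features = torch.randn(32, 2)
labels = torch.randn(32, 5)
loader = DataLoader(TensorDataset(features, labels), batch_size=8, shuffle=True)

# Step 2: define the model and move it to the device
model = nn.Linear(2, 5).to(device)

# Step 3: define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Step 4: move each batch to the device before using it
num_batches = 0
for batch_x, batch_y in loader:
    batch_x = batch_x.to(device)
    batch_y = batch_y.to(device)
    optimizer.zero_grad()
    loss = criterion(model(batch_x), batch_y)
    loss.backward()
    optimizer.step()
    num_batches += 1

print(num_batches)
```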

How do I use my GPU on a virtual machine?

  1. Run the virtual machine and connect to it using a serial console, such as SPICE or VNC.
  2. Download the driver to the virtual machine. ...
  3. Install the GPU driver. ...
  4. After the driver finishes installing, reboot the machine. ...
  5. Connect a monitor to the host GPU output interface and run the virtual machine.

How do I use my GPU?

Software Settings: Open the software or application you want to use and navigate to the settings or preferences menu. Look for options related to hardware acceleration or GPU usage. Enable the GPU acceleration option if available.

How do I enable CUDA for PyTorch?

Installing PyTorch with Cuda
  1. Check your NVIDIA driver. Open the NVIDIA Control Panel. ...
  2. Open a command prompt. Open a Windows terminal or the command prompt (cmd) and type python. ...
  3. Install pytorch with cuda. ...
  4. Test if cuda is recognized.

How do I enable GPU in Colab PyTorch?

Note: If you're running this notebook on Colab, enable GPU support by going to 'Edit' -> 'Notebook settings' and setting 'Hardware accelerator' to 'GPU'. PyTorch and TensorFlow are two of the most commonly used deep learning frameworks.

Article information

Author: Patricia Veum II
