10 Best Alternatives To OpenAI Triton

Last month, OpenAI released Triton 1.0, an open-source Python-like programming language that enables researchers to write highly efficient graphics processing unit (GPU) code. OpenAI claims Triton delivers substantial ease-of-use benefits over coding in CUDA, a programming tool developed by NVIDIA. The development repository for the Triton language and compiler is available on GitHub.

OpenAI scientist Philippe Tillet said the aim is to become a viable alternative to CUDA for deep learning. “It is for machine learning researchers and engineers who are unfamiliar with GPU programming despite having good software engineering skills,” he added.

Today, several high-level programming languages and libraries offer access to the GPU for certain sets of problems and algorithms. In this article, we look at the alternatives to OpenAI Triton.

OpenACC

OpenACC is a user-driven, directive-based, performance-portable parallel programming model. It is designed for engineers and scientists who want to port their codes to heterogeneous HPC hardware platforms and architectures with significantly less programming effort than a low-level model requires. It supports the C, C++, and Fortran programming languages and multiple hardware architectures, including x86 and POWER CPUs and NVIDIA GPUs.

While OpenACC offers a set of directives to execute code in parallel on the GPU, such high-level abstractions are only efficient for certain classes of problems and are often unsuitable for nontrivial parallelisation or data movement.

CUDA

Developed by NVIDIA for general computing, CUDA stands for Compute Unified Device Architecture. This software layer gives direct access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels.
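To make the idea of a compute kernel more concrete, here is a minimal sketch in plain Python that emulates, on the CPU, how each thread of a one-dimensional launch picks its own element from a block index and a thread index. It is not CUDA code and runs sequentially. The names `vector_add_kernel` and `launch` are hypothetical and chosen only for this illustration.

```python
def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Body run by one emulated thread: it handles a single element."""
    # Global index, analogous to blockIdx.x * blockDim.x + threadIdx.x in CUDA.
    i = block_idx * block_dim + thread_idx
    # Guard: the last block may reach past the end of the data.
    if i < len(out):
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Call the kernel once per (block, thread) pair.

    A GPU would run these calls in parallel; this loop is sequential.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(a)

block_dim = 4
# Round up so that every element is covered by some thread.
grid_dim = (len(a) + block_dim - 1) // block_dim

launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
print(out)
```

The bounds check inside the kernel matters because the number of launched threads (grid size times block size) is usually rounded up past the length of the data.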

CUDA is NVIDIA's proprietary framework for general-purpose computing on GPUs (GPGPU) and one of the leading frameworks in that space. GPGPU refers to using GPUs to help perform tasks traditionally handled by CPUs. Data can flow in both directions, from CPU to GPU and back, which improves efficiency in various tasks, especially image and video processing.

CUDA can work with programming languages like C, C++, and Fortran. It has applications in fields including life sciences, bioinformatics, computer vision, electrodynamics, computational chemistry, medical imaging, and finance.

PyCUDA

PyCUDA gives Pythonic access to NVIDIA’s CUDA parallel computation API. It ties object cleanup to the lifetime of the object, which makes leaks easier to avoid. PyCUDA knows about dependencies, too, so it won’t detach from a context before all memory allocated in it has been freed. Abstractions like SourceModule and GPUArray make CUDA programming even more convenient than with NVIDIA’s C-based runtime.

PyCUDA ensures all CUDA errors are automatically translated into Python exceptions.
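As a rough illustration of the two abstractions named above, the sketch below doubles an array once with GPUArray arithmetic and once with a small kernel compiled through SourceModule. It assumes PyCUDA and NumPy are installed and a CUDA-capable NVIDIA GPU is present. The function name `double_on_gpu` and the kernel name `doublify` are hypothetical names chosen for this example.

```python
def double_on_gpu(values):
    """Double each element on the GPU in two ways and return both results.

    Assumes PyCUDA, NumPy, and a CUDA-capable GPU are available.
    """
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (sets up a CUDA context on import)
    import pycuda.driver as cuda
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    host = np.asarray(values, dtype=np.float32)

    # GPUArray: NumPy-like arithmetic that runs on the device.
    via_gpuarray = (2 * gpuarray.to_gpu(host)).get()

    # SourceModule: compile CUDA C from a string, then launch the kernel.
    module = SourceModule("""
    __global__ void doublify(float *a, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) a[i] *= 2.0f;
    }
    """)
    doublify = module.get_function("doublify")

    via_kernel = host.copy()
    threads = 128
    blocks = (via_kernel.size + threads - 1) // threads
    # cuda.InOut copies the buffer to the device and back after the launch.
    doublify(cuda.InOut(via_kernel), np.int32(via_kernel.size),
             block=(threads, 1, 1), grid=(blocks, 1))

    return via_gpuarray, via_kernel
```

Calling `double_on_gpu([1.0, 2.0, 3.0])` on a machine with a working CUDA setup should return two arrays holding the doubled values. If the GPU or the toolkit is missing, the failure surfaces as a Python exception rather than a silent error.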

OpenCL

Open Computing Language (OpenCL) is an open standard for writing code that runs across heterogeneous platforms, including CPUs, GPUs, digital signal processors (DSPs), field-programmable gate arrays (FPGAs), and other processors or hardware accelerators. Notably, it gives applications access to GPUs for GPGPU, which in some cases results in significant speed-ups. For example, in computer vision, many algorithms can run on a GPU much more efficiently than on a CPU, particularly in image processing, computational photography, object detection, and matrix arithmetic.

OpenPAI

Developed by Microsoft, OpenPAI offers complete AI model training and resource management capabilities. The open-source platform supports on-premise, cloud, and hybrid environments.

CatBoost

Developed by Yandex researchers and engineers, CatBoost is an algorithm for gradient boosting on decision trees. It is used for search, recommendation systems, personal assistants, weather prediction, self-driving cars, and more. It supports computation on both CPU and GPU.
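For readers unfamiliar with the term, the sketch below shows the general idea of gradient boosting on decision trees in plain Python: each round fits a one-split tree (a stump) to the current residuals and adds a scaled copy of it to the model. This is a generic textbook illustration for squared-error regression on one feature, not CatBoost's implementation or API. All function names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree that minimises squared error."""
    best = None
    # Candidate thresholds: every distinct value except the largest,
    # so that both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.1):
    """Build an additive model of stumps, each fitted to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data, made up for this example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8, 8.3]

mean_y = sum(ys) / len(ys)
baseline_mse = mean_squared_error(lambda x: mean_y, xs, ys)
boosted_mse = mean_squared_error(fit_boosted(xs, ys), xs, ys)
print(baseline_mse, boosted_mse)
```

On the training data, the boosted model's error falls below that of the constant-mean baseline, because every round removes part of the remaining residual. Production libraries add much more on top of this, such as deeper trees, regularisation, and categorical feature handling.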

According to its developers, CatBoost delivers superior quality compared to other GBDT libraries on many datasets, has best-in-class prediction speed, supports both numerical and categorical features, offers fast GPU and multi-GPU training out of the box, and includes visualisation tools.