
GPU Inference & Serving

High-throughput inference servers and local runtime stacks tuned for GPUs.

Projects

Text Generation Inference

Recommended install: Docker (client query example below)

Large Language Model Text Generation Inference

Tags: nlp, bloom, inference, falcon, PyTorch, py
User submitted (not verified)
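
A minimal sketch of querying a running TGI container over its REST /generate endpoint. It assumes the server was launched with the documented Docker image and is listening on localhost:8080; the prompt, port, and sampling parameters below are illustrative placeholders.

    import requests

    # POST to TGI's /generate endpoint; localhost:8080 assumes the port
    # mapping used in the project's Docker quickstart.
    resp = requests.post(
        "http://localhost:8080/generate",
        json={
            "inputs": "What is Deep Learning?",
            "parameters": {"max_new_tokens": 64, "temperature": 0.7},
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["generated_text"])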

Triton Inference Server

Ubuntu 22.04 (x86-64), JetPack 4.x on Nvidia Jetson (Xavier, Nano, TX2), Windows 10 (x86-64); Docker recommended

The Triton Inference Server provides an optimized cloud and edge inferencing solution (client sketch below).

Tags: Nvidia, inference
User submitted (not verified)
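
A minimal client sketch against a running Triton server, using the tritonclient Python package (pip install tritonclient[http]). The model name ("simple"), tensor names, shape, and dtype are placeholders for whatever model sits in your model repository; localhost:8000 is Triton's default HTTP port.

    import numpy as np
    import tritonclient.http as httpclient

    # Connect to Triton's HTTP endpoint (default port 8000) and check readiness.
    client = httpclient.InferenceServerClient(url="localhost:8000")
    assert client.is_server_ready()

    # Build a request for a hypothetical model "simple" that expects one
    # FP32 tensor named INPUT0 of shape [1, 16].
    data = np.random.rand(1, 16).astype(np.float32)
    infer_input = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
    infer_input.set_data_from_numpy(data)

    result = client.infer(model_name="simple", inputs=[infer_input])
    print(result.as_numpy("OUTPUT0"))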

vLLM

GPU: NVIDIA CUDA, AMD ROCm, Intel XPU. CPU: Intel/AMD x86, ARM AArch64, Apple Silicon, IBM Z (S390X)

A high-throughput and memory-efficient inference and serving engine for LLMs (offline batch example below).

Tags: inference, lib, local ai, OpenAI, hugging face
User submitted (not verified)
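
A minimal offline-batch sketch using vLLM's Python API. The model name and sampling values are illustrative; any Hugging Face model that vLLM supports can be substituted.

    from vllm import LLM, SamplingParams

    # Model name is illustrative; vLLM downloads it from Hugging Face.
    llm = LLM(model="facebook/opt-125m")
    sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # Batch generation: vLLM schedules prompts together for throughput.
    outputs = llm.generate(["The capital of France is"], sampling)
    for out in outputs:
        print(out.outputs[0].text)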