GPU Inference & Serving

Projects

GPU Inference & Serving listings

Recommended install with Docker

New

Large Language Model Text Generation Inference

User submitted (not verified)


Ubuntu 22.04 (x86-64), JetPack 4.x on NVIDIA Jetson (Xavier, Nano, TX2), Windows 10 (x86-64). Docker recommended.

New

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

User submitted (not verified)


GPU: NVIDIA CUDA, AMD ROCm, Intel XPU. CPU: Intel/AMD x86, ARM AArch64, Apple Silicon, IBM Z (S390X)

New

A high-throughput and memory-efficient inference and serving engine for LLMs

User submitted (not verified)
