A "quantized" guide to quantization in LLMs

Tl;dr: Quantization helps you shrink and speed up LLMs without sacrificing too much performance. From 8-bit variants all the way down to 1-bit, the flavors are wild, and necessary if you want to run powerful models on weak hardware.

What even is quantization?

[Fig: Example of quantization from FP32 to INT8]

When I was reading a paper on Google’s TPU for a grad course, I came across their explanation of what ‘quantization’ is, and it has stuck with me to this day. ...
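The FP32-to-INT8 mapping in the figure can be sketched in a few lines. This is a minimal illustration of per-tensor symmetric quantization, not the scheme from any particular paper; the function names and the choice of a [-127, 127] range are my own assumptions.

```python
def quantize_int8(values):
    """Map floats onto signed 8-bit codes in [-127, 127] with one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0  # one FP32 scale for the whole tensor
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats; the error is bounded by about scale/2 per value."""
    return [qi * scale for qi in q]

weights = [0.91, -0.44, 0.02, -1.30]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
```

Each weight now occupies 1 byte instead of 4, at the cost of a small rounding error per value; real quantizers refine this with per-channel scales, zero points, or calibration data.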

August 7, 2025 · Abishek Padaki (Abiks)

ONNX Runtime: Unleash On-Device AI

Tl;dr: ONNX Runtime is your cross-platform, hardware-accelerated inference engine for deploying fast, private AI anywhere.

What even is ONNX?

ONNX Runtime (pronounced like “Onix”, the rock-snake Pokémon) is a performance-focused scoring and inference engine for models in the Open Neural Network Exchange (ONNX) format, designed to support heavy workloads and deliver fast, reliable predictions in production. Developed by Microsoft, it provides APIs in Python, C++, C#, Java, and JavaScript, and runs on Windows, Linux, and macOS. Its core is implemented in C++ for speed, and it interoperates seamlessly with ONNX-exported models from PyTorch, TensorFlow, Keras, scikit-learn, LightGBM, XGBoost, and more. ...

May 17, 2025 · Abishek Padaki (Abiks)