A "quantized" guide to quantization in LLMs
Tl;dr Quantization helps you shrink and speed up LLMs without sacrificing too much performance. From 8-bit variants all the way down to 1-bit, the flavors are wild, and necessary if you want to run powerful models on weak hardware. What even is quantization? Fig: Example of quantization from FP32 to INT8 When I was reading a paper on Google’s TPU for a grad course, I came across their explanation of what ‘quantization’ is, which has stuck with me to this day. ...
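To make the FP32-to-INT8 idea concrete, here is a minimal sketch (my illustration, not code from the post) of symmetric per-tensor quantization in NumPy: pick one scale so the largest weight magnitude maps to 127, round, and multiply back later to recover approximate FP32 values.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: one FP32 scale, INT8 values."""
    scale = max(np.abs(w).max() / 127.0, 1e-12)        # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 values from the INT8 tensor."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)           # stand-in for a layer's weights
q, scale = quantize_int8(w)
print(np.abs(w - dequantize(q, scale)).max())          # small reconstruction error
```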
Ecosystems
Stay within your own ecosystem. The ecosystem can get disturbed once new thoughts or new ‘unnecessary’ practices, tools or methodologies seep into your daily system of doing things. Figure out the best components of your ecosystem on your own. Plenty of shiny new YouTube videos, tweets and posts will show you which tools, gadgets and items you need to have in order to win, when in reality those just work for them. Their quirky methodologies and ways of writing and thinking might never fit into your ecosystem, even if they seem great in theirs. One thing people don’t understand is that not every shiny object is compatible with one’s ecosystem. ...
ONNX Runtime: Unleash On-Device AI
Tl;dr ONNX Runtime is your cross-platform, hardware-accelerated inference engine to deploy fast and private AI anywhere. What even is ONNX? ONNX Runtime (pronounced like “Onix”, the rock-snake Pokémon) is a performance-focused scoring and inference engine for models in the Open Neural Network Exchange (ONNX) format, designed to support heavy workloads and deliver fast, reliable predictions in production scenarios. Developed by Microsoft, it provides APIs in Python, C++, C#, Java, and JavaScript, and runs on Windows, Linux, and macOS. Its core is implemented in C++ for speed, and it seamlessly interoperates with ONNX-exported models from PyTorch, TensorFlow, Keras, scikit-learn, LightGBM, XGBoost, and more. ...
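As a quick taste of what scoring an ONNX model looks like, here is a minimal sketch using the onnxruntime Python package; the model path, input shape, and execution provider are placeholders, not details from the post.

```python
import numpy as np
import onnxruntime as ort

# Create an inference session; `providers` controls which hardware backend is used.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Inspect the model's declared input so we can build a matching feed.
inp = session.get_inputs()[0]
print(inp.name, inp.shape)

# Run inference: feed {input_name: numpy array}, get a list of output arrays back.
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)   # placeholder input shape
outputs = session.run(None, {inp.name: dummy})
print(outputs[0].shape)
```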
Infinite Iterations
I tend to keep making new blog websites, updating the tech stack ‘cause I find something better, and just keep updating the repo. Yet, in all iterations, I always have a stock homepage that says something along the lines of ‘Welcome to my site, I’m still busy building this blog. Thank you for your patience’. The idea of starting a blog gives you the feeling of being productive and gets you going, yet I never happen to start ’the writing’. Funnily enough, this is the first post I’m writing here, brain-dumping my problem of actually writing posts. ...
Building Neural Nets from First Principles
I often keep trying to implement architectures and papers from first principles, or ‘from scratch’ as it’s more commonly known today. Instead of creating multiple ’nn_test’ Colab notebooks, I decided to begin adding them to a repo, not just to have them on my git, but also as a quick refresher I can keep looking back to. A screen grab of my masked self-attention implementation ...
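For flavor, here is a from-scratch sketch of single-head masked (causal) self-attention in NumPy. This is my own illustration rather than the code in the screen grab, with random matrices standing in for learned projections.

```python
import numpy as np

def masked_self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head causal self-attention over x of shape (seq_len, d_model)."""
    seq_len, d_model = x.shape
    rng = np.random.default_rng(0)
    # Toy projection matrices; a real layer would learn these.
    Wq, Wk, Wv = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(3)]
    Q, K, V = x @ Wq, x @ Wk, x @ Wv

    scores = Q @ K.T / np.sqrt(d_model)                        # (seq_len, seq_len)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[future] = -np.inf                                   # block attention to future positions

    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)             # row-wise softmax
    return weights @ V

out = masked_self_attention(np.random.randn(5, 16))
print(out.shape)  # (5, 16)
```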
Ferret
Ferret is a Python CLI tool that lets you ‘ferret through’ project repositories, analyze their directory structure, and get AI-based explanations of them.
Features:
- List files and folders as a tree
- Explain the directory structure using an LLM (OpenAI API) (Gemini/Claude/HF coming soon)
- Generate a module import dependency graph as a Mermaid chart
To view the code and docs, check my git: https://github.com/abishekpadaki/ferret
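The tree-listing feature boils down to a recursive directory walk. Below is a hedged sketch of that idea using pathlib; it is not Ferret’s actual implementation, just an approximation of the feature.

```python
from pathlib import Path

def print_tree(root: Path, prefix: str = "") -> None:
    """Recursively print files and folders under `root` as an indented tree."""
    entries = sorted(p for p in root.iterdir() if p.name != ".git")
    for i, entry in enumerate(entries):
        last = i == len(entries) - 1
        print(prefix + ("└── " if last else "├── ") + entry.name)
        if entry.is_dir():
            print_tree(entry, prefix + ("    " if last else "│   "))

print_tree(Path("."))
```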
My Eval for Reasoning LLMs
(last updated: 11th Aug 2025) There are a ton of benchmarks already out there for reasoning LLMs, but I have a very simple one (just a single prompt for now) that I use every time a new reasoning model is released. It’s a logical pattern-discovery question based on Pokemon and the starters from various generations. The prompt goes as follows: This is a puzzle for you to solve the missing Pokemon based on a pattern. Figure out the pattern in the examples I provide by thinking carefully and hard. Here are the examples to begin with: ...
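Running the eval against a new model is just a single chat-completion call. Here is a sketch using the OpenAI Python SDK; EVAL_PROMPT stands in for the full prompt (elided above) and the model name is only an example, not part of the post.

```python
from openai import OpenAI

# Placeholder for the full Pokemon-pattern prompt quoted (and truncated) above.
EVAL_PROMPT = "..."

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="o3-mini",  # swap in whichever reasoning model just shipped
    messages=[{"role": "user", "content": EVAL_PROMPT}],
)
print(response.choices[0].message.content)
```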