Concurrent gRPC Model Inference Server

A gRPC model-inference service that handles concurrent clients, caches predictions, accepts batched inputs, and supports model updates.

Role

Backend + ML systems

Year

2025

Stack

Python · gRPC · Protobuf · PyTorch · Docker

Links

GitHub

Overview

The server defines its interface in Protocol Buffers, handles remote prediction requests over gRPC, guards the shared model and cache state with explicit locks, invalidates stale cache entries when the model is updated, and serves concurrent clients from a thread pool.
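The shared-state pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the project's actual code. The class `InferenceState`, the toy models, and the float-tuple input format are all hypothetical. A plain Python callable stands in for the PyTorch model, and the gRPC/Protobuf layer is omitted so the sketch runs on its own. One lock guards the model reference, a version counter, and the prediction cache. Cache misses in a request are sent to the model as a single batch. An update swaps the model and clears the cache atomically.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Features = Tuple[float, ...]
# Hypothetical stand-in for a PyTorch model: maps a batch of inputs to one score each.
Model = Callable[[Sequence[Features]], List[float]]


class InferenceState:
    """Hypothetical container for the shared model and prediction cache."""

    def __init__(self, model: Model) -> None:
        self._lock = threading.Lock()  # guards _model, _version, _cache, model_calls
        self._model = model
        self._version = 0  # bumped on every model update
        self._cache: Dict[Features, float] = {}
        self.model_calls = 0

    def predict_batch(self, inputs: Sequence[Features]) -> List[float]:
        # Snapshot the current model/version and read any cached predictions.
        with self._lock:
            version, model = self._version, self._model
            results: List[Optional[float]] = [self._cache.get(x) for x in inputs]
        miss_idx = [i for i, r in enumerate(results) if r is None]
        if miss_idx:
            # One batched forward pass over all misses, run outside the lock
            # so concurrent requests are not serialized behind inference.
            preds = model([inputs[i] for i in miss_idx])
            with self._lock:
                self.model_calls += 1
                # Cache only if no update happened meanwhile; otherwise these
                # predictions came from the old model and would be stale.
                if self._version == version:
                    for i, p in zip(miss_idx, preds):
                        self._cache[inputs[i]] = p
            for i, p in zip(miss_idx, preds):
                results[i] = p
        return [r for r in results if r is not None]

    def update_model(self, model: Model) -> None:
        # Swap the model and invalidate the cache in one critical section.
        with self._lock:
            self._model = model
            self._version += 1
            self._cache.clear()


def toy_model_v1(batch: Sequence[Features]) -> List[float]:
    return [sum(x) for x in batch]


def toy_model_v2(batch: Sequence[Features]) -> List[float]:
    return [2 * sum(x) for x in batch]


state = InferenceState(toy_model_v1)
requests = [[(1.0, 2.0), (3.0,)], [(1.0, 2.0)], [(4.0, 4.0)]]
# Concurrent clients simulated by a thread pool; map() keeps request order.
with ThreadPoolExecutor(max_workers=4) as pool:
    answers = list(pool.map(state.predict_batch, requests))
print(answers)  # [[3.0, 3.0], [3.0], [8.0]]

state.update_model(toy_model_v2)
print(state.predict_batch([(1.0, 2.0)]))  # [6.0]
```

In gRPC Python, a server is commonly constructed with `grpc.server(futures.ThreadPoolExecutor(max_workers=...))`. Servicer methods then run concurrently on pool threads, much as `pool.map` drives `predict_batch` here. The version check matters because clearing the cache alone is not sufficient. A request that began before the update could otherwise write old-model predictions back into the freshly cleared cache.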

Demo in progress...