Felipe Gonçalves Diogo

MLAI — Edge AI Visual Inspection

A dual-purpose AI quality station running fully offline on a Raspberry Pi 4 — industrial anomaly detection and agricultural fruit grading in one SCADA-style dashboard

The Challenge

Build a production-grade visual inspection system that runs entirely on commodity edge hardware — no cloud, no internet, no telemetry — and is versatile enough to handle two very different domains from the same box: industrial quality control on manufactured parts and agricultural produce grading on a rover-like platform. The constraints were tight: a single Raspberry Pi 4 (8 GB) with the Camera Module 3 NoIR, end-to-end latency under 500 ms, at least 3 FPS on the live feed, one inference module active at a time, and a SCADA-style operator UI that could be trusted in a factory control room. Everything had to auto-start on boot, auto-restart on crash, and be installable by someone who had never touched machine learning before.

Raspberry Pi 4 Model B — the entire compute target. All ML inference runs here via TFLite + XNNPACK on the ARM Cortex-A72

Two Modules, One Box

MLAI ships with two fully independent inference modules that share the same infrastructure (camera capture, calibration, measurement pipeline, database, API, and web shell) but nothing else. Only one runs at a time; switching between them takes under 5 seconds.

• INDUST: Industrial quality control using PaDiM anomaly detection (via Anomalib) trained on MVTec AD categories (bottle, metal nut, screw, and others, hot-swappable). For each frame it produces an anomaly score (0–1), a pixel-level heatmap overlaid on the live video, a PASS / WARN / FAIL verdict against a configurable threshold, and the defect's dimensional footprint in millimeters.
• AGRO: Agricultural produce inspection using SSD MobileNet V2 for fruit detection (apple, orange, tomato; extendable) followed by a MobileNet V2 transfer-learned classifier for per-fruit quality grading (good / defective / unripe). Each detection is sized using classical CV contour analysis converted from pixels to millimeters via the camera calibration, and a live size histogram tracks the size distribution of what just passed under the camera.

Both modules feed into the same SQLite history, expose the same REST and WebSocket contracts, and render through the same SCADA shell, just with different sub-pages.
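As an illustration of the PASS / WARN / FAIL step, here is a minimal sketch of mapping a 0–1 anomaly score to a verdict. The source only mentions "a configurable threshold", so the two-threshold layout, the default values, and the names `VerdictThresholds` and `verdict_for` are assumptions made for this example, not MLAI's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerdictThresholds:
    """Hypothetical config: the real MLAI threshold layout is not documented here."""
    warn: float = 0.4  # scores at or above this are at least WARN (assumed value)
    fail: float = 0.6  # scores at or above this are FAIL (assumed value)


def verdict_for(score: float, thresholds: VerdictThresholds = VerdictThresholds()) -> str:
    """Map a 0-1 anomaly score to PASS, WARN, or FAIL."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"anomaly score must be in [0, 1], got {score}")
    if score >= thresholds.fail:
        return "FAIL"
    if score >= thresholds.warn:
        return "WARN"
    return "PASS"


for score in (0.12, 0.45, 0.91):
    print(score, verdict_for(score))
```

Keeping the thresholds in a small immutable object makes it straightforward to load them per MVTec category from settings, if the real system stores them that way.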

Raspberry Pi Camera Module — the sole sensor input. MLAI is built around the Camera Module 3 NoIR (IMX708, 12 MP) connected via the CSI-2 ribbon

System Architecture

Three systemd services work together, each with its own responsibility and its own restart policy:

• mlai-engine: the Python inference daemon. It talks to picamera2, runs the active module's pipeline on each frame, writes results to SQLite, and pushes live frames and results to the API over a local IPC channel.
• mlai-api: FastAPI (port 8000), with REST endpoints for system health, module switching, camera configuration, and paginated history and stats per module (INDUST and AGRO keep their routes in separate files). A WebSocket at /ws/live streams frames, verdicts, bounding boxes, heatmaps and inference timings as fast as the Pi can produce them.
• mlai-web: Next.js 16 (port 3000), App Router, React 19, Tailwind CSS 4, Recharts. It serves the SCADA dashboard, the INDUST live page (video + heatmap overlay + gauge + measurement card), the AGRO live page (bounding boxes + fruit cards + size histogram), history tables with CSV export, settings, a system health page, and a camera calibration wizard that walks the operator through a checkerboard capture.

The data flow is direct and cheap: picamera2 at 640×480 ~5 FPS → undistort using the camera calibration JSON → route to the active module → INDUST resizes to 256×256 and runs PaDiM, or AGRO resizes to 320×320 and runs SSD MobileNet V2 plus the MobileNet V2 classifier once per detection → result object → persisted to SQLite and pushed over the WebSocket. Live UI updates are tied to the WebSocket; history pages use plain REST + React Query.
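Because AGRO runs the classifier once per detection, its per-frame inference time grows with the number of fruits in view. The sketch below estimates how many detections fit inside the 500 ms latency target, using the per-model timings quoted for the Pi 4 (SSD MobileNet V2 ~100–150 ms, classifier ~50–80 ms per detection). It counts inference only and ignores capture, undistortion, and contour measurement, so real headroom is smaller. The function name is hypothetical.

```python
def max_detections_within_budget(budget_ms: int, detector_ms: int, classifier_ms: int) -> int:
    """Largest n with detector_ms + n * classifier_ms <= budget_ms (inference time only)."""
    if classifier_ms <= 0:
        raise ValueError("classifier_ms must be positive")
    remaining_ms = budget_ms - detector_ms
    if remaining_ms < 0:
        return 0
    return remaining_ms // classifier_ms


# Slowest quoted timings: 150 ms detector, 80 ms per classified fruit.
worst_case = max_detections_within_budget(500, 150, 80)
# Fastest quoted timings: 100 ms detector, 50 ms per classified fruit.
best_case = max_detections_within_budget(500, 100, 50)
print(worst_case, best_case)  # prints "4 8"
```

Under these assumptions, a frame with more than roughly four to eight fruits would exceed the target on inference alone. The source does not say how MLAI handles crowded frames.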

The Measurement Pipeline

Both modules share a single dimensional measurement pipeline that turns pixels into millimeters with field-verifiable accuracy: [camera frame] → [undistort via intrinsic matrix] → [detect ROI] → [segment with OpenCV contours] → [pixel dimensions] → [px → mm conversion] → [output]. Calibration is a guided, checkerboard-based OpenCV routine exposed through a step-by-step wizard in the web UI. The operator prints a checkerboard pattern, holds it at varied angles in front of the camera, and the engine collects enough views to solve for the camera's intrinsic matrix and distortion coefficients, which are saved to config/camera_calibration.json. From that point on every frame is undistorted before inference, and every contour measurement carries a real-world scale. Target accuracy is ±2 mm and the system is designed to be recalibrated on the fly if the lens or camera distance changes.
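The px → mm step can be illustrated with the pinhole camera model: for an object lying in a plane parallel to the sensor at a known distance, real size = pixel size × distance / focal length (in pixels). The sketch below assumes a fixed, known camera-to-object distance and a focal length taken from the calibrated intrinsic matrix. The function name and the numbers are illustrative only, not MLAI's actual code or the IMX708's real parameters.

```python
def px_to_mm(length_px: float, focal_px: float, distance_mm: float) -> float:
    """Convert a length measured in an undistorted image to millimeters.

    Assumes a pinhole model and an object plane parallel to the sensor
    at `distance_mm` from the camera (an assumption of this sketch).
    """
    if focal_px <= 0 or distance_mm <= 0:
        raise ValueError("focal_px and distance_mm must be positive")
    return length_px * distance_mm / focal_px


# Illustrative values: fx = 500 px from the intrinsic matrix, camera 300 mm above the belt.
width_mm = px_to_mm(100, focal_px=500, distance_mm=300)
print(width_mm)  # prints "60.0"

# Size of one pixel at that distance, and how many pixels a 2 mm tolerance spans.
mm_per_px = px_to_mm(1, focal_px=500, distance_mm=300)
print(round(2 / mm_per_px, 1))
```

With these example numbers, a ±2 mm target corresponds to only a few pixels of contour error, which is why the scale has to be re-derived whenever the lens or the camera distance changes.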

Pi 4 side view — GPIO, USB 3.0 and CSI camera port. The Camera Module 3 NoIR connects via the CSI-2 ribbon

The ML Story

Training happens on a separate PC using a Miniconda env and the TensorFlow ecosystem; inference happens on the Pi using only tflite-runtime with XNNPACK acceleration. The split matters: the Pi does not need TensorFlow installed, only a small runtime that loads .tflite files.

• INDUST: Trained with Anomalib's PaDiM on MVTec AD. One model per category. The training CLI downloads the MVTec data, fits the model in a couple of epochs, and exports a TFLite artifact that the Pi loads through a thin wrapper. A single command trains on "bottle", exports, and copies into models/indust/ ready to be served.
• AGRO detector: SSD MobileNet V2 fine-tuned with TF Model Maker on Fruits-360 for apple, orange, and tomato. A pretrained COCO-SSD TFLite can also be dropped in directly when time is scarce.
• AGRO quality classifier: MobileNet V2 transfer-learned on a fresh-vs-rotten fruit dataset, producing a lightweight three-class head (good / defective / unripe) running at ~50–80 ms per detection on the Pi.

All training scripts are written for a developer with zero prior ML experience: every step is commented end-to-end, conda commands are spelled out, and a README walks through dataset download, train, export, and scp-to-Pi in order.
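A "thin wrapper" around tflite-runtime could look roughly like the sketch below. It uses the standard `Interpreter` calls from `tflite_runtime.interpreter`; the function names `load_interpreter` and `run_once`, the thread count, and the single-input, single-output assumption are choices made for this example rather than MLAI's actual code. No delegate is loaded explicitly, on the assumption that the installed tflite-runtime build already enables XNNPACK for float models.

```python
from typing import Any


def load_interpreter(model_path: str, num_threads: int = 4) -> Any:
    """Create a tflite-runtime interpreter and allocate its tensors."""
    # Imported lazily so this file can also be imported on a machine
    # that does not have tflite-runtime installed.
    from tflite_runtime.interpreter import Interpreter

    interpreter = Interpreter(model_path=model_path, num_threads=num_threads)
    interpreter.allocate_tensors()
    return interpreter


def run_once(interpreter: Any, input_array: Any) -> Any:
    """Run one inference on a single-input, single-output model.

    `input_array` must be a NumPy array matching the model's input
    shape and dtype (see interpreter.get_input_details()).
    """
    input_detail = interpreter.get_input_details()[0]
    output_detail = interpreter.get_output_details()[0]
    interpreter.set_tensor(input_detail["index"], input_array)
    interpreter.invoke()
    return interpreter.get_tensor(output_detail["index"])
```

Loading the interpreter once and reusing it for every frame avoids paying the model-load cost per inference; only `run_once` would sit in the per-frame loop.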

Performance on Pi 4 (8 GB)

The entire stack is sized to stay within a single Pi 4's budget:

• End-to-end latency < 500 ms (PaDiM ~300–500 ms, SSD MobileNet V2 ~100–150 ms, MobileNet V2 classifier ~50–80 ms).
• Live feed ≥ 3 FPS on the WebSocket.
• RAM footprint for all three services < 4 GB, leaving headroom for capture buffers and the web UI.
• CPU averages under 80% across the four A72 cores during inference.
• Model swap from INDUST → AGRO (or the reverse) in under 5 seconds, via POST /api/system/module.
• SQLite queries stay under 50 ms thanks to WAL mode and a small number of well-indexed tables (system_state, indust_results, agro_results, agro_detections).
• A capture auto-prune keeps disk usage below 5 GB with a 30-day rolling window; the full system survives a 1-hour continuous run with no memory leaks and no crashes.

A dedicated scripts/benchmark.py measures all of these on any given Pi so performance claims are reproducible, not aspirational.
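To make the SQLite point concrete, the sketch below opens a database in WAL mode, creates an indexed results table, and times a "most recent results" query. Only the table name indust_results comes from the text above; the column names, the index, and the helper functions are assumptions for illustration, and this is not the contents of scripts/benchmark.py.

```python
import sqlite3
import time

# Hypothetical schema: only the table name appears in the project description.
SCHEMA = """
CREATE TABLE IF NOT EXISTS indust_results (
    id INTEGER PRIMARY KEY,
    created_at REAL NOT NULL,      -- Unix timestamp in seconds
    anomaly_score REAL NOT NULL,
    verdict TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_indust_results_created_at
    ON indust_results (created_at);
"""


def open_db(path: str) -> sqlite3.Connection:
    """Open (or create) the database with write-ahead logging enabled."""
    conn = sqlite3.connect(path)
    # WAL lets readers keep working while a writer appends new results.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript(SCHEMA)
    return conn


def recent_results(conn: sqlite3.Connection, limit: int = 50):
    """Return the newest rows plus the query time in milliseconds."""
    start = time.perf_counter()
    rows = conn.execute(
        "SELECT created_at, anomaly_score, verdict FROM indust_results "
        "ORDER BY created_at DESC LIMIT ?",
        (limit,),
    ).fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return rows, elapsed_ms
```

The index on created_at lets the paginated history query read the newest rows directly instead of sorting the whole table, which is the kind of access pattern the 50 ms target depends on.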

The SCADA Operator UI

The design language is deliberately the opposite of generic AI products: factory control room, not chatbot. Dark background around #0f1117 with a subtle grid texture, status colors borrowed from industrial HMIs (green for OK, amber for warn, red for fault, blue for info), monospaced typography for values and sans-serif for labels, blinking status dots, live FPS counters, gauge animations, and timestamps on every reading. The navigation splits along the two modules: INDUST pages (live view with heatmap overlay + anomaly gauge + verdict + measurements + category and threshold controls, history table with detail modal and CSV export, settings) and AGRO pages (live view with bounding boxes + count + per-fruit cards + size histogram, history, settings). A system page surfaces CPU / RAM / temperature gauges and per-service uptime, and a calibration wizard guides the operator through the checkerboard intrinsic-matrix routine. Charts are built with Recharts; the live view is a plain HTML5 canvas driven by the WebSocket, with no heavy video pipeline.

Tech Stack

Every piece is open-source and intentional:

• OS: Raspberry Pi OS Bookworm 64-bit (required for TFLite).
• Camera: libcamera + picamera2, Camera Module 3 NoIR (IMX708, 12 MP).
• CV: OpenCV 4.10 (headless on the Pi) for undistort, contour segmentation and px→mm conversion.
• ML training (PC): Miniconda + TensorFlow 2.16/2.18, Anomalib 1.1, TF Model Maker, Keras.
• ML inference (Pi): tflite-runtime with XNNPACK acceleration; no full TensorFlow install on the Pi.
• API: FastAPI 0.115, Pydantic 2, Uvicorn.
• Realtime: FastAPI native WebSocket at /ws/live.
• DB: SQLite 3 with WAL mode.
• Frontend: Next.js 16 (App Router), React 19, Tailwind CSS 4, Recharts, shadcn/ui.
• Process management: systemd units with Restart=always and graceful shutdown hooks.

Setup is automated through two shell scripts (scripts/download_models.sh and a systemd install script) and a detailed SetupGuide.md aimed at a reader who has never used the terminal, from flashing the SD card with Raspberry Pi Imager through to opening the dashboard in a LAN browser.

Results

A complete, self-contained visual inspection platform that turns a $75 Raspberry Pi into a dual-purpose AI quality station — industrial anomaly detection and agricultural fruit grading — with zero cloud dependencies and full data sovereignty. The operator never gives up their data to a third party, never pays a subscription, and never needs an internet connection to run inference. Architecturally the system proves that edge-first, module-separated design is viable for heterogeneous ML workloads on commodity hardware: INDUST and AGRO share only infrastructure (camera, measurement, API contract, UI shell), which keeps their code and their ML lifecycles independent while still presenting a unified experience to the operator. The same codebase will extend naturally to additional modules — the next planned iteration bridges to a VPS Mosquitto broker over MQTT so multiple field Pis can feed a central supervisory dashboard without losing local autonomy.