A metal tube. 8mm in diameter. Aviation grade.
By the time it reaches the end of the line, it carries a dozen stamps across its surface. Material certificate. Heat treatment. Dimensional check. Final approval. Each one a record that a specific step happened — and passed.
The inspector’s job: read all of them. Reliably. In real time. While the tube is rotating in their hands.
Small text. Curved metal. Digits that look almost identical to each other. And somewhere nearby, a paper checklist to cross-reference against.
That’s not an edge case. That’s the daily reality for thousands of small manufacturers — the ones who can’t afford a six-figure machine vision setup, and don’t have an engineering team to maintain one.
Large factories solved this years ago. They have computer vision, AI recognition, whole departments dedicated to it.
Small manufacturers have a worker, a flashlight, and a checklist.
That gap is exactly what Inventor was built for.
The OCR Problem
There’s no shortage of OCR models. The problem: almost all of them were trained on paper.
Metal is different. Curved metal is harder still. Glare, scratches, shallow engravings, stamps pressed at odd angles — none of it looks like a scanned document.
After several weeks of testing, two directions emerged.
Custom model — built from scratch for metal surfaces.
- Tuned for a narrow symbol set (a one vs. a seven matters more than full sentences)
- Handled recorded video well
- Real-time on iPhone: not reliable enough
Apple OCR — it worked. Almost too well.
- Picked up everything in frame, including things visible for a fraction of a second
- Adjacent stamps merged into one word
- And yet: curved surfaces, glare, scratches — none of it slowed it down
- Ran flawlessly on iPhone and iPad
So we kept Apple OCR and taught it to look in the right direction. We added our own layer on top, one that decides where to look and which readings to keep.
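One way to picture that layer: the OCR picks up things that are visible for only a fraction of a second, so a reading can be required to survive several frames in a row before it counts. The sketch below is an illustration under assumptions, not the product's actual logic. The function name `stable_readings`, the `min_frames` threshold, and the sample stamp strings are all made up for the example, and the per-frame OCR output is assumed to arrive as a plain list of strings.

```python
from collections import defaultdict

def stable_readings(frames, min_frames=3):
    """Return texts seen in at least `min_frames` consecutive frames.

    `frames` is a list of per-frame OCR results, each a list of strings
    (an assumed input shape, not a real OCR API).
    """
    streak = defaultdict(int)  # current run length per text
    stable = []                # texts that reached the threshold, in order
    for texts in frames:
        seen = dict.fromkeys(texts)  # de-duplicate, keep order
        # Anything missing from this frame loses its streak.
        for text in list(streak):
            if text not in seen:
                del streak[text]
        for text in seen:
            streak[text] += 1
            if streak[text] == min_frames and text not in stable:
                stable.append(text)
    return stable

# Made-up sample: "X9" flashes by in one frame and is dropped.
frames = [
    ["HT-204", "X9"],
    ["HT-204"],
    ["HT-204", "QC-17"],
    ["QC-17"],
    ["QC-17"],
]
print(stable_readings(frames))  # ['HT-204', 'QC-17']
```

The threshold is a trade-off: higher values reject more flicker but need the stamp to stay in view longer while the tube rotates.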
That’s the foundation the pipeline is built on.
Zones
A tube isn’t one flat surface. It has sections. It rotates. The same stamp can appear in frame three times in a single pass.
The zone model handles this. It segments the shape of the tube, separates distinct sections, and gives the detection pipeline a structured surface to work with — instead of a continuous blur of metal and markings.
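To make "a structured surface" concrete, here is a minimal sketch of what the step after segmentation could look like. It assumes the zone model's output has already been reduced to named intervals along the tube's axis, normalized to the range 0 to 1. That reduction, the `Zone` type, the zone names, and the sample detections are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Zone:
    name: str
    start: float  # normalized position along the tube axis (inclusive)
    end: float    # normalized position along the tube axis (exclusive)

def assign_zone(center, zones):
    """Return the name of the zone containing `center`, or None."""
    for zone in zones:
        if zone.start <= center < zone.end:
            return zone.name
    return None

def group_by_zone(detections, zones):
    """Group (text, center) detections by zone; drop those outside every zone."""
    grouped = {zone.name: [] for zone in zones}
    for text, center in detections:
        name = assign_zone(center, zones)
        if name is not None:
            grouped[name].append(text)
    return grouped

zones = [Zone("end_a", 0.0, 0.3), Zone("body", 0.3, 0.7), Zone("end_b", 0.7, 1.0)]
detections = [("HT-204", 0.12), ("QC-17", 0.55), ("M-3", 0.95), ("glare", 1.2)]
print(group_by_zone(detections, zones))
# {'end_a': ['HT-204'], 'body': ['QC-17'], 'end_b': ['M-3']}
```

Anything the OCR reads outside the tube, like the "glare" entry here, never reaches the later stages.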
Getting there was straightforward. What slowed things down was data.
The Data Reality
Clients say they have a lot of data.
Usually they mean gigabytes of process video. That’s not nothing — but it’s not the same as useful data.
For a model to work, the dataset needs variety:
- Different tubes
- Different surface conditions
- Different edge cases
Mid-tier manufacturers — our core audience — often can’t deliver that. Small staff. Limited active orders. The parts you need to balance the dataset simply aren’t on the line that week. Or someone would need to spend half an hour filming, and there’s no one free to do it.
We adjusted. Built storage and annotation into our own product cycle so models can stay operational while new data comes in gradually — on the client’s schedule, not ours.
Error Detection
Finding stamps is one problem. Catching errors in them is another.
Negative examples — defective or incorrectly marked parts — are rare by nature. You can’t build a reliable error-detection model from production runs alone. There simply aren’t enough failures to learn from.
There’s also the duplicate problem. A tube rotates. The same stamp appears in frame repeatedly. But sometimes the tube also carries genuine duplicates — or stamps that look nearly identical to each other. The system has to tell the difference.
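A rough sketch of how that distinction could be drawn, assuming each sighting comes with an estimate of the tube's rotation angle at that moment. That estimate is an assumption of this example, as are the function names, the 15-degree tolerance, and the sample values. The idea is that the same text at the same angle is one stamp seen again, while the same text at a clearly different angle is a second physical stamp.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)

def merge_sightings(sightings, tolerance_deg=15.0):
    """Collapse repeated sightings of one physical stamp.

    `sightings` is a list of (text, zone, angle_deg). Two sightings count
    as the same stamp only if text and zone match and their angles are
    within `tolerance_deg`. Same text at a different angle stays separate.
    """
    stamps = []  # one (text, zone, angle_deg) entry per physical stamp
    for text, zone, angle in sightings:
        for known_text, known_zone, known_angle in stamps:
            if (text == known_text and zone == known_zone
                    and angular_distance(angle, known_angle) <= tolerance_deg):
                break  # already recorded: a re-sighting
        else:
            stamps.append((text, zone, angle))
    return stamps

# Made-up sample: one stamp seen three times, plus a genuine duplicate
# on the opposite side of the tube.
sightings = [
    ("QC-17", "body", 10.0),
    ("QC-17", "body", 14.0),
    ("QC-17", "body", 190.0),
    ("QC-17", "body", 368.0),  # a full turn later, same place as 10.0
]
print(len(merge_sightings(sightings)))  # 2
```

Stamps that only look nearly identical are a separate problem: this sketch compares exact text, so it relies on the character-level reading being right.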
The answer was multiple levels of analysis. From broad zones down to individual stamps, then down to individual characters within a stamp. Layer by layer. Combined with a layout model trained specifically on the spatial arrangement of objects, this is what produces the accuracy the inspection requires.
Not one model doing everything. A cascade of models, each responsible for a specific level of the problem.
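Structurally, a cascade like this can be sketched as three stages chained together, each seeing only what the previous one produced. The stages below are trivial stand-ins on a toy data structure, not real models, and every name is hypothetical. The layout model is left out to keep the sketch short.

```python
def run_cascade(frame, find_zones, find_stamps, read_characters):
    """Run three stages in order: zones, then stamps, then characters."""
    results = []
    for zone in find_zones(frame):
        for stamp in find_stamps(zone):
            characters = read_characters(stamp)
            results.append({"zone": zone["name"], "text": "".join(characters)})
    return results

# Stand-in stages operating on a toy "frame"; real stages would be models.
def find_zones(frame):
    return frame["zones"]

def find_stamps(zone):
    return zone["stamps"]

def read_characters(stamp):
    # Keep only characters from the narrow symbol set assumed here.
    return [c for c in stamp if c.isalnum() or c == "-"]

frame = {
    "zones": [
        {"name": "end_a", "stamps": ["HT-204"]},
        {"name": "body", "stamps": ["QC 17", "M-3"]},
    ]
}
for row in run_cascade(frame, find_zones, find_stamps, read_characters):
    print(row["zone"], row["text"])
```

Because each stage is passed in as a function, one level can be retrained or swapped without touching the others.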