2D material detection for robotic control

2D materials are the future of computing. Building them, however, requires a huge amount of human labor. Can we automate this?

The problem

In this blog, I’ll describe how I built a classical computer vision tool for tracking microscopic 2-dimensional flakes, using virtually zero annotated data. The code is available online. Here’s what I mean by microscopic flakes:

One example of an input video feed, straight from the microscope.

This video shows the transfer of a single microscopic flake onto a substrate (a fancy name for a floor). If you looked at this scene from the side, you'd see something like this:

The camera starts by looking through a plastic sheet that has a flake attached to it. The sheet is transparent and out of focus, so the image is very blurry at first. The sheet is then lowered onto the substrate, where it comes into focus and becomes visible. This step looks like this:

At this point, the flake sticks to the substrate and stays there. The sheet is gently peeled off, leaving the flake in place. This looks like so:

You can also read more about this process in this paper. To actually make something useful, you need to repeat this flake transfer many times, building up really complex, tiny structures. The hard part, unfortunately, is that all of this is a slow, manual process that requires a lot of patience.

The solution

The solution is to automatically track the flakes in the camera feed and use that data to guide a robotic arm as it assembles a structure. You could then leave the assembly robot running overnight and come back to completed structures.

One (unworkable) approach is to fine-tune an AI model, like YOLO, to detect the flake for you. This didn't work because we only had three really short videos of the transfer process above.

Because we don't have enough data, we are left with the 'classical' mode of computer vision: frame differencing, edge detection, and so on. The algorithm we developed looks like this:

import av  # `pyav`, used to decode the video
from scipy import ndimage as ndi
from skimage import color, feature, filters, measure, morphology, restoration

def gray(frame):  # downsample 2x and convert to grayscale
    return color.rgb2gray(frame[::2, ::2])

# Extract each frame as a `numpy` array from the video feed, using `pyav`
with av.open('video.mp4') as container:
    video = [f.to_ndarray(format='rgb24') for f in container.decode(video=0)]

background = filters.gaussian(restoration.rolling_ball(gray(video[0])))  # blurred background estimate

for frame in video[::3]:  # every 3rd frame only
    fg = gray(frame) - background  # subtract background
    edges = morphology.binary_dilation(feature.canny(fg))  # detect edges; dilate to connect edge breaks
    blobs = ndi.binary_fill_holes(edges)  # fill holes to make solid blobs (a watershed could further split touching flakes)
    labels = measure.label(blobs)  # separate blobs get separate IDs

This is the basic layout of the code. However, as the plastic sheet touches the substrate, the background changes—it becomes lighter. When that happens, we simply invalidate and recalculate the background.
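
Here's a rough sketch of what that invalidation could look like, assuming the touchdown shows up as a jump in the mean residual after background subtraction (the `refresh_background` helper and its `threshold` value are illustrative assumptions, not the exact trigger used in the tool):

import numpy as np
from skimage import filters, restoration

def refresh_background(fg, frame_gray, background, threshold=0.05):
    # If the mean residual after subtraction jumps, the old background is
    # stale: the sheet has touched down and brightened the scene.
    if np.abs(fg).mean() > threshold:
        # Rebuild the background from the current (grayscale, downsampled)
        # frame, exactly as we did for the first frame of the video.
        background = filters.gaussian(restoration.rolling_ball(frame_gray))
    return background

Inside the main loop, this would run right after the subtraction step, so the edge detector never sees a stale background.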

This is the core of the algorithm: it lets us track the nanoflakes with good precision and virtually no annotated data, as shown below:

Result of CV-processing the video feed (notice how the color of the crystal's ID 'ID09-08' turns black once the contact line fully covers it). The green lines from the crystal denote the closest distance to the contact line (also shown as a green curve).
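
To keep an ID like 'ID09-08' attached to the same crystal across frames, each blob in the current frame has to be matched against the blobs from the previous frame. Below is a minimal sketch of one way to do this, via nearest-centroid matching; the `assign_ids` helper and its `max_jump` threshold are illustrative assumptions, not the tool's actual matching code:

import numpy as np
from skimage import measure

def assign_ids(labels, previous, next_id, max_jump=20.0):
    # `previous` maps flake ID -> centroid seen in the last processed frame
    current = {}
    for region in measure.regionprops(labels):
        c = np.array(region.centroid)
        # Find the closest previously seen flake, if any
        best = min(previous, key=lambda k: np.linalg.norm(previous[k] - c), default=None)
        if best is not None and np.linalg.norm(previous[best] - c) < max_jump:
            current[best] = c  # same flake: it keeps its old ID
        else:
            current[next_id] = c  # unmatched blob: mint a fresh ID
            next_id += 1
    return current, next_id

Calling this once per processed frame (starting from an empty `previous` dict) gives each flake a stable ID, which the green distance lines and the on-screen labels can then refer to.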

Conclusion

Automating the detection and tracking of 2D material flakes with classical computer vision techniques enables efficient robotic assembly, even with minimal annotated data. Using edge detection and basic image manipulation, the system can track microscopic flakes throughout the transfer process. This approach reduces the manual labor required to construct complex 2D material structures and could be used for automated fabrication.