
performing real-time image recognition on a Cortex-M7 processor using Arm's CMSIS-NN library.
This highlights the fact that you don't need a high-spec machine or cloud compute to do real-time ML tasks – we've made it so you can do them fast and efficiently on embedded devices.
We'll briefly touch on all the steps we took to get this up and running, and we've made all the files available on GitHub here for you to do this yourself in a Linux environment.
So here we're using an STM32F7 development board
and we've got an ST camera connected.
As with any peripheral, we'll need the basic program for the camera
to interact with the board - we'll come back to this later on.
As we're only interested in optimizing our model with CMSIS-NN,
the steps we're going to show here apply to any board with a Cortex-M processor.
So the first step in our demo is selecting and training a model.
And for a model to fit and run on a constrained device, it needs to be small.
So here we're using a model trained with Caffe on the CIFAR-10 dataset, available here.
The CIFAR-10 dataset consists of sixty thousand
32x32 colour images in these 10 classes.
We're using a 3-layer convolutional neural network,
which we can see illustrated here. We've already trained our model,
so we're just going to show you how to get it running on a Cortex-M device.
It's important to note that CMSIS-NN's optimizations
make use of SIMD instructions.
And because only Cortex-M4 and M7 cores support SIMD,
you'll only see the performance benefits with these two cores.
The next step in the process is to quantize the model.
Now this is a key step for being able to deploy a model on a resource-constrained
device like a microcontroller: it greatly reduces the size
of the model by converting the 32-bit floating-point model
to an 8-bit fixed-point model, and it also improves
overall compute performance.
This has only a very small impact on the accuracy of the model –
from 80.3% accuracy when unquantized
to 79.9% accuracy after quantization.
So here we navigate to the directory that we've downloaded from GitHub,
and then in the CMSIS-NN folder we see all the scripts
and files that you need to generate a quantized model that you can deploy.
Using this command here,
we're running a Python script
to quantize our CIFAR-10 model for the Cortex-M7.
Here we've specified the weights and also the location
where we'll save the quantized model. Now we run that,
and this is a good time to take a break and do something else, as it can take a couple of hours
to complete on a CPU alone.
Now the script parses the network graph and its connectivity, and then
searches for the right quantization parameters for the model.
So when the quantization has completed,
we need to transform the model operations and network graph connectivity
and generate the code we need consisting of neural network function calls.
Essentially we're transforming the model from a Caffe format
to a C format.
So we run the transform Python script on our quantized model.
Here we specify an output directory
and this generates these files.
So let's take a quick look in our main file,
and we can see that the transformation has defined all of the layers needed for this particular network,
and it's generated all of these function calls
that call the CMSIS-NN library functions.
These functions implement the different layers extracted from our trained Caffe model, together with their weights.
And then here we have a mock main function.
This shows the run_nn call, which runs all of the
different layers and saves the output to a buffer.
But because it's a mock function, we need to incorporate this function into code
that actually captures images from the camera and displays them on the screen.
So for simplicity, we've combined part of the C program,
the run_nn function, with the basic program
that comes with the camera.
And if you want to run this with a different application,
you'll need to combine it with the program that you use instead.
After we've combined the code,
we run a makefile to compile the combined code.
So from this, we've generated two files, a hex file and a bin file.
And we can see both of these in our build folder.
And now the final step is to upload one of these files to the board
and then we can see the program in action.
So here, we've got the final program running on the board,
and we can see the image classification happening in real time
on our Cortex-M7 board, and it's performing very consistently with a high degree of accuracy.
And we can see that the time taken to compute the result for an image through the network
is very short.
So again, we've made all of the scripts you need to quantize your model and
convert it to C code available
here on GitHub.
And for detailed step-by-step instructions on how to run this demo yourself,
you can read the guide that we've put together, on our developer website by
following the link in the description.