2024.12.03

Real-time image and object recognition solution based on MediaTek IoT Genio1200

On the MediaTek IoT Genio1200 platform, MediaTek offers several software solutions that let partners deliver AI computing power through the CPU, GPU, and APU. Most common machine-learning frameworks provide hardware-accelerated inference for their own model formats, and partners can also enable hardware acceleration of TensorFlow Lite models through the graphics processor.

MediaTek IoT Genio1200 board:

Genio1200 demo board

On MediaTek IoT Yocto, there are currently three known methods for accelerating neural-network inference (CPU, GPU, and APU).


The first is Arm NN, an open-source software package that enables machine learning on Arm hardware devices. It bridges common neural-network frameworks to Cortex-A CPUs and Arm Mali GPUs; in this method, the CPU computes the model's inference.


The second is GPU neural-network acceleration, which runs model inference with OpenGL ES (OpenGL for Embedded Systems) compute shaders on the device's GPU.


The third is APU neural-network acceleration, using the MediaTek Deep Learning Accelerator (MDLA) and Vision Processing Unit (VPU).
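To make the first two paths concrete, here is a hedged sketch of how they are typically exercised from the command line with the stock TensorFlow Lite benchmark tool. The tool name (benchmark_model), the Arm NN delegate library path, and the delegate option string are assumptions that vary by Yocto image and SDK release; the APU path is covered separately below with ncc-tflite and the Neuron runtime.

# Hedged examples -- binary names, library paths, and flags vary by image.

# CPU path via Arm NN: load the Arm NN TFLite delegate as an external
# delegate and select the CpuAcc backend.
benchmark_model --graph=test.tflite \
  --external_delegate_path=/usr/lib/libarmnnDelegate.so \
  --external_delegate_options="backends:CpuAcc"

# GPU path: enable the TFLite GPU delegate, which runs the model with
# compute shaders on the Mali GPU.
benchmark_model --graph=test.tflite --use_gpu=true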


Let me introduce you to MediaTek's proprietary Deep Learning Accelerator: a powerful and efficient convolutional neural network (CNN) accelerator that integrates MAC units with memory modules and achieves strong AI benchmark results through high multiply-accumulate (MAC) utilization.


Before we get started, do you remember what MediaTek NeuroPilot is? If you have forgotten, you can go back and review MediaTek NeuroPilot.


NeuroPilot is at the heart of MediaTek's AI ecosystem. NeuroPilot allows partners to develop and deploy AI applications on edge devices with extreme efficiency, making a wide variety of AI applications run faster. Partners can use the Neuron compiler (ncc-tflite) within the NeuroPilot SDK to convert TFLite models into MediaTek's proprietary binaries (DLA, Deep Learning Archive) for deployment on the Genio1200 platform. The resulting model is highly efficient, with reduced latency and a smaller memory footprint. The Neuron SDK also provides the Neuron Runtime API, a set of APIs that partners can call from C/C++ programs to create a runtime environment, parse compiled model files, and perform on-device neural-network inference.
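As a concrete illustration, the compile step looks roughly like the following. The --arch value is an assumption (it must match the MDLA generation in the target SoC), and the exact flags depend on the NeuroPilot SDK release, so check ncc-tflite --help on your setup.

# Compile a TFLite model into a DLA for the APU (illustrative; the
# mdla3.0 architecture string is an assumption for the Genio1200).
ncc-tflite --arch mdla3.0 test.tflite -o test.dla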






As you can see from the figure, the DLA file is a MediaTek proprietary model format: a low-level binary for the MDLA (MediaTek Deep Learning Accelerator) and VPU (Vision Processing Unit) computing devices. Use ncc-tflite to convert a TensorFlow Lite model into a DLA file that can run inference on the APU and then be used by the image/object recognition applications.


Use a pre-written script to convert a TensorFlow Lite model into a DLA file, as shown below:


root@i1200-demo:~# ls
convert_tensorflowLite_to_DLA.sh  demos  test.tflite
root@i1200-demo:~# ./convert_tensorflowLite_to_DLA.sh
[apusys][info]apusysSession: Seesion(0xaaaae26f9910): thd(ncc-tflite) version(2) log(0)
root@i1200-demo:~# ls
convert_tensorflowLite_to_DLA.sh  demos  test.dla  test.tflite
root@i1200-demo:~#
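Before wiring the new test.dla into an application, it can be sanity-checked with the Neuron runtime command-line tool. The tool name (neuronrt) and its flags are drawn from NeuroPilot SDK documentation and may differ by release; input.bin here is a hypothetical raw input tensor.

# Hedged smoke test: run one inference of the freshly compiled DLA on
# the APU, reading a raw input tensor and writing the raw output tensor.
neuronrt -a test.dla -i input.bin -o output.bin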




GstInference is an open-source project that provides a framework for integrating deep-learning inference into GStreamer. It supports inference with a wide range of deep-learning architectures and can be extended to support custom ones. The framework builds on R2Inference, a C/C++ abstraction layer over various machine-learning frameworks, so a single C/C++ application can use models from different frameworks through one interface. This is useful when performing inference on different hardware (CPU, GPU, or APU accelerators). This exercise builds a real-time image-recognition application on the framework shown in the figure, using the DLA file we just converted to run the inference.
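For reference, a GstInference pipeline has the following general shape, adapted from the project's own examples. The element names (tinyyolov2, detectionoverlay), pad names (sink_model, sink_bypass, src_bypass), and properties come from the GstInference documentation; the camera device, backend, and model/label files below are placeholders, and the demo scripts on the board wrap an equivalent pipeline around the DLA-backed backend.

# Illustrative GstInference pipeline (names per the GstInference docs;
# device, backend, and file paths are placeholders).
gst-launch-1.0 v4l2src device=/dev/video0 ! videoconvert ! tee name=t \
  t. ! queue ! videoscale ! net.sink_model \
  t. ! queue ! net.sink_bypass \
  tinyyolov2 name=net backend=tflite model-location=test2.tflite \
  net.src_bypass ! detectionoverlay labels="$(cat labels_objectD.txt)" \
  ! videoconvert ! waylandsink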


Next, walk through the pre-written scripts to run image classification and object detection.


root@i1200-demo:~# ls
convert_tensorflowLite_to_DLA.sh  labels_objectD.txt   test2.dla
demos                             objectD.dla          test2.tflite
image_classification.sh           object_detection.sh
labels.txt                        test.tflite
root@i1200-demo:~# ./image_classification.sh


The result is displayed on the HDMI screen, and you can see that the recognized object is a ballpoint pen.



Proceed with the object detection demo.

root@i1200-demo:~# ls
convert_tensorflowLite_to_DLA.sh  labels_objectD.txt   test2.dla
demos                             objectD.dla          test2.tflite
image_classification.sh           object_detection.sh
labels.txt                        test.tflite
root@i1200-demo:~# ./object_detection.sh

 


From the inference result, you can see that the object is identified as a bottle.



The inference result shows the object identified as a monitor.



The inference result shows the object identified as a chair.



►Scenario application diagram

►Photos of display boards

►Scenario block diagram

►Core technical advantages

The dual-core AI Processing Unit (APU) handles AI-based tasks and supports deep learning, neural-network acceleration, and computer-vision applications.

►Solution specifications

CPU: Arm Cortex-A78 x4 + Arm Cortex-A55 x4
GPU: Arm Mali-G57 MP5
APU: MediaTek AI Processor (dual-core)
Video processing: 4K60fps HEVC/H.264 encoding; 4K90fps AV1/VP9/HEVC/H.264 decoding
Software: Android / Yocto Linux / Ubuntu / NeuroPilot SDK
Interfaces: HDMI 2.0 receiver (HDMI RX), PCIe 3.0, USB 3.1, GbE MAC, ISP 48MP@30fps / 16MP+16MP@30fps