
Model Deployment

1 Performance Testing

[Note] Before formally deploying an AI model, we strongly recommend running performance tests of the model on the chip side to ensure that its inference performance meets expectations.

The spacemit-ort/bin/onnxruntime_perf_test tool in the SDK directory lets you quickly test the pure inference performance of an AI model on the chip side. The tool works with ONNX models, so you can use it to evaluate both the original ONNX floating-point model and the converted (and/or quantized) ONNX fixed-point model.

1.1 Usage Instructions

$ onnxruntime_perf_test -h
perf_test [options...] model_path [result_file]
Options:
-m [test_mode]: Specifies the test mode. Value could be 'duration' or 'times'.
Provide 'duration' to run the test for a fix duration, and 'times' to repeated for a certain times.
-M: Disable memory pattern.
-A: Disable memory arena
-c [parallel runs]: Specifies the (max) number of runs to invoke simultaneously. Default:1.
-e [cpu|spacemit]: Specifies the provider 'cpu', 'spacemit'. Default:'cpu'.
-r [repeated_times]: Specifies the repeated times if running in 'times' test mode.Default:1000.
-t [seconds_to_run]: Specifies the seconds to run for 'duration' mode. Default:600.
-p [profile_file]: Specifies the profile name to enable profiling and dump the profile data to the file.
-s: Show statistics result, like P75, P90. If no result_file provided this defaults to on.
-S: Given random seed, to produce the same input data. This defaults to -1(no initialize).
-v: Show verbose information.
-x [intra_op_num_threads]: Sets the number of threads used to parallelize the execution within nodes, A value of 0 means ORT will pick a default. Must >= 0.
-y [inter_op_num_threads]: Sets the number of threads used to parallelize the execution of the graph (across nodes), A value of 0 means ORT will pick a default. Must >= 0.
-f [free_dimension_override]: Specifies a free dimension by name to override to a specific value for performance optimization. Syntax is [dimension_name:override_value]. override_value must > 0
-F [free_dimension_override]: Specifies a free dimension by denotation to override to a specific value for performance optimization. Syntax is [dimension_denotation:override_value]. override_value must > 0
-P: Use parallel executor instead of sequential executor.
-o [optimization level]: Default is 99 (all). Valid values are 0 (disable), 1 (basic), 2 (extended), 99 (all).
Please see onnxruntime_c_api.h (enum GraphOptimizationLevel) for the full list of all optimization levels.
-u [optimized_model_path]: Specify the optimized model path for saving.
-z: Set denormal as zero. When turning on this option reduces latency dramatically, a model may have denormals.
-T [Set intra op thread affinities]: Specify intra op thread affinity string
[Example]: -T 1, 2; 3, 4; 5, 6 or -T 1 - 2; 3 - 4; 5 - 6
Use semicolon to separate configuration between threads.
E.g. 1, 2; 3, 4; 5, 6 specifies affinities for three threads, the first thread will be attached to the first and second logical processor.
The number of affinities must be equal to intra_op_num_threads - 1
-D [Disable thread spinning]: disable spinning entirely for thread owned by onnxruntime intra-op thread pool.
-H: Maximum value to produce the random input data. This defaults to -1(as std::numeric_limits<T>::max() whenever the value given by this option less than value of '-L').
-L: Minimum value to produce the random input data. This defaults to 0.
-R: Count of random generated input test data. This defaults to 1 and must > 0.
-U: Maximum value to produce the random value of free dimensions which are not overriden. This defaults to 1. Specified value must > 0.
-V: Minimum value to produce the random value of free dimensions which are not overriden. This defaults to 1. Specified value must > 0.
-Z [Force thread to stop spinning between runs]: disallow thread from spinning during runs to reduce cpu usage.
-h: help

1.2 Parameter Description

| Parameter | Necessary/Optional | Default Value | Description |
|---|---|---|---|
| -m | Optional | times | Test mode: fixed test duration (s) or fixed number of test runs (Note: the original onnxruntime_perf_test tool defaults to the 'duration' mode) |
| -M | Optional | None | Disable memory pattern |
| -A | Optional | None | Disable memory arena |
| -c | Optional | 1 | Number of parallel inferences (the number of session.run() calls triggered at the same time) |
| -e | Optional | cpu | Provider(s) for inference, separated by " " when used. The currently available EPs are: cpu, spacemit |
| -r | Optional | 1000 | Number of model inference runs in the fixed-times test mode (for each session) |
| -t | Optional | 600 | Model inference test time in the fixed-duration test mode (for each session), unit: seconds |
| -p | Optional | None | Profiling file path (default: disabled; non-empty: enabled) |
| -s | Optional | ON | Print inference time statistics (if no result file is specified, this is enabled by default) |
| -S | Optional | -1 | Random seed (-1: no random initialization of test data; 0: random seed; > 0: user-specified random seed) |
| -v | Optional | None | Enable debugging information |
| -x | Optional | 0 | Number of parallel threads within a single operator (0 means onnxruntime picks a default) |
| -y | Optional | 0 | Number of threads for concurrent execution of multiple operators (0 means onnxruntime picks a default) |
| -f | Optional | None | Specify the value of a free dimension in the model input by parameter name (string: see abs_free_dimensions.onnx), format key:value |
| -F | Optional | None | Specify the value of a free dimension in the model input by denotation name (string: see abs_free_dimensions.onnx), format key:value |
| -P | Optional | None | Enable parallel execution mode |
| -o | Optional | 99 | Model optimization level |
| -u | Optional | None | Save path of the optimized model |
| -z | Optional | None | Same as session_options.AddConfigEntry(kOrtSessionOptionsConfigSetDenormalAsZero, "1") |
| -T | Optional | None | Specify the affinity of the threads in the onnxruntime internal thread pool |
| -D | Optional | None | Completely disable spinning of the threads in the thread pool used for intra-operator parallelism in onnxruntime |
| -H | Optional | -1 | Maximum value for randomly generated test data (if less than the minimum value, the maximum of the corresponding data type is used) |
| -L | Optional | 0 | Minimum value for randomly generated test data |
| -R | Optional | 1 | Number of groups of randomly generated test data |
| -U | Optional | 1 | Maximum random value for free dimensions that are not overridden (usually batch size) |
| -V | Optional | 1 | Minimum random value for free dimensions that are not overridden (usually batch size) |
| -Z | Optional | None | Prevent threads from spinning between runs to reduce CPU utilization |
| -h, --help | Optional | None | Print the usage instructions |

1.3 Usage Example

Taking the onnxruntime/test/testdata/abs_free_dimensions.onnx model as an example:

1.3.1 Random Test Data

Fix the number of test runs to 100, randomly generate 10 groups of test data, and set the random seed to 1, the maximum random test value to 6, and the minimum random test value to 2:

$ MODEL=abs_free_dimensions.onnx
$ ARGS="${MODEL} ${MODEL%.onnx}.txt -m times -r 100 -R 10 -S 1 -H 6 -L 2"
$ onnxruntime_perf_test ${ARGS}
...
Session creation time cost: 0.0455992 s
First inference time cost: 0 ms
Total inference time cost: 0.00371454 s
Total inference requests: 100
Average inference time cost: 0.0371454 ms
Total inference run time: 0.00417042 s
Number of inferences per second: 23978.4
...
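
If you prefer to drive the tool from a script, the same flags can be passed through a small wrapper. The following is a hedged sketch, assuming onnxruntime_perf_test is on the PATH (it ships in spacemit-ort/bin) and using an illustrative model name; it runs a 10-second duration-mode test on the spacemit EP, using the -e, -m and -t options listed in section 1.2.

import subprocess

# Illustrative model path; replace with your own ONNX model
model = "abs_free_dimensions.onnx"

# Duration mode: run inference for 10 seconds on the spacemit EP
# (flags -e / -m / -t as described in the parameter table above)
cmd = ["onnxruntime_perf_test", "-e", "spacemit", "-m", "duration", "-t", "10", model]
subprocess.run(cmd, check=True)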

2 Application Development

2.1 AI Support Library

2.1.1 Demo Introduction

The current Support Library Demo is located in the bianbu-ai-support directory in the deployment toolkit, and the relevant instructions and examples are as follows:

$ tree -L 3 /opt/spacemit-ai-sdk.v1.1.0/bianbu-ai-support/
/opt/spacemit-ai-sdk.v1.1.0/bianbu-ai-support/
├── bin // Precompiled executable programs
│ ├── classification_demo
│ ├── detection_demo
│ ├── detection_stream_demo
│ ├── detection_video_demo
│ ├── estimation_demo
│ └── tracker_stream_demo
├── demo // Demo cmake project
│ ├── CMakeLists.txt
│ ├── README.md
│ ├── build.sh // Quick compilation (and testing) script
│ ├── dataloader.hpp
│ ├── image_classification_demo.cc
│ ├── object_detection.hpp
│ ├── object_detection_demo.cc
│ ├── object_detection_stream_demo.cc
│ ├── object_detection_video_demo.cc
│ ├── pose_estimation.hpp
│ ├── pose_estimation_demo.cc
│ ├── pose_tracker_stream_demo.cc
│ └── utils
│ ├── cv_helper.hpp
│ ├── json.hpp
│ ├── json_helper.hpp
│ └── win_getopt
├── include // Preprocessing, postprocessing, auxiliary function and other modules
│ └── bianbuai
│ ├── task
│ └── utils
├── lib
│ ├── 3rdparty // Third-party dependency libraries
│ │ └── opencv4
│ ├── libbianbuai.so -> libbianbuai.so.1
│ ├── libbianbuai.so.1 -> libbianbuai.so.1.0.15
│ └── libbianbuai.so.1.0.15
└── share
└── ai-support // Pre-set resource data
├── imgs
├── models
└── videos
16 directories, 24 files

2.1.2 Demo Compilation

Cross-compilation

Cross-compilation is mainly applicable to the PC side (e.g. x86_64 development environment), and the process (example) is as follows:

# Specify the path of the spacemit-ai-sdk
$ SDK=${PATH_TO_SPACEMIT_AI_SDK}  # e.g. /opt/spacemit-ai-sdk.v1.1.0
# Specify the environment variables related to cross-compilation
$ CROSS_TOOL=$SDK/spacemit-gcc/bin/riscv64-unknown-linux-gnu-
$ SYSROOT=$SDK/spacemit-gcc/sysroot
$ BIANBUAI_HOME=$SDK/bianbu-ai-support
$ ORT_HOME=$SDK/spacemit-ort
$ OPENCV_DIR=$SDK/bianbu-ai-support/lib/3rdparty/opencv4/lib/cmake/opencv4
# Create the cmake working directory and compile the demo
$ cd ${BIANBUAI_HOME}/demo
$ mkdir build && pushd build
$ cmake .. -DBIANBUAI_HOME=${BIANBUAI_HOME} -DORT_HOME=${ORT_HOME} -DOpenCV_DIR=${OPENCV_DIR} -DCMAKE_C_COMPILER=${CROSS_TOOL}gcc -DCMAKE_CXX_COMPILER=${CROSS_TOOL}g++ -DCMAKE_SYSROOT=${SYSROOT}
$ make -j4
$ popd

Local Compilation

Local compilation is applicable to the chip side, and the process (example) is as follows:

# Specify the environment variables related to local compilation
$ CROSS_TOOL=
$ SYSROOT=
$ BIANBUAI_HOME=$SDK/bianbu-ai-support  # Specify the version in the latest sdk or the /usr directory
$ ORT_HOME=$SDK/spacemit-ort  # Specify the version in the latest sdk or the /usr directory
$ OPENCV_DIR=  # Specify the version in the latest sdk or find it automatically through find_package
# Create the cmake working directory and compile the demo
$ cd ${BIANBUAI_HOME}/demo
$ mkdir build && pushd build
$ cmake .. -DBIANBUAI_HOME=${BIANBUAI_HOME} -DORT_HOME=${ORT_HOME} -DOpenCV_DIR=${OPENCV_DIR} -DCMAKE_C_COMPILER=${CROSS_TOOL}gcc -DCMAKE_CXX_COMPILER=${CROSS_TOOL}g++ -DCMAKE_SYSROOT=${SYSROOT}
$ make -j4
$ popd

[Note] The above steps are already pre-set in the demo/build.sh quick compilation script. You can adjust the relevant configuration (such as the ORT_HOME variable) by editing demo/build.sh, and then quickly verify the demo compilation with the bash build.sh (cross-compilation) and bash build.sh --native (local compilation) commands.

Quick Compilation
# One-click cross-compilation (e.g. spacemit-ai-sdk.v1.1.0 docker environment)
$ cd /opt/spacemit-ai-sdk.v1.1.0/bianbu-ai-support/demo
$ bash build.sh

2.1.3 Demo Running

  • Simulation Configuration

For a cross-compiled demo program, you can use the qemu-riscv64 tool pre-installed in the deployment toolkit to simulate running on the PC side. The relevant configuration is as follows:

$ QEMU_CMD="$SDK/spacemit-qemu/bin/qemu-riscv64 -L $SYSROOT"
  • Running Example

[Note] For the locally compiled demo program, you do not need to configure any environment variables.

# Create softlink to test resource if necessary
$ ln -sf ${BIANBUAI_HOME}/rootfs/usr/share/ai-support data
# Smoke test with image classification
$ env LD_LIBRARY_PATH=${ORT_HOME}/lib:$LD_LIBRARY_PATH ${QEMU_CMD} \
  build/classification_demo data/models/squeezenet1.1-7.onnx data/labels/synset.txt data/imgs/dog.jpg
# Smoke test with object detection
$ env LD_LIBRARY_PATH=${ORT_HOME}/lib:$LD_LIBRARY_PATH ${QEMU_CMD} \
  build/detection_demo data/models/nanodet-plus-m_320.onnx data/models/coco.txt data/imgs/person.jpg result0.jpg

[Note] The above steps are also pre-set in the demo/build.sh quick compilation script. You can quickly run the above examples (simulation test in the x86_64 docker environment) with the bash build.sh --test command:

[INFO] Building demos done.
[INFO] Prepare ...
[INFO] Smoke test with image classification task ...
[INFO] Run: bld/classification_demo data/models/squeezenet1.1-7.onnx data/models/synset.txt data/imgs/dog.jpg
open tcm device failed(-1)
Enable spacemit ep now
tcm check param err--->fun:tcmmalloc_sync + line:164
Classify result: n02113023 Pembroke, Pembroke Welsh corgi
[INFO] Smoke test with object detection task ...
[INFO] Run: bld/detection_demo data/models/nanodet-plus-m_320.onnx data/models/coco.txt data/imgs/person.jpg result0.jpg
open t...

2.1.4 Demo Instructions

  • classification_demo

Single-image classification demo: input the path of a single image; it outputs the category of the image.

  • Running Method
$ classification_demo 
Usage:
classification_demo <model_path> <label_path> <image_path>
classification_demo <config_path> <image_path>
  • Parameter Description
| Parameter | Required/Optional | Default Value | Remarks |
|---|---|---|---|
| model_path | Required | None | Model file path |
| label_path | Required | None | Label file path |
| config_path | Required | None | Configuration file path |
| image_path | Required | None | Image file path |
  • detection_demo

Single-image object detection demo: input the path of an image and a save path; it outputs bounding-box information and saves the annotated image to the target location.

  • Running Method
$ detection_demo 
Usage:
detection_demo <model_path> <label_path> <image_path> <save_path>
detection_demo <config_path> <image_path> <save_path>
  • Parameter Description
| Parameter | Required/Optional | Default Value | Remarks |
|---|---|---|---|
| model_path | Required | None | Model file path |
| label_path | Required | None | Label file path |
| config_path | Required | None | Configuration file path |
| image_path | Required | None | Image file path |
| save_path | Required | None | Saved image file path |
  • detection_stream_demo

Video-stream object detection demo: input a video file or connect a camera; the annotated frames are displayed in real time.

  • Running Method
$ detection_stream_demo 
Usage:
detection_stream_demo [-h <resize_height>] [-w <resize_width>] [-f] <model_path> <label_path> <input>
detection_stream_demo [-h <resize_height>] [-w <resize_width>] [-f] <config_path> <input>
  • Parameter Description
| Parameter | Required/Optional | Default Value | Remarks |
|---|---|---|---|
| model_path | Required | None | Model file path |
| label_path | Required | None | Label file path |
| config_path | Required | None | Configuration file path |
| input | Required | None | Input content |
| -w | Optional | 320 | Resized width |
| -h | Optional | 320 | Resized height |
| -f | Optional | None | Horizontal flip |
  • detection_video_demo

Video object detection demo: input the path of a video file; it outputs bounding-box information in real time and saves the annotated video (AVI format) to the target path.

  • Running Method
$ detection_video_demo 
Usage:
detection_video_demo <model_path> <label_path> <video_path> <save_path>(*.avi)
detection_video_demo <config_path> <video_path> <save_path>(*.avi)
  • Parameter Description
| Parameter | Required/Optional | Default Value | Remarks |
|---|---|---|---|
| model_path | Required | None | Model file path |
| label_path | Required | None | Label file path |
| config_path | Required | None | Configuration file path |
| video_path | Required | None | Video file path (mp4, avi) |
| save_path | Required | None | Saved video file path |
  • estimation_demo

Single-image pose estimation demo: input the path of an image and a save path; the image with keypoints drawn is saved to the target location.

  • Running Method
$ estimation_demo 
Usage:
estimation_demo <detection_model_path> <detection_label_path> <pose_point_model_path> <image_path> <save_path>
estimation_demo <detection_config_path> <pose_point_config_path> <image_path> <save_path>
  • Parameter Description
| Parameter | Required/Optional | Default Value | Remarks |
|---|---|---|---|
| detection_model_path | Required | None | Object detection model file path |
| detection_label_path | Required | None | Object detection label file path |
| pose_point_model_path | Required | None | Pose model file path |
| detection_config_path | Required | None | Object detection model configuration file path |
| pose_point_config_path | Required | None | Pose model configuration file path |
| image_path | Required | None | Image file path |
| save_path | Required | None | Saved image file path |
  • tracker_stream_demo

Video-stream pose tracking demo: input a video file or connect a camera; the annotated frames are displayed in real time.

  • Running Method
$ tracker_stream_demo 
Usage:
tracker_stream_demo [-h <resize_height>] [-w <resize_width>] [-f] <detection_model_path> <detection_label_path> <pose_point_model_path> <input>
tracker_stream_demo [-h <resize_height>] [-w <resize_width>] [-f] <detection_config_path> <pose_point_config_path> <input>
  • Parameter Description
| Parameter | Required/Optional | Default Value | Remarks |
|---|---|---|---|
| detection_model_path | Required | None | Object detection model file path |
| detection_label_path | Required | None | Object detection label file path |
| pose_point_model_path | Required | None | Pose model file path |
| detection_config_path | Required | None | Object detection model configuration file path |
| pose_point_config_path | Required | None | Pose model configuration file path |
| input | Required | None | Input content |
| -w | Optional | 320 | Resized width |
| -h | Optional | 320 | Resized height |
| -f | Optional | None | Horizontal flip |

2.1.5 Description of Environment Variables

| Environment Variable Name | Remarks |
|---|---|
| SUPPORT_SHOW | (stream demo) -1 means do not display |
| SUPPORT_SHOWFPS | (stream demo) If set to any value, the FPS is displayed |
| SUPPORT_PROFILING_PROJECTS | Path of the generated profiling file |
| SUPPORT_LOG_LEVEL | Log level, range 0-4 |
| SUPPORT_GRAPH_OPTIMIZATION_LEVEL | Graph optimization level (ort_disable_all, ort_enable_basic, ort_enable_extended, ort_enable_all) |
| SUPPORT_OPT_MODEL_PATH | Path of the optimized model |
| SUPPORT_DISABLE_SPACEMIT_EP | 1 means disable spacemit-ep |
| SUPPORT_OPENCV_THREAD_NUM | Number of threads used by OpenCV (>= 4.x) |
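
As a hedged illustration of how these variables are consumed, the sketch below launches one of the precompiled demos with a modified environment from Python; the demo and resource paths are illustrative, based on the layout shown in sections 2.1.1 and 2.1.3, and may differ on your system.

import os
import subprocess

# Illustrative paths; adjust to your installation
demo = "./bin/detection_demo"
args = [
    "share/ai-support/models/nanodet-plus-m_320.onnx",
    "share/ai-support/models/coco.txt",
    "share/ai-support/imgs/person.jpg",
    "result0.jpg",
]

env = os.environ.copy()
env["SUPPORT_LOG_LEVEL"] = "2"                       # log level, range 0-4
env["SUPPORT_PROFILING_PROJECTS"] = "demo-profile"   # write a profiling file to this path
# env["SUPPORT_DISABLE_SPACEMIT_EP"] = "1"           # uncomment to fall back to the CPU EP

subprocess.run([demo, *args], env=env, check=True)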

2.2 AI Engine

2.2.1 Introduction

SpacemiT-ORT consists of the ONNXRuntime (v1.15.1) base inference framework and the SpaceMITExecutionProvider acceleration backend (hereinafter referred to as EP); its usage is almost identical to the public version of ONNXRuntime.

2.2.2 QuickStart

  • C & C++
#include <onnxruntime_cxx_api.h>
#include "spacemit_ort_env.h"

Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "ort-demo");
Ort::SessionOptions session_options;
// Set the number of inference threads
// int64_t num_threads = 2;
// session_options.SetIntraOpNumThreads(num_threads);
std::unordered_map<std::string, std::string> provider_options;
// provider_options["SPACEMIT_EP_DISABLE_OP_TYPE_FILTER"] = "OPA;OPB;OPC"; // Disable the EP from running certain OP types (node.op)
// provider_options["SPACEMIT_EP_DISABLE_OP_NAME_FILTER"] = "OPA;OPB;OPC"; // Disable the EP from running certain named OPs (node.name)
SessionOptionsSpaceMITEnvInit(session_options, provider_options); // Optional: SpaceMIT environment initialization
Ort::Session session(env, net_param_path, session_options);
// ... subsequent steps are consistent with the public version of ORT
  • Python
# Install using the whl package:
#   pip install spacemit_ort-*.whl
# On the riscv64 platform, add --break-system-packages if a warning is encountered.
# The whl package does not automatically install dependent libraries; numpy must be installed separately
# (on the riscv64 platform, use: apt install python3-numpy).
import onnxruntime as ort
import numpy as np
import spacemit_ort

eps = ort.get_available_providers()  # list the available execution providers
net_param_path = "resnet18.q.onnx"
sess_options = ort.SessionOptions()
# Set the number of threads
# sess_options.intra_op_num_threads = 2
# Set the log level
# sess_options.log_severity_level = 1
# Session with the EP
session = ort.InferenceSession(net_param_path, sess_options, providers=["SpaceMITExecutionProvider"])
# Session without the EP
# (because there are 2 EPs, the provider must be specified explicitly)
ref_session = ort.InferenceSession(net_param_path, sess_options, providers=["CPUExecutionProvider"])
input_tensor = np.ones((1, 3, 224, 224), dtype=np.float32)
input_name = session.get_inputs()[0].name
output_names = [output.name for output in session.get_outputs()]
outputs = session.run(output_names, {input_name: input_tensor})
ref_outputs = ref_session.run(output_names, {input_name: input_tensor})
# The error between outputs and ref_outputs is generally within 1e-5

2.2.3 Custom Operators plugins

Custom operators are extended using the native onnxruntime mechanism; for the original documentation, see https://onnxruntime.ai/docs/reference/operators/add-custom-op.html

#include "onnxruntime_cxx_api.h" 
struct CustomKernel {
CustomKernel (const OrtKernelInfo* info);
void Compute(OrtKernelContext* context);
};
struct CustomOp : Ort::CustomOpBase<CustomOp, CustomKernel> {
explicit CustomOp ();
void* CreateKernel(const OrtApi&, const OrtKernelInfo*) const;
const char* GetName() const { return "custom op"; };
const char* GetExecutionProviderType() const { return "CPUExecutionProvider"; };
size_t GetInputTypeCount() const { return 1; };
ONNXTensorElementDataType GetInputType(size_t) const { return ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED; };
OrtCustomOpInputOutputCharacteristic GetInputCharacteristic(size_t) const { return OrtCustomOpInputOutputCharacteristic::INPUT_OUTPUT_OPTIONAL; };
size_t GetOutputTypeCount() const { return 1; };
ONNXTensorElementDataType GetOutputType(size_t) const { return ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED; };
OrtCustomOpInputOutputCharacteristic GetOutputCharacteristic(size_t) const { return OrtCustomOpInputOutputCharacteristic::INPUT_OUTPUT_OPTIONAL; };
};
// Declare the custom operator domain and add the custom operator to the session_options
static const char* c_OpDomain = "user.custom_domain";
Ort::CustomOpDomain domain{c_OpDomain};
static TestCustomOp CustomOp;
domain.Add(&TestCustomOp());
session_options.Add(domain);

2.2.4 Operator Acceleration List

| Op Type | Domain | Version | Attributes | Type | Notes | Schema |
|---|---|---|---|---|---|---|
| Conv | ai.onnx | 1, 11 | kernel_shape: limited to two dimensions | T: tensor(float), tensor(float16) | | QLinearConv |
| ConvTranspose | ai.onnx | 1, 11 | kernel_shape: limited to two dimensions | T: tensor(float), tensor(float16) | | QLinearConvTranspose |
| QLinearMatMul | ai.onnx | 10 | | T1: tensor(int8); T2: tensor(int8); T3: tensor(int8) | Only supports PerTensor quantization; only supports MatMul where B is a constant; weight quantization only supports symmetric quantization | https://onnx.ai/onnx/operators/onnx__QLinearMatMul.html |
| Gemm | ai.onnx | 1, 6, 7, 9, 11, 13 | alpha: limited to 1.0; beta: limited to 1.0 | T: tensor(float) | | https://onnx.ai/onnx/operators/onnx__Gemm.html |
| QGemm | com.microsoft | 1 | alpha: limited to 1.0; beta: limited to 1.0 | T: tensor(float); TA: tensor(int8); TB: tensor(int8); TC: tensor(int8); TYZ: tensor(int8); TY: tensor(int8) | Only supports PerTensor quantization; only supports constant Gemm; weight quantization only supports symmetric quantization | https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.QGemm |
| AveragePool | ai.onnx | 1, 7, 10, 11, 19 | kernel_shape: limited to two dimensions; count_include_pad: limited to 1 | T: tensor(float) | | QLinearAveragePool (com.microsoft) |
| GlobalAveragePool | ai.onnx | 1 | | T: tensor(float) | | QLinearGlobalAveragePool (com.microsoft, 1) |
| MaxPool | ai.onnx | 11, 12 | kernel_shape: limited to two dimensions | T: tensor(float), tensor(int8) | | |
| QuantizeLinear | ai.onnx | 10, 13, 19 | | T1: tensor(float); T2: tensor(int8), tensor(int16) | | |
| DequantizeLinear | ai.onnx | 10, 13, 19 | | T1: tensor(int8), tensor(int16), tensor(int32); T2: tensor(float) | | |
| Add | ai.onnx | 1, 6, 7, 13, 14 | | T: tensor(float) | | QLinearAdd (com.microsoft, 1) |
| Sub | ai.onnx | 1, 6, 7, 13, 14 | | T: tensor(float) | | |
| Mul | ai.onnx | 1, 6, 7, 13, 14 | | T: tensor(float) | | https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.QLinearMul |
| QLinearMul | com.microsoft | 1 | | T: tensor(int8) | | |
| Div | ai.onnx | 1, 6, 7, 13, 14 | | T: tensor(float) | | |
| Sigmoid | ai.onnx | 1, 6, 13 | | T: tensor(float) | | QLinearSigmoid (com.microsoft, 1) |
| HardSigmoid | ai.onnx | 1, 6 | | T: tensor(float) | | QLinearHardSigmoid (spacemit_ops, 1) |
| HardSwish | ai.onnx | 14 | | T: tensor(float) | | QLinearHardSwish (spacemit_ops, 1) |
| LeakyRelu | ai.onnx | 1, 6, 16 | | T: tensor(float) | | QLinearLeakyRelu (com.microsoft, 1) |
| Transpose | ai.onnx | 1, 13 | | T: tensor(int8), tensor(uint8) | | |
| Cast | ai.onnx | 1, 6, 9, 13, 19 | | T1: tensor(float), tensor(float16); T2: tensor(float), tensor(float16) | | https://onnx.ai/onnx/operators/onnx__Cast.html |
| ReduceMean | ai.onnx | 11, 13 | axes: limited to [2, 3] | T: tensor(float) | | QLinearReduceMean (com.microsoft) |
| QLinearGelu | spacemit_ops | 1 | | T: tensor(int8) | | |
| QLinearLayerNormalization | spacemit_ops | 1 | | T: tensor(int8) | | |
| LayerNormalization | ai.onnx, com.microsoft | 17, 1 | | T: tensor(float) | | |
| Gelu | com.microsoft | 1 | | T: tensor(float) | | |

2.2.5 Inference Sample

To make it easier to get started, we provide corresponding inference samples in the SDK package under spacemit-ort/samples.

3 Frequently Asked Questions (FAQ)

Everyone is welcome to ask questions

3.1 How to view the profiling information of the model inference?

You can refer to the original ONNXRuntime documentation: https://onnxruntime.ai/docs/performance/tune-performance/profiling-tools.html

#include <onnxruntime_cxx_api.h> 
#include "spacemit_ort_env.h"
Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "ort-demo");
Ort::SessionOptions session_options;
std::unordered_map<std::string, std::string> provider_options;
std::string profile_path = "ort-demo-profile";
// Enable profiling
session_options.EnableProfiling(profile_path.c_str());
std::string opt_net_path = "ort-demo-opt.onnx";
// Enable saving the optimized ONNX model, which can only be used on the current specific platform
session_options.SetOptimizedModelFilePath(opt_net_path.c_str());
SessionOptionsSpaceMITEnvInit(session_options, provider_options);
Ort::Session session(env, net_param_path, session_options);
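
For reference, the same settings can also be made through the Python API. The following is a minimal sketch using the standard onnxruntime Python bindings together with the spacemit_ort package from section 2.2.2; the model path and input shape are illustrative.

import numpy as np
import onnxruntime as ort
import spacemit_ort  # registers the SpaceMIT EP (see section 2.2.2)

net_param_path = "resnet18.q.onnx"  # illustrative model path

sess_options = ort.SessionOptions()
sess_options.enable_profiling = True                          # enable profiling
sess_options.profile_file_prefix = "ort-demo-profile"         # prefix of the dumped profile file
sess_options.optimized_model_filepath = "ort-demo-opt.onnx"   # save the optimized model (platform-specific)

session = ort.InferenceSession(net_param_path, sess_options,
                               providers=["SpaceMITExecutionProvider"])

input_name = session.get_inputs()[0].name
session.run(None, {input_name: np.ones((1, 3, 224, 224), dtype=np.float32)})

# end_profiling() flushes the profiling data and returns the JSON profile file name
print(session.end_profiling())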

3.2 How to save the layer-by-layer results during model running?

The dump function for ONNX model node output tensors is controlled by a set of environment variables. The commonly used ones are explained below.

| Environment Variable Name | Meaning | Value |
|---|---|---|
| ORT_DEBUG_NODE_IO_DUMP_SHAPE_DATA | Print the shape information of the node's input and output tensors | 0, 1; default is 0 |
| ORT_DEBUG_NODE_IO_DUMP_NODE_PLACEMENT | Print the EP information of the node | 0, 1; default is 0 |
| ORT_DEBUG_NODE_IO_DUMP_INPUT_DATA | Dump the data of the node's input tensors | 0, 1; default is 0 |
| ORT_DEBUG_NODE_IO_DUMP_OUTPUT_DATA | Dump the data of the node's output tensors | 0, 1; default is 0 |
| ORT_DEBUG_NODE_IO_NAME_FILTER | Filter the dumped nodes by name | Semicolon-separated string; default is empty |
| ORT_DEBUG_NODE_IO_OP_TYPE_FILTER | Filter the dumped nodes by op type | Semicolon-separated string; default is empty |
| ORT_DEBUG_NODE_IO_DUMP_DATA_DESTINATION | Export destination for the dumped input/output tensors | "stdout", "files", or "sqlite"; "files" is generally used |
| ORT_DEBUG_NODE_IO_OUTPUT_DIR | Directory where the dumped input/output tensor files are stored | String |
| ORT_DEBUG_NODE_IO_DUMPING_DATA_TO_FILES_FOR_ALL_NODES_IS_OK | Confirm that exporting tensors for all nodes is OK | 0, 1; default is 0 |

export ORT_DEBUG_NODE_IO_DUMP_SHAPE_DATA=1
export ORT_DEBUG_NODE_IO_DUMP_OUTPUT_DATA=1
export ORT_DEBUG_NODE_IO_DUMP_DATA_DESTINATION=files
# Specify the directory to export the Tensor file
export ORT_DEBUG_NODE_IO_OUTPUT_DIR=./dump_dir
export ORT_DEBUG_NODE_IO_DUMPING_DATA_TO_FILES_FOR_ALL_NODES_IS_OK=1
export ORT_DEBUG_NODE_IO_DUMP_NODE_PLACEMENT=1
export ORT_DEBUG_NODE_IO_APPEND_RANK_TO_FILE_NAME=1
# export ORT_DEBUG_NODE_IO_OP_TYPE_FILTER="QLinearConv;QLinearGlobalAveragePool"
rm -rf ./dump_dir
mkdir -p ./dump_dir
# Execute the demo or your program to obtain
./run_demo resnet18 resnet18.q.onnx

Console output

QLinearConv node: SpaceMITExecutionProvider_QLinearConv_20
Input 0 Name: PPQ_Operation_141
Shape: {1,7,7,512}
Input 1 Name: ortshared_1_0_1_2_token_254
Shape: {}
Input 2 Name: PPQ_Variable_373
Shape: {}
Input 3 Name: onnx::Conv_250
Shape: {512,512,3,3}
Input 4 Name: PPQ_Variable_375
Shape: {512}
Input 5 Name: PPQ_Variable_376
Shape: {512}
Input 6 Name: ortshared_1_0_1_3_token_255
Shape: {}
Input 7 Name: PPQ_Variable_382
Shape: {}
Input 8 Name: onnx::Conv_251
was missing data type
Placement: SpaceMITExecutionProvider
-----------
Output 0 Name: PPQ_Operation_145
Shape: {1,7,7,512}
Placement: SpaceMITExecutionProvider
-----------
QLinearGlobalAveragePool node: SpaceMITExecutionProvider_QLinearGlobalAveragePool_21
Input 0 Name: PPQ_Operation_147
Shape: {1,7,7,512}
Input 1 Name: ortshared_1_0_1_0_token_252
Shape: {}
Input 2 Name: PPQ_Variable_391
Shape: {}
Input 3 Name: ortshared_1_0_1_1_token_253
Shape: {}
Input 4 Name: PPQ_Variable_394
Shape: {}
Placement: SpaceMITExecutionProvider
-----------
Output 0 Name: PPQ_Operation_149
Shape: {1,1,1,512}

All outputs of the specified node types are written to the ./dump_dir directory in TensorProto format.

3.3 How to set multi-threading and thread affinity?

You can refer to the original documentation to set the thread affinity. Due to the particularity of the architecture, affinity for threads 0-3 cannot be set manually; they are managed by the EP itself.

https://onnxruntime.ai/docs/performance/tune-performance/threading.html#set-intra-op-thread-affinity
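
A minimal Python sketch of these settings is shown below, assuming an ONNX Runtime build that supports the session.intra_op_thread_affinities config key; following the note above, the intra-op worker threads are pinned to logical processors 4-6 (away from cores 0-3), and the number of affinity entries equals intra_op_num_threads - 1. The model path is illustrative.

import onnxruntime as ort
import spacemit_ort  # registers the SpaceMIT EP (see section 2.2.2)

sess_options = ort.SessionOptions()
sess_options.intra_op_num_threads = 4  # main thread + 3 worker threads

# One affinity entry per worker thread (intra_op_num_threads - 1 entries),
# pinning them to logical processors 4, 5 and 6
sess_options.add_session_config_entry("session.intra_op_thread_affinities", "4;5;6")

session = ort.InferenceSession("resnet18.q.onnx", sess_options,
                               providers=["SpaceMITExecutionProvider"])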

3.4 Do you need to pay attention to the Layout memory arrangement of the Tensor?

The inference library completely follows ONNXRuntime's definition of a tensor, that is, an NCHW memory layout consistent with the shape description.
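
For example, an image loaded in HWC order (as OpenCV returns it) has to be rearranged into a dense NCHW float tensor before being passed to session.run. A small numpy-only sketch, with an illustrative 224x224 input size:

import numpy as np

# A stand-in for an 8-bit HWC image (height, width, channels), e.g. as loaded by OpenCV
img_hwc = np.zeros((224, 224, 3), dtype=np.uint8)

# HWC -> CHW, add the batch dimension (NCHW), convert and scale to float32
input_nchw = np.transpose(img_hwc, (2, 0, 1))[np.newaxis].astype(np.float32) / 255.0
assert input_nchw.shape == (1, 3, 224, 224)

# Ensure a dense row-major buffer, matching the layout described above
input_nchw = np.ascontiguousarray(input_nchw)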

3.5 Model input to QLinear operators

The ONNX operator set includes some official QLinear operators, which can be used directly when shapes are static. In other cases, prefer a quantized model in the QDQ format.
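
As a hedged illustration (this uses the public ONNX Runtime quantization tooling, not a SpacemiT-specific conversion flow), a QDQ-format model can be produced with quantize_static. The sketch below assumes a float model with a 1x3x224x224 input and feeds random calibration data, which you would replace with real samples; file names are illustrative.

import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import (CalibrationDataReader, QuantFormat,
                                      QuantType, quantize_static)


class RandomCalibrationReader(CalibrationDataReader):
    """Feeds a few random NCHW batches; replace with real calibration data."""

    def __init__(self, model_path, num_samples=8):
        sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
        name = sess.get_inputs()[0].name
        self._samples = iter(
            {name: np.random.rand(1, 3, 224, 224).astype(np.float32)}
            for _ in range(num_samples)
        )

    def get_next(self):
        return next(self._samples, None)


# Quantize the float model into the QDQ format
quantize_static(
    "resnet18.onnx",
    "resnet18.qdq.onnx",
    RandomCalibrationReader("resnet18.onnx"),
    quant_format=QuantFormat.QDQ,
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
)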