Model Deployment
1 Performance Testing
[Note] Before formally deploying an AI model, we strongly recommend running performance tests of the model on the chip side to make sure its inference performance meets expectations.
The spacemit-ort/bin/onnxruntime_perf_test tool in the SDK directory lets you quickly measure the pure inference performance of an AI algorithm model on the chip side. The tool accepts ONNX models, so you can use it to evaluate both the original ONNX floating-point model and the converted (and/or quantized) ONNX fixed-point model; a minimal invocation is sketched below.
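For example, a quick smoke test of a converted model might look like the following sketch, where model.q.onnx is a placeholder for your own quantized model file:
$ onnxruntime_perf_test -e spacemit -m times -r 100 model.q.onnx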
1.1 Usage Instructions
$ onnxruntime_perf_test -h
perf_test [options...] model_path [result_file]
Options:
-m [test_mode]: Specifies the test mode. Value could be 'duration' or 'times'.
Provide 'duration' to run the test for a fixed duration, and 'times' to repeat it a certain number of times.
-M: Disable memory pattern.
-A: Disable memory arena
-c [parallel runs]: Specifies the (max) number of runs to invoke simultaneously. Default:1.
-e [cpu|spacemit]: Specifies the provider, 'cpu' or 'spacemit'. Default:'cpu'.
-r [repeated_times]: Specifies the repeated times if running in 'times' test mode. Default:1000.
-t [seconds_to_run]: Specifies the seconds to run for 'duration' mode. Default:600.
-p [profile_file]: Specifies the profile name to enable profiling and dump the profile data to the file.
-s: Show statistics result, like P75, P90. If no result_file is provided, this defaults to on.
-S: Given random seed, to produce the same input data. This defaults to -1 (no initialization).
-v: Show verbose information.
-x [intra_op_num_threads]: Sets the number of threads used to parallelize the execution within nodes. A value of 0 means ORT will pick a default. Must be >= 0.
-y [inter_op_num_threads]: Sets the number of threads used to parallelize the execution of the graph (across nodes). A value of 0 means ORT will pick a default. Must be >= 0.
-f [free_dimension_override]: Specifies a free dimension by name to override to a specific value for performance optimization. Syntax is [dimension_name:override_value]. override_value must be > 0.
-F [free_dimension_override]: Specifies a free dimension by denotation to override to a specific value for performance optimization. Syntax is [dimension_denotation:override_value]. override_value must be > 0.
-P: Use parallel executor instead of sequential executor.
-o [optimization level]: Default is 99 (all). Valid values are 0 (disable), 1 (basic), 2 (extended), 99 (all).
Please see onnxruntime_c_api.h (enum GraphOptimizationLevel) for the full list of all optimization levels.
-u [optimized_model_path]: Specify the optimized model path for saving.
-z: Set denormal as zero. Turning on this option can reduce latency dramatically when a model has denormals.
-T [Set intra op thread affinities]: Specify intra op thread affinity string
[Example]: -T 1,2;3,4;5,6 or -T 1-2;3-4;5-6
Use semicolon to separate configuration between threads.
E.g. 1,2;3,4;5,6 specifies affinities for three threads; the first thread will be attached to the first and second logical processors.
The number of affinities must be equal to intra_op_num_threads - 1.
-D [Disable thread spinning]: disable spinning entirely for threads owned by the onnxruntime intra-op thread pool.
-H: Maximum value to produce the random input data. This defaults to -1 (interpreted as std::numeric_limits<T>::max() whenever the value given by this option is less than the value of '-L').
-L: Minimum value to produce the random input data. This defaults to 0.
-R: Count of randomly generated input test data. This defaults to 1 and must be > 0.
-U: Maximum value to produce the random value of free dimensions which are not overridden. This defaults to 1. Specified value must be > 0.
-V: Minimum value to produce the random value of free dimensions which are not overridden. This defaults to 1. Specified value must be > 0.
-Z [Force thread to stop spinning between runs]: disallow threads from spinning during runs to reduce CPU usage.
-h: help
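For example, the optimization-related options above can be combined to save the graph-optimized model for inspection; a sketch assuming a placeholder input model model.onnx and output path optimized.onnx:
$ onnxruntime_perf_test -o 99 -u optimized.onnx -m times -r 10 model.onnx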
1.2 Parameter Description
| Parameter | Necessary/Optional | Default Value | Description |
|---|---|---|---|
| -m | Optional | times | Test mode: run for a fixed duration ('duration', in seconds) or for a fixed number of runs ('times') (Note: The original onnxruntime_perf_test tool defaults to the 'duration' mode) |
| -M | Optional | None | Disable memory pattern |
| -A | Optional | None | Disable memory arena |
| -c | Optional | 1 | Number of parallel inferences (the number of session.run() triggers at the same time) |
| -e | Optional | cpu | Execution provider(s) for inference, separated by " " when specifying more than one. The currently available EPs are cpu and spacemit |
| -r | Optional | 1000 | Number of model inference tests in the fixed test times mode (for each session) |
| -t | Optional | 600 | Model inference test time in the fixed test duration mode (for each session), unit: seconds |
| -p | Optional | None | Profiling file path (default: disabled, non-empty: enabled) |
| -s | Optional | ON | Print inference time statistics information (if the result file is not specified, it is enabled by default) |
| -S | Optional | -1 | Random seed (-1: no random initialization of test data; 0: random seed; > 0: user-specified random seed) |
| -v | Optional | None | Enable debugging information |
| -x | Optional | 0 | Number of parallel threads within a single operator (default 0, i.e., the internal mechanism of onnxruntime) |
| -y | Optional | 0 | Number of concurrent execution threads for multiple operators (default 0, i.e., the internal mechanism of onnxruntime) |
| -f | Optional | None | Specify the value of a free dimension in the model input by its name (string; see abs_free_dimensions.onnx), format key:value |
| -F | Optional | None | Specify the value of a free dimension in the model input by its denotation (string; see abs_free_dimensions.onnx), format key:value |
| -P | Optional | None | Enable parallel execution mode |
| -o | Optional | 99 | Model optimization level |
| -u | Optional | None | Save path of the optimized model |
| -z | Optional | None | Same as session_options.AddConfigEntry(kOrtSessionOptionsConfigSetDenormalAsZero, "1") |
| -T | Optional | None | Specify the affinity of the threads in the internal thread pool of onnxruntime |
| -D | Optional | None | Completely disable spinning for the threads in the onnxruntime thread pool used for intra-operator parallelism |
| -H | Optional | -1 | Maximum value for randomly generated test data (if less than the minimum value, the maximum value of the corresponding data type is used by default) |
| -L | Optional | 0 | Minimum value for randomly generated test data |
| -R | Optional | 1 | Number of groups of randomly generated test data |
| -U | Optional | 1 | Maximum random data value for free dimensions (usually batch size) |
| -V | Optional | 1 | Minimum random data value for free dimensions (usually batch size) |
| -Z | Optional | None | Force threads to stop spinning between runs to reduce CPU utilization |
| -h, --help | Optional | None | Print the usage instructions |
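For instance, to benchmark a model on the spacemit provider with 4 intra-op threads, 200 fixed runs, and statistics printed, an invocation might look like this (model.onnx is a placeholder for your own model file):
$ onnxruntime_perf_test -e spacemit -x 4 -m times -r 200 -s model.onnx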
1.3 Usage Example
Taking the onnxruntime/test/testdata/abs_free_dimensions.onnx model as an example:
1.3.1 Random Test Data
Fix the number of test runs to 100, randomly generate 10 groups of test data, fix the random seed to 1, and set the maximum and minimum random test data values to 6 and 2, respectively:
$ MODEL=abs_free_dimensions.onnx
$ ARGS="${MODEL} ${MODEL%.onnx}.txt -m times -r 100 -R 10 -S 1 -H 6 -L 2"
$ onnxruntime_perf_test ${ARGS}
...
Session creation time cost: 0.0455992 s
First inference time cost: 0 ms
Total inference time cost: 0.00371454 s
Total inference requests: 100
Average inference time cost: 0.0371454 ms
Total inference run time: 0.00417042 s
Number of inferences per second: 23978.4
...
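To measure over a fixed wall-clock interval instead, the same model can be run in 'duration' mode; a minimal sketch (the 10-second interval is an arbitrary choice):
$ onnxruntime_perf_test -m duration -t 10 abs_free_dimensions.onnx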
2 Application Development
2.1 AI Support Library
2.1.1 Demo Introduction
The current Support Library demo is located in the bianbu-ai-support directory of the deployment toolkit; the relevant instructions and examples are as follows:
$ tree -L 3 /opt/spacemit-ai-sdk.v1.1.0/bianbu-ai-support/
/opt/spacemit-ai-sdk.v1.1.0/bianbu-ai-support/
├── bin // Precompiled executable programs
│ ├── classification_demo

