Model Deployment
1 Performance Testing
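**[Tip]** Before formally deploying an AI model, we strongly recommend first running a performance test of the model on the chip to confirm that its inference performance meets expectations.
The spacemit-ort/bin/onnxruntime_perf_test tool in the SDK directory quickly measures the pure inference performance of an AI model on the chip. The tool accepts ONNX models, so you can use it to evaluate both the original floating-point ONNX model and the converted (and/or quantized) fixed-point ONNX model.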
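As a minimal sketch of comparing a floating-point model against its quantized counterpart, the helper below runs the same fixed-count test on every model passed to it. The function name `bench_models`, the `PERF_TEST` override variable, and the model file names in the comment are assumptions made for illustration; only the `-m` and `-r` options come from the tool's help text.

```shell
#!/bin/sh
# Path to the tool; set PERF_TEST beforehand if it is not on PATH.
PERF_TEST=${PERF_TEST:-onnxruntime_perf_test}

# Run an identical 100-inference test on each model given as an argument.
# The result file for NAME.onnx is written to NAME.txt.
bench_models() {
    for model in "$@"; do
        echo "== ${model} =="
        "$PERF_TEST" -m times -r 100 "$model" "${model%.onnx}.txt" || return 1
    done
}

# Example call (hypothetical file names):
#   bench_models model_fp32.onnx model_int8.onnx
```

Keeping the options identical across models means any difference in the reported latency comes from the model itself, not from the test configuration.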
1.1 Usage
$ onnxruntime_perf_test -h
perf_test [options...] model_path [result_file]
Options:
-m [test_mode]: Specifies the test mode. Value could be 'duration' or 'times'.
Provide 'duration' to run the test for a fix duration, and 'times' to repeated for a certain times.
-M: Disable memory pattern.
-A: Disable memory arena
-c [parallel runs]: Specifies the (max) number of runs to invoke simultaneously. Default:1.
-e [cpu|spacemit]: Specifies the provider 'cpu', 'spacemit'. Default:'cpu'.
-r [repeated_times]: Specifies the repeated times if running in 'times' test mode.Default:1000.
-t [seconds_to_run]: Specifies the seconds to run for 'duration' mode. Default:600.
-p [profile_file]: Specifies the profile name to enable profiling and dump the profile data to the file.
-s: Show statistics result, like P75, P90. If no result_file provided this defaults to on.
-S: Given random seed, to produce the same input data. This defaults to -1(no initialize).
-v: Show verbose information.
-x [intra_op_num_threads]: Sets the number of threads used to parallelize the execution within nodes, A value of 0 means ORT will pick a default. Must >= 0.
-y [inter_op_num_threads]: Sets the number of threads used to parallelize the execution of the graph (across nodes), A value of 0 means ORT will pick a default. Must >= 0.
-f [free_dimension_override]: Specifies a free dimension by name to override to a specific value for performance optimization. Syntax is [dimension_name:override_value]. override_value must > 0
-F [free_dimension_override]: Specifies a free dimension by denotation to override to a specific value for performance optimization. Syntax is [dimension_denotation:override_value]. override_value must > 0
-P: Use parallel executor instead of sequential executor.
-o [optimization level]: Default is 99 (all). Valid values are 0 (disable), 1 (basic), 2 (extended), 99 (all).
Please see onnxruntime_c_api.h (enum GraphOptimizationLevel) for the full list of all optimization levels.
-u [optimized_model_path]: Specify the optimized model path for saving.
-z: Set denormal as zero. When turning on this option reduces latency dramatically, a model may have denormals.
-T [Set intra op thread affinities]: Specify intra op thread affinity string
[Example]: -T 1,2;3,4;5,6 or -T 1-2;3-4;5-6
Use semicolon to separate configuration between threads.
E.g. 1,2;3,4;5,6 specifies affinities for three threads, the first thread will be attached to the first and second logical processor.
The number of affinities must be equal to intra_op_num_threads - 1
-D [Disable thread spinning]: disable spinning entirely for thread owned by onnxruntime intra-op thread pool.
-H: Maximum value to produce the random input data. This defaults to -1(as std::numeric_limits<T>::max() whenever the value given by this option less than value of '-L').
-L: Minimum value to produce the random input data. This defaults to 0.
-R: Count of random generated input test data. This defaults to 1 and must > 0.
-U: Maximum value to produce the random value of free dimensions which are not overriden. This defaults to 1. Specified value must > 0.
-V: Minimum value to produce the random value of free dimensions which are not overriden. This defaults to 1. Specified value must > 0.
-Z [Force thread to stop spinning between runs]: disallow thread from spinning during runs to reduce cpu usage.
-h: help
1.2 Parameters
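Parameter | Required/Optional | Default | Description |
---|---|---|---|
-m | Optional | times | Test mode: fixed duration (in seconds) or fixed number of runs (note: the original onnxruntime_perf_test tool defaults to 'duration' mode) |
-M | Optional | None | Disable memory pattern |
-A | Optional | None | Disable memory arena |
-c | Optional | 1 | Number of parallel inferences for stress testing (the number of session.run() calls triggered at the same moment) |
-e | Optional | cpu | Execution Provider(s) used for inference, separated by " " when more than one is given; per the help text, the currently available EPs are cpu and spacemit |
-r | Optional | 1000 | Number of inference runs in fixed-count mode (per session) |
-t | Optional | 600 | Inference test time in fixed-duration mode (per session), in seconds |
-p | Optional | None | Profiling file path (default: disabled; non-empty: enabled) |
-s | Optional | ON | Print inference latency statistics (on by default if no result file is specified) |
-S | Optional | -1 | Random seed (the default -1 means the test data is not randomly initialized; 0 uses a random seed; > 0 is a user-specified seed) |
-v | Optional | None | Enable debug information |
-x | Optional | 0 | Number of threads used for parallelism within a single operator (the default 0 leaves the choice to onnxruntime) |
-y | Optional | 0 | Number of threads used to run multiple operators concurrently (the default 0 leaves the choice to onnxruntime) |
-f | Optional | None | Sets the value of a free dimension in the model input by parameter name (a string; see abs_free_dimensions.onnx), in the format key:value |
-F | Optional | None | Sets the value of a free dimension in the model input by denotation (a string; see abs_free_dimensions.onnx), in the format key:value |
-P | Optional | None | Enable parallel execution mode |
-o | Optional | 99 | Model optimization level |
-u | Optional | None | Path for saving the optimized model |
-z | Optional | None | Same as session_options.AddConfigEntry(kOrtSessionOptionsConfigSetDenormalAsZero, "1") |
-T | Optional | None | Affinity of the threads in the onnxruntime internal thread pool |
-D | Optional | None | Completely disable spinning for threads in the onnxruntime intra-op thread pool |
-H | Optional | -1 | Maximum value of the randomly generated test data (if it is less than the minimum, the maximum of the corresponding data type is used) |
-L | Optional | 0 | Minimum value of the randomly generated test data |
-R | Optional | 1 | Number of randomly generated test data sets |
-U | Optional | 1 | Maximum random value for free dimensions (usually the batch size) |
-V | Optional | 1 | Minimum random value for free dimensions (usually the batch size) |
-Z | Optional | None | Prevent threads from spinning during runs to reduce CPU usage |
-h, --help | Optional | None | Print the usage text |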
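To illustrate how several of the options above combine, here is a hedged sketch of a wrapper for fixed-duration runs with an explicit intra-op thread count and an optional free-dimension override by name. The function name `perf_duration`, the `PERF_TEST` variable, and the dimension name `batch` in the comment are hypothetical; check your model for the actual free-dimension names before using `-f`.

```shell
#!/bin/sh
# Path to the tool; set PERF_TEST beforehand if it is not on PATH.
PERF_TEST=${PERF_TEST:-onnxruntime_perf_test}

# perf_duration MODEL SECONDS THREADS [NAME:VALUE]
# Runs MODEL for SECONDS in 'duration' mode with THREADS intra-op threads.
# The optional NAME:VALUE pins one free dimension by name via -f.
perf_duration() {
    model=$1 seconds=$2 threads=$3 dim=${4:-}
    # Rebuild the positional parameters as the option list.
    set -- -m duration -t "$seconds" -x "$threads"
    if [ -n "$dim" ]; then
        set -- "$@" -f "$dim"
    fi
    "$PERF_TEST" "$@" "$model"
}

# Example call (hypothetical dimension name "batch"):
#   perf_duration abs_free_dimensions.onnx 30 4 batch:8
```

Options are placed before the model path here, matching the `perf_test [options...] model_path [result_file]` synopsis in the help text.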
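1.3 Examples
The following uses the onnxruntime/test/testdata/abs_free_dimensions.onnx model as an example:
1.3.1 Random test data
Run a fixed 100 inferences on 10 randomly generated sets of test data, with the random seed fixed at 1, a maximum random value of 6, and a minimum random value of 2: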
$ MODEL=abs_free_dimensions.onnx
$ ARGS="${MODEL} ${MODEL%.onnx}.txt -m times -r 100 -R 10 -S 1 -H 6 -L 2"
$ onnxruntime_perf_test ${ARGS}
...
Session creation time cost: 0.0455992 s
First inference time cost: 0 ms
Total inference time cost: 0.00371454 s
Total inference requests: 100
Average inference time cost: 0.0371454 ms
Total inference run time: 0.00417042 s
Number of inferences per second: 23978.4
...
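The figures in this log appear to be related as follows: the average inference time is the total inference time cost divided by the number of requests, and the throughput is the number of requests divided by the total inference run time. The sketch below recomputes both from a captured log as a sanity check. The function name `summarize_perf` and its output format are assumptions for illustration; the script relies only on the line labels shown in the log above.

```shell
#!/bin/sh
# Reads onnxruntime_perf_test output on stdin and recomputes the derived figures.
summarize_perf() {
    awk -F': *' '
        # "+ 0" converts "0.00371454 s" to a number, dropping the unit suffix.
        /^Total inference time cost/ { cost = $2 + 0 }
        /^Total inference requests/  { n = $2 + 0 }
        /^Total inference run time/  { run = $2 + 0 }
        END {
            # Fail if the log did not contain the expected lines.
            if (n == 0 || run == 0) exit 1
            printf "avg_ms=%.4f qps=%.1f\n", cost / n * 1000, n / run
        }'
}

# Example call (hypothetical log file name):
#   onnxruntime_perf_test ${ARGS} | tee perf.log
#   summarize_perf < perf.log
```

For the log above this prints `avg_ms=0.0371 qps=23978.4`, which agrees with the reported average of 0.0371454 ms and 23978.4 inferences per second.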