
Large Model Deployment

List of Supported Models

Space-time Iteration currently supports running the following large models on the Space-time Iteration K1 platform:

| Model Name | Scale | Supported or Not |
| --- | --- | --- |
| Qwen1.5 | 4B | Yes |
| Qwen2 | 0.5B | Yes |
| Qwen2 | 1.5B | Yes |
| Llama3 | 8B | Yes |
| Llama3.1 | 8B | Yes |
| tinyllama | 1.1B | Yes |
| minicpm | 1B | Yes |
| minicpm | 2B | Yes |
| phi3 | 3.8B | Yes |
| chatglm3 | 6B | Yes |

Large model releases are available at: https://archive.spacemit.com/spacemit-ai/ModelZoo/llm/

Usage Instructions

Cpp Demo

To run the Cpp demo, you need the spacemit-ort toolkit provided by Space-time Iteration. You can refer to the following projects to run the demo.

Python Demo

To run the Python demo, you need to install and use the following Python packages provided by Space-time Iteration:

- spacemit-ort
- onnxruntime-genai

You can refer to the following files to run the demo.

The spacemit-ort release is available at: https://archive.spacemit.com/spacemit-ai/onnxruntime/spacemit-ort.riscv64.1.2.2.tar.gz

Note: Both the demo scripts and the pip wheel (.whl) files are included in the released archive.
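For orientation, the sketch below shows what a minimal text-generation script with onnxruntime-genai typically looks like. It is only an illustration: the model directory path and the prompt are placeholders, and the exact API calls depend on the onnxruntime-genai version bundled in the release package, so the demo scripts shipped there remain the authoritative reference.

```python
import onnxruntime_genai as og

# Placeholder path: a model directory already converted to the supported format
model = og.Model("output_model_path")
tokenizer = og.Tokenizer(model)

prompt = "Introduce the RISC-V architecture in one sentence."
input_tokens = tokenizer.encode(prompt)

# Set up generation parameters and run a single completion
params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = input_tokens

output_tokens = model.generate(params)
print(tokenizer.decode(output_tokens[0]))
```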

Model Construction (If Needed)

If you want to perform model conversion yourself, you can use the model conversion tools provided by Space-time Iteration to convert large models from HuggingFace or ModelScope into the supported model formats, so as to achieve the best adaptation on the platform.

```bash
# -i: input model path (huggingface_model_path or modelscope_model_path)
# -o: output model path
# -c: model cache directory
# Optionally add use_spacemit_ep=1 to --extra_options to enable it
python builder.py \
    -i huggingface_model_path \
    -o output_model_path \
    -e cpu \
    -p int4 \
    -c model_cache \
    --extra_options int4_accuracy_level=4 int4_block_size=64
```
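If the source model is not already on disk, one way to obtain a local huggingface_model_path is sketched below. This is only an illustration: it assumes the huggingface_hub package (not part of the Space-time Iteration tooling), and Qwen/Qwen2-0.5B-Instruct is used purely as an example repository.

```python
from huggingface_hub import snapshot_download

# Example only: download a supported model from HuggingFace to a local directory,
# then pass that directory to builder.py via -i
local_model_dir = snapshot_download("Qwen/Qwen2-0.5B-Instruct")
print(local_model_dir)
```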

Performance Data of Large Models

On the K1 chip, measured with spacemit-ort 1.2.2:

| Model | Scale | First Word Latency / s (prompt = 64t) | Throughput / TPS (context = 1024, prompt = 64) |
| --- | --- | --- | --- |
| qwen2 | 0.5B | 1.75 | 12.52 |
| qwen2 | 1.5B | 7.74 | 75.38 |
| qwen2.5 | 0.5B | 1.83@67t | 13.62 |
| qwen2.5 | 1.5B | 5.42 | 5.38 |
| qwen2.5 | 3B | 12.01@69t | 2.85 |
| qwen2.5 | 7B | 31.25 | 1.39 |
| phi3 | 3.8B | 15.92 | 2.14 |
| tinyllama | 1.1B | 7.84@95t | 7.38 |
| llama3 | 8B | 36.26@69t | 1.18 |
| llama3.2 | 1B | 4.28 | 7.18 |
| llama3.2 | 3B | 13.14 | 2.6 |
| minicpm-1b | 1B | 5.28@68t | 5.14 |
| minicpm-2b | 2B | 14.34@67t | 2.79 |
| minicpm3 | 4B | 20.17@65t | 0.92 |
| chatglm3 | 6B | 25.61@58t | 1.66579 |
| gemma2 | 2B | 24.58 | 3.39 |