## Installation

- https://github.com/thu-pacman/chitu

```
git clone --recursive https://github.com/thu-pacman/chitu && cd chitu
pip install -r requirements-build.txt
pip install -U torch
TORCH_CUDA_ARCH_LIST=8.6 CHITU_SETUP_JOBS=4 MAX_JOBS=4 pip install --no-build-isolation .
```
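To confirm the build succeeded, a quick import check can be run before launching anything large. This is a minimal sketch; the Python module name `chitu` is an assumption and may differ from the installed package layout.

```
# Sanity check (module name `chitu` is an assumption): the import should
# succeed and CUDA should be visible to PyTorch.
python -c "import torch, chitu; print('cuda available:', torch.cuda.is_available())"
```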
## Single-node inference (TP=8)
```
torchrun --nproc_per_node 8 test/single_req_test.py request.max_new_tokens=64 models=DeepSeek-R1 models.ckpt_dir=/data/DeepSeek-R1 infer.pp_size=1 infer.tp_size=8
```
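The command above runs on one node with eight GPUs under tensor parallelism. For a checkpoint small enough to fit on a single GPU, the same entry point can be launched with one process and no parallelism; this is a sketch reusing the configuration keys shown above, with the checkpoint path as a placeholder.

```
# Hypothetical true single-GPU run: one process, TP=1, PP=1.
# /data/DeepSeek-R1 is a placeholder; the model must fit in one GPU's memory.
torchrun --nproc_per_node 1 test/single_req_test.py \
    request.max_new_tokens=64 \
    models=DeepSeek-R1 \
    models.ckpt_dir=/data/DeepSeek-R1 \
    infer.pp_size=1 infer.tp_size=1
```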
## Hybrid parallelism (TP+PP)
```
torchrun --nnodes 2 --nproc_per_node 8 test/single_req_test.py request.max_new_tokens=64 infer.pp_size=2 infer.tp_size=8 models=DeepSeek-R1 models.ckpt_dir=/data/DeepSeek-R1
```
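With `--nnodes 2 --nproc_per_node 8`, the 16 processes cover `pp_size × tp_size = 2 × 8`. The command has to be launched on both nodes with a shared rendezvous endpoint; the sketch below assumes node 0 is the master, and `MASTER_ADDR` and the port are placeholders for your cluster.

```
# On node 0 (repeat on node 1 with --node_rank 1).
# MASTER_ADDR and --master_port are placeholders for your environment.
torchrun --nnodes 2 --nproc_per_node 8 \
    --node_rank 0 --master_addr "$MASTER_ADDR" --master_port 29500 \
    test/single_req_test.py request.max_new_tokens=64 \
    infer.pp_size=2 infer.tp_size=8 \
    models=DeepSeek-R1 models.ckpt_dir=/data/DeepSeek-R1
```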
## Starting the service
- Start the service on localhost:21002:
```
export WORLD_SIZE=8
torchrun --nnodes 1 \
    --nproc_per_node 8 \
    --master_port=22525 \
    example/serve.py \
    serve.port=21002 \
    infer.stop_with_eos=False \
    infer.cache_type=paged \
    infer.pp_size=1 \
    infer.tp_size=8 \
    models=DeepSeek-R1 \
    models.ckpt_dir=/data/DeepSeek-R1 \
    keep_dtype_in_checkpoint=True \
    infer.mla_absorb=absorb-without-precomp \
    infer.soft_fp8=True \
    infer.do_load=True \
    infer.max_reqs=1 \
    scheduler.prefill_first.num_tasks=100 \
    infer.max_seq_len=4096 \
    request.max_new_tokens=100 \
    infer.use_cuda_graph=True
```
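Loading a checkpoint of this size can take several minutes, so it helps to wait until the port actually accepts connections before sending requests. A simple probe, assuming `nc` (netcat) is available and using the `serve.port` value from above:

```
# Poll the serving port until it accepts TCP connections.
until nc -z localhost 21002; do
    echo "waiting for the server to come up ..."
    sleep 5
done
```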
- Test the service:
```
curl localhost:21002/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "What is machine learning?"
            }
        ]
    }'
```
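If the response body follows the usual OpenAI-style chat completion schema (an assumption based on the `/v1/chat/completions` path), the assistant text can be extracted directly with `jq`:

```
# .choices[0].message.content assumes an OpenAI-compatible response body.
curl -s localhost:21002/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "What is machine learning?"}]}' \
    | jq -r '.choices[0].message.content'
```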
## Performance benchmarking
- Run a comprehensive serving benchmark with the benchmark_serving tool:
```
python benchmarks/benchmark_serving.py \
    --model "deepseek-r1" \
    --iterations 10 \
    --seq-len 10 \
    --warmup 3 \
    --base-url http://localhost:21002
```
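To compare configurations, the same invocation can be swept over several sequence lengths in a small shell loop. The flags are taken from the command above; the chosen lengths are illustrative, and whether they are valid for this tool depends on your deployment.

```
# Sweep a few sequence lengths and keep one log per run (lengths are illustrative).
for LEN in 128 512 1024; do
    python benchmarks/benchmark_serving.py \
        --model "deepseek-r1" \
        --iterations 10 \
        --seq-len "$LEN" \
        --warmup 3 \
        --base-url http://localhost:21002 \
        | tee "bench_seqlen_${LEN}.log"
done
```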