Name: Sailing Multi-host Inference for LLM on Kubernetes - Kay Yan, DaoCloud
Start: 2025-06-16T17:10:00+0900
End: 2025-06-16T17:40:00+0900

16-17 June
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Japan 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Japan Standard Time (UTC+9:00). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis.

Monday June 16, 2025 17:10 - 17:40 JST

Level 1 | Appollon

Inference workloads are becoming increasingly prevalent and vital in Cloud Native world. However, it's not easy, one of the biggest challenges is large foundation model can not fit into a single node, such as llama3.1-405B or DeepSeek R1, which brings out the distributed inference with model parallelism, again, make serving inference workloads more complicated.

LeaderWorkerSet, aka. LWS, is a dedicated multi-host inference project aims to solve this problem, it's a project under the guidance of Kubernetes SIG-Apps and Serving Working Group. It offers a couple of features like dual-template for different types of Pods, fine-gained rolling update strategies, topology managements and all-or-nothing failure handlings.

What's more, vLLM, an inference engine, renowned for its performance and easy-to-use, has gained widespread popularity. In this presentation, we'll show you how to use LWS to deploy distributed inference with vLLM on Kubernetes.

Speakers

Kay Yan

Principal Software Engineer, DaoCloud

Kay Yan is kubespray maintainer, containerd/nerdctl maintainer. He is the Principal Software Engineer in DaoCloud, and develop the DaoCloud Enterprise Kubernetes Platform since 2016.

Monday June 16, 2025 17:10 - 17:40 JST
Level 1 | Appollon

Maintainer Track

Need help? View Support Guides
Event questions? Contact Event Planner

KubeCon + CloudNativeCon Japan 2025

Kay Yan

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!