The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Japan 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.
In this presentation, we will share the secrets and the journey behind enabling IBM's AI Accelerator in Kubernetes. We combine a wide range of tools and frameworks, such as a device plugin, a custom scheduler, a metrics exporter, webhooks, and custom resources, to satisfy real requirements from various stakeholders: general users, cluster administrators, and driver/runtime developers. Our device plugin and custom scheduler accept special preferences such as topology-aware allocation to enable RDMA, and a webhook-based validator guides users in following specification changes. To synchronize allocation status among the components, we carefully defined a custom resource after performance estimation. From a developer's perspective, we provide various debugging capabilities: for example, device allocation by PCI address for inspection, and a pseudo-device mode for testing without real devices. In addition, multi-architecture support gives all participants freedom of platform choice.
Takuya Mishina is a Staff Research Scientist at IBM Research - Tokyo. He has been working on enhancing cloud infrastructure lifecycle management, such as security and compliance posture management. Recent interests include extending the automation mechanism to provide usable AI hardware...
Tatsuhiro Chiba is an STSM and Manager at IBM Research, specializing in performance optimization and acceleration of large-scale AI and HPC workloads on Hybrid Cloud. He is leading a project to enhance OpenShift performance and sustainability for AI and HPC by exploiting various cloud...