16-17 June

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Japan 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Japan Standard Time (UTC+9:00). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis.
Type: AI + ML
Monday, June 16
 

11:30 JST

A Journey and Lessons Learned To Enable IBM AIU Accelerators in Kubernetes - Takuya Mishina & Tatsuhiro Chiba, IBM Research
Monday June 16, 2025 11:30 - 12:00 JST
In this presentation, we will unveil the secrets of, and the journey behind, enabling IBM's AI Accelerator (AIU) in Kubernetes. We utilize a wide range of tools and frameworks - a device plugin, a custom scheduler, a metrics exporter, webhooks, and custom resources - together to satisfy real requirements from various stakeholders: general users, cluster administrators, and driver/runtime developers. Our device plugin and custom scheduler can accept special preferences such as topology-aware allocation to enable RDMA, and a webhook-based validator guides them through specification changes. To keep allocation status in sync among the components, we carefully defined a custom resource after performance estimation. From a developer's perspective, we provide various debugging capabilities: for example, device allocation by PCI address for inspection, and a pseudo-device mode for testing without real devices. In addition, multi-architecture support gives all participants freedom of platform choice.
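
For readers unfamiliar with the device-plugin path mentioned above: a device plugin advertises the accelerator to the kubelet as an extended resource, and workloads then consume it through ordinary resource limits. Below is a minimal sketch of the consuming side only, assuming a hypothetical extended-resource name ibm.com/aiu and a placeholder image; the real resource name is whatever the vendor's device plugin registers.

```python
# Minimal sketch: how a workload consumes a device advertised by a device plugin.
# "ibm.com/aiu" and the image are assumptions for illustration only.
import yaml

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "aiu-test"},
    "spec": {
        "containers": [{
            "name": "workload",
            "image": "registry.example.com/aiu-workload:latest",  # placeholder image
            "resources": {
                # Request one accelerator from whatever the device plugin registered.
                "limits": {"ibm.com/aiu": 1}
            },
        }],
    },
}

print(yaml.safe_dump(pod, sort_keys=False))
```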
Speakers
Takuya Mishina

Researcher, IBM Research
Takuya Mishina is a Staff Research Scientist at IBM Research - Tokyo. He has been working on enhancing cloud infrastructure lifecycle management, such as security and compliance posture management. His recent interests include extending the automation mechanism to provide usable AI hardware...
Tatsuhiro Chiba

Senior Technical Staff Member, IBM Research
Tatsuhiro Chiba is an STSM and Manager at IBM Research, specializing in performance optimization and acceleration of large-scale AI and HPC workloads on Hybrid Cloud. He is leading a project to enhance OpenShift performance and sustainability for AI and HPC by exploiting various cloud...
Level 1 | Pegasus A-B1
  AI + ML

12:10 JST

Beyond Stock Outs: Scaling Inference on Mixed GPU Hardware With DRA - John Belamaric & Bo Fu, Google
Monday June 16, 2025 12:10 - 12:40 JST
Pods are PENDING?!? Ugh, none of the latest GPUs are available. But there are tons of older ones! If only you could tell Kubernetes “use the best GPU available, as long as it has 20GB+”... (enter: DRA).

Kubernetes’ Dynamic Resource Allocation (DRA) system, beta since 1.32, allows variations on which GPUs get allocated to Pods. You can write a flexible spec so that when the Deployment scales, Pods can land on whatever nodes have available GPUs. DRA works even if you need different numbers of devices for different hardware! This enables a new level of utilization and efficiency, saving your organization real money.

Combined with an advanced Node Autoscaler like Google’s Custom Compute Classes or Karpenter, you can spin up more VMs with whatever GPUs are available - or the most economical - all for a single Deployment. Scaling is simpler and more reliable, and your workload can scale even when your preferred type of GPU is stocked out.

Come learn how, and see it in action with a demo!
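
As a rough illustration of the "flexible spec" idea described above, here is a minimal sketch of a DRA ResourceClaimTemplate (resource.k8s.io/v1beta1) that asks for any GPU with at least 20Gi of memory. The device class name and the capacity key in the CEL expression are assumptions for illustration; real names come from the GPU driver's DRA integration.

```python
# Hedged sketch of a DRA claim template: "any GPU with at least 20Gi of memory".
# Device class and capacity key names are driver-specific assumptions.
import yaml

claim_template = {
    "apiVersion": "resource.k8s.io/v1beta1",
    "kind": "ResourceClaimTemplate",
    "metadata": {"name": "any-gpu-20gi"},
    "spec": {
        "spec": {
            "devices": {
                "requests": [{
                    "name": "gpu",
                    "deviceClassName": "gpu.example.com",  # assumed device class
                    "selectors": [{
                        "cel": {
                            # CEL over the device's published capacity; the domain and
                            # "memory" key are placeholders for the real driver's names.
                            "expression": "device.capacity['gpu.example.com'].memory"
                                          ".compareTo(quantity('20Gi')) >= 0"
                        }
                    }],
                }]
            }
        }
    },
}

print(yaml.safe_dump(claim_template, sort_keys=False))
```

A Deployment's pod template would then reference this template under spec.resourceClaims, with containers opting in via resources.claims, so replicas can land on whichever nodes have a matching device free.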
Speakers
John Belamaric

Senior Staff Software Engineer, Google
John is a Sr Staff SWE, co-chair of K8s SIG Architecture and of K8s WG Device Management, helping lead efforts to improve how GPUs, TPUs, NICs and other devices are selected, shared, and configured in Kubernetes. He is also co-founder of Nephio, an LF project for K8s-based automation...
Bo Fu

Senior Product Manager, Google
Level 1 | Pegasus A-B1
  AI + ML

14:10 JST

Access AI Models Anywhere: Scaling AI Traffic With Envoy AI Gateway - Dan Sun, Bloomberg & Takeshi Yoneda, Tetrate.io
Monday June 16, 2025 14:10 - 14:40 JST
As Generative AI adoption increases, organizations face accelerating challenges in deploying, scaling, and managing access to diverse AI models across cloud and on-prem environments. Envoy AI Gateway utilizes Envoy Proxy’s powerful filter architecture and extensibility through ext-proc to deliver key features such as centralized credential management, intelligent model routing, and LLM token usage control.
As the first CNCF-backed open source AI gateway, Envoy AI Gateway is built on top of the robust, high-performance Envoy Gateway to help democratize AI infrastructure for organizations of all sizes.

In this talk, we will dive into the architecture of Envoy AI Gateway to learn how it extends Envoy’s capabilities to efficiently manage AI-driven workloads for enterprise needs, while providing robustness, scalability, and adaptability in the rapidly-changing generative AI landscape. We will also showcase a demo of an AI agent seamlessly accessing models anywhere through a unified API.
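
In practice, the "unified API" typically looks like a single OpenAI-compatible endpoint served by the gateway. A hedged sketch of a client or agent using it, with the gateway URL, credential handling, and model name as placeholders:

```python
# Hedged sketch: one client talking to models "anywhere" through a single
# gateway endpoint. URL, API key handling, and model name are placeholders;
# the gateway routes to whichever backend serves the requested model and
# applies credential and token-usage policies centrally.
from openai import OpenAI

client = OpenAI(
    base_url="http://ai-gateway.example.internal/v1",  # assumed gateway address
    api_key="gateway-issued-or-unused",                # backend credentials live in the gateway
)

resp = client.chat.completions.create(
    model="llama-3-70b",  # logical model name; routing decided by gateway config
    messages=[{"role": "user", "content": "Summarize today's cluster alerts."}],
)
print(resp.choices[0].message.content)
```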
Speakers
Dan Sun

Team Lead, Bloomberg
Dan Sun is a software engineer team lead at Bloomberg. He is a co-founder and maintainer of KServe, an open source serverless AI inference platform. He is also a co-founder of the Envoy AI Gateway project.
Takeshi Yoneda

Open Source Software Engineer, Tetrate.io
Takeshi Yoneda is a software engineer at Tetrate.io who has contributed to numerous open source projects, including compilers and network proxies. He is a co-founder of the Envoy AI Gateway project as well as a maintainer of the Envoy Proxy project.
Level 1 | Orion
  AI + ML
  • Content Experience Level Any
  • Presentation Language English

14:50 JST

Zero-Extraction Cold Starts: How FUSE-Streaming Slashed ComfyUI Cold Starts by 10x - Fog Dong, BentoML
Monday June 16, 2025 14:50 - 15:20 JST
Cold-start delays for GPU-heavy GenAI apps like ComfyUI aren’t just about speed—they’re architectural failures. While others optimize incremental steps, we eliminate entire phases: no image downloads, no layer extraction, no redundant model copies.

We introduce a radical Kubernetes-native pattern: Direct-to-GPU streaming via FUSE-mounted object storage (S3/GCS), bypassing legacy container workflows. By rearchitecting the snapshotter to support seekable, on-demand FUSE streaming, we enable:

- Instant container boot: Models/CUDA dependencies mount directly from object storage, avoiding registry bottlenecks (40MB/s → 900MB/s throughput)
- Zero-extraction overhead: Layers load incrementally via range-optimized fetches, eliminating Zstd unpack/copy latency
- True cold start elimination: ComfyUI pods activate in 90s (vs. 8+ mins) by co-locating model mounting and inference prep

In this session, we'll dissect a live ComfyUI deployment that uses 100% OSS primitives to hack container internals.
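
The key mechanism behind those numbers is replacing pull-and-unpack with seekable, range-based reads from object storage. The sketch below is not the snapshotter itself; it only illustrates that access pattern with fsspec/s3fs, using a placeholder bucket and object key.

```python
# Not the snapshotter -- just the underlying access pattern: seekable,
# on-demand range reads from object storage instead of downloading and
# unpacking whole layers. Bucket and key are placeholders.
import fsspec

fs = fsspec.filesystem("s3")  # needs s3fs and credentials configured
path = "example-bucket/models/sdxl/unet.safetensors"

with fs.open(path, "rb") as f:
    f.seek(512 * 1024 * 1024)        # jump straight to the byte range we need
    chunk = f.read(8 * 1024 * 1024)  # fetch only ~8 MiB, not the whole object
    print(f"fetched {len(chunk)} bytes without downloading the full object")
```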
Speakers
Fog Dong

Senior Software Engineer, BentoML
Fog Dong, a Senior Engineer at BentoML, KubeVela maintainer, CNCF Ambassador, and LFAPAC Evangelist, has a rich background in cloud native and AI infrastructure. Previously instrumental in developing Alibaba's large-scale serverless workflows and ByteDance's cloud-native CI/CD platform, she...
Level 1 | Orion
  AI + ML

15:50 JST

Scaling AI Responsibly: Building Ethical, Sustainable, and Cloud Native AI Systems - Amita Sharma & Vincent Caldeira, Red Hat; Mohit Suman, Salesforce; Shamsher Ansari, Platform9; Anusha Hegde, Nirmata
Monday June 16, 2025 15:50 - 16:20 JST
Panel Discussion - As AI continues to reshape industries, organizations face mounting pressure to scale AI systems responsibly while addressing challenges in efficiency, sustainability, and trust. This panel convenes leading experts to discuss how cloud-native technologies and CNCF projects are paving the way for scalable, ethical, and resource-efficient AI. Attendees will gain actionable insights into optimizing AI workflows, reducing environmental impact, and ensuring transparency in AI decision-making. From leveraging open-source tools to implementing cost-effective and ethical AI practices, this session will equip you with the knowledge to build AI systems that are both innovative and responsible. Discover how to harness the power of cloud-native ecosystems to drive AI transformation without compromising on sustainability or trust.
Intended audience: AI/ML engineers and data scientists looking to scale AI systems in cloud-native environments.
Speakers
Amita Sharma

AI/ML Engineering Manager, Red Hat
Amita is an Engineering Manager at Red Hat, leading Kubeflow Training and Feature Store. With 20 years of industry experience, including 14 years at Red Hat, she has held various roles. She is an active open-source contributor. Since 2011, she has contributed to the Fedora Project and...
Vincent Caldeira

CTO APAC, Red Hat
Vincent Caldeira, CTO of Red Hat in APAC, is responsible for strategic partnerships and technology strategy. Named a top CTO in APAC in 2023, he has 20+ years in IT, excelling in technology transformation in finance. An authority in open source and cloud-native technologies, Vincent...
Shamsher Ansari

Technical Product Manager, Platform9
Shamsher Ansari is a Technical Product Manager at Platform9, driving cloud-native infrastructure and Kubernetes solutions. With extensive experience in cloud, edge computing, and open-source technologies, he focuses on delivering scalable and cost-efficient products. Previously, at...
Anusha Hegde

Senior Technical Product Manager, Nirmata
Anusha Hegde is a Senior Technical Product Manager at Nirmata, focusing on cloud security, Kubernetes policy management, policy-as-code automation, and building AI-first products while analyzing AI’s impact on her product and customers. Previously, she was a Tech Lead at VMware...
Mohit Suman

Senior Product Manager, Salesforce
Mohit Suman is a Product Management Leader at Salesforce, driving AI Observability, MLOps, and AI App Dev. With 12+ years in product strategy, engineering, and architecture, he builds scalable solutions for developer productivity. A passionate advocate for open source and public speaking...
Level 1 | Pegasus A-B1
  AI + ML
 
