The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Japan 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.
Please note: This schedule is automatically displayed in Japan Standard Time (UTC+9:00). The schedule is subject to change, and session seating is available on a first-come, first-served basis.
Our hero, a running app in a K8s prod environment, knows they are destined for greater things! They’re serving end users, but they aren’t realizing the value of the cloud. Their devs toil on custom integrations, deployment is brittle and slow, and security and governance are HARD. Our hero longs for a developer platform with consistent, repeatable system building blocks.
It is up to you, the audience, to guide our hero’s transformation from a lost and confused app to one built on a solid foundation that abstracts away complexity and promotes innovation. In their fifth KubeCon ‘Choose Your Own Adventure’-style talk, Whitney and Viktor will present choices that an anthropomorphized app must make as they build an Internal Developer Platform, enabling the devs to have self-service access to widely used system capabilities. Throughout the presentation, the audience (YOU!) will vote to decide our hero's path! Can we navigate CNCF projects and build a platform before the session time elapses?
Viktor Farcic is a lead rapscallion at Upbound, a member of the CNCF Ambassadors, Google Developer Experts, CDF Ambassadors, and GitHub Stars groups, and a published author. He is a host of the YouTube channel DevOps Toolkit and a co-host of DevOps Paradox.
Whitney is a CNCF Ambassador who is passionate about cloud native tools. Creative and driven, she has created and delivered two KubeCon keynotes, a VMware Explore keynote, and countless fun, funny, and informative community conference keynotes. You can catch her lightboard show...
Kubernetes cluster upgrades are frequent and can lead to unforeseen problems, including application downtime. To upgrade a cluster safely and achieve zero downtime for your applications, it is therefore essential to understand the role of each component that makes up a Kubernetes cluster and how it behaves during an upgrade.
This session will explain the basics along with practical measures that improve safety and ensure zero downtime for your applications. In 30 minutes, we will provide an easy-to-understand explanation of everything from the roles of the main Kubernetes components to points to watch during upgrades and recommended configurations that minimize or eliminate application downtime during the upgrade process. This session is recommended for anyone preparing to operate Kubernetes in earnest, and for anyone who struggles with upgrades and wants to learn how to prevent application downtime during Kubernetes cluster upgrades.
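One configuration commonly used to protect applications during upgrades is a PodDisruptionBudget (PDB), which node drains respect when evicting pods. The Python below is a minimal sketch, not material from the session. It models the integer form of the PDB rule. The desired healthy count comes from `minAvailable`, or from the expected replica count minus `maxUnavailable`. Evictions are allowed only while the current healthy count stays above that number. The function name is hypothetical, and percentage-valued budgets are deliberately left out.

```python
from typing import Optional


def disruptions_allowed(
    current_healthy: int,
    expected_pods: int,
    min_available: Optional[int] = None,
    max_unavailable: Optional[int] = None,
) -> int:
    """How many pods a drain may evict right now under an integer-valued PDB (hypothetical helper)."""
    # A PDB sets exactly one of minAvailable / maxUnavailable.
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    # Evictions are allowed only while healthy pods stay above the desired count.
    return max(0, current_healthy - desired_healthy)


# Three healthy replicas with minAvailable=2: the drain may evict one pod at a time.
print(disruptions_allowed(current_healthy=3, expected_pods=3, min_available=2))  # 1
# A single replica with minAvailable=1 leaves no room for any eviction.
print(disruptions_allowed(current_healthy=1, expected_pods=1, min_available=1))  # 0
```

The second call shows a common upgrade pitfall. A single-replica workload guarded by `minAvailable: 1` allows zero evictions, so it can stall a node drain until someone intervenes.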
Kazuki Uchima is a Technical Solutions Engineer at Google Cloud, specializing in Cloud Native technologies with a focus on Kubernetes. He provides consulting, architecture design, and technical support for Cloud Native solutions.
Kakeru Ishii is a Technical Solutions Engineer in Japan who helps Google Cloud customers, especially when they have trouble with Kubernetes clusters or the applications running on them. His technical background is in computer graphics, but his enthusiasm is now directed towards infrastructure...
As Generative AI adoption increases, organizations face accelerating challenges in deploying, scaling, and managing access to diverse AI models across cloud and on-prem environments. Envoy AI Gateway uses Envoy Proxy’s powerful filter architecture and its extensibility through ext-proc to deliver key features such as centralized credential management, intelligent model routing, and LLM token usage control. As the first CNCF-backed open source AI gateway, Envoy AI Gateway is built on top of the robust, high-performance Envoy Gateway to help democratize AI infrastructure for organizations of all sizes.
In this talk, we will dive into the architecture of Envoy AI Gateway to learn how it extends Envoy’s capabilities to efficiently manage AI-driven workloads for enterprise needs, while providing robustness, scalability, and adaptability in the rapidly changing generative AI landscape. We will also demo an AI agent seamlessly accessing models anywhere through a unified API.
Dan Sun is a software engineering team lead at Bloomberg. He is a co-founder and maintainer of KServe, an open source serverless AI inference platform project, and a co-founder of the Envoy AI Gateway project.
Takeshi Yoneda is a software engineer at Tetrate.io who has contributed to numerous open source projects, including compilers and network proxies. He is a co-founder of the Envoy AI Gateway project as well as a maintainer of the Envoy Proxy project.
Cold-start delays for GPU-heavy GenAI apps like ComfyUI aren’t just about speed; they’re architectural failures. While others optimize incremental steps, we eliminate entire phases: no image downloads, no layer extraction, no redundant model copies.
We introduce a radical Kubernetes-native pattern: Direct-to-GPU streaming via FUSE-mounted object storage (S3/GCS), bypassing legacy container workflows. By rearchitecting the snapshotter to support seekable, on-demand FUSE streaming, we enable:
- Instant container boot: Models/CUDA dependencies mount directly from object storage, avoiding registry bottlenecks (40 MB/s → 900 MB/s throughput)
- Zero-extraction overhead: Layers load incrementally via range-optimized fetches, eliminating Zstd unpack/copy latency
- True cold start elimination: ComfyUI pods activate in 90 s (vs. 8+ mins) by co-locating model mounting and inference prep
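The figures in the list imply the ratios computed below. This is plain arithmetic on the abstract's own numbers, with no new measurements. "8+ mins" is taken as its 480-second lower bound, so the cold-start ratio is a minimum.

```python
# Speedups implied by the numbers quoted in the abstract.
registry_mb_s = 40          # registry pull throughput before
fuse_mb_s = 900             # FUSE-streamed throughput after
baseline_start_s = 8 * 60   # "8+ mins", taken as its 480 s lower bound
streamed_start_s = 90       # ComfyUI pod activation with streaming

throughput_gain = fuse_mb_s / registry_mb_s                # 22.5x
min_cold_start_gain = baseline_start_s / streamed_start_s  # at least ~5.3x

print(f"throughput: {throughput_gain:.1f}x")
print(f"cold start: at least {min_cold_start_gain:.1f}x faster")
```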
In the session, we’ll dissect a live ComfyUI deployment, using 100% OSS primitives to hack container internals.
Fog Dong, a Senior Engineer at BentoML, KubeVela maintainer, CNCF Ambassador, and LFAPAC Evangelist, has a rich background in cloud native and AI infrastructure. Previously instrumental in developing Alibaba's large-scale serverless workflows and ByteDance's cloud-native CI/CD platform, she...
Everyone is talking about platform engineering. You see smooth demos of golden paths and self-service platforms. However, one significant set of challenges gets far less attention and is therefore often neglected when designing developer platforms.
In this talk, we’ll explore the often-overlooked day 2 challenges that platform teams face, breaking day 2 down into its many sub-areas and the challenges each poses. Drawing on real-world experiences, including notable migrations that many in this community have faced, we'll shed light on the pain behind developer platforms and discuss solutions to these issues. Among other topics, we’ll cover practical strategies for managing versioning and rollouts and highlight significant hurdles, such as dependencies on end-user teams or on GitOps.
Join us for insights, strategies, and stories from the trenches that will help you navigate the complexities of service iteration in developer platforms.
Puja Abbassi is the Vice President of Product at Giant Swarm, which builds a managed cloud native developer platform based on Kubernetes. Within Kubernetes, he focuses on extending it with custom resources and controllers. With many years of Kubernetes experience and having been in...
Kubernetes is widely recognized as a platform for building platforms, but even with modern platform engineering techniques, managing an end-to-end release and deployment lifecycle remains challenging.
This talk will first analyze key pain points in platform engineering and propose a new perspective on infrastructure code that separates the views of different personas. We advocate for a tenant-centric API that prioritizes user experience: minimizing input, reducing learning curves, and abstracting cloud provider details to decouple desired resources from underlying specs.
Next, we’ll introduce a design for fanning out tenant resource claims, enhancing flexibility and extensibility through a code generation component and a Kubernetes-like labeling system. Finally, we’ll cover the glue that binds everything together: Pkl for templating and validation, Prow for GitHub event-driven automation, and Crossplane + Argo CD as the claim realization engine.
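The sketch below illustrates a Kubernetes-like labeling system for fanning out claims. It assumes each tenant claim carries a `matchLabels`-style selector and each target (cluster, region, account) carries labels. A claim fans out to every target whose labels contain all of the selector's key/value pairs, which is the equality-based `matchLabels` rule Kubernetes uses. The `Target` type, target names, and label keys are hypothetical. The talk's actual code generation component may work quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """A hypothetical fan-out destination, e.g. a cluster, with its labels."""
    name: str
    labels: dict[str, str] = field(default_factory=dict)


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    # Equality-based selection: every selector key must be present with the same value.
    return all(labels.get(key) == value for key, value in selector.items())


def fan_out(selector: dict[str, str], targets: list[Target]) -> list[str]:
    """Return the names of the targets a single tenant claim expands to."""
    return [t.name for t in targets if matches(selector, t.labels)]


targets = [
    Target("prod-us-east", {"env": "prod", "region": "us-east"}),
    Target("prod-ap-northeast", {"env": "prod", "region": "ap-northeast"}),
    Target("staging-us-east", {"env": "staging", "region": "us-east"}),
]

print(fan_out({"env": "prod"}, targets))
# ['prod-us-east', 'prod-ap-northeast']
```

Keeping selection separate from the claim itself lets a tenant write one small claim while the platform decides, via labels on targets, where it is realized.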
Wei Huang is a Software Engineer at Apple, focusing on Kubernetes scheduling and the control plane. He has served as a co-chair of Kubernetes SIG-Scheduling for years and is the founder of two Kubernetes sub-projects, scheduler-plugins and kwok.
As organizations scale their Kubernetes adoption, multi-cluster architectures are becoming the backbone of resilience, scalability, and compliance. However, building a unified developer experience across these clusters while abstracting away operational complexity is a significant challenge. In this session, we’ll demonstrate how Cluster API (CAPI), a declarative tool for Kubernetes cluster lifecycle management, and Linkerd, the powerful yet lightweight service mesh, can work together to simplify multi-cluster topologies for Internal Developer Platforms (IDPs). By combining CAPI's robust cluster management with Linkerd’s seamless cross-cluster service communication, platform teams can deliver a streamlined, intuitive experience that lets developers focus on building and deploying applications without worrying about the underlying infrastructure.
William is a CNCF and Linkerd Ambassador, working at Mirantis as a Consulting Architect. He focuses on helping customers design, build, and run developer platforms and edge systems. He has worn many hats, including engineering, product owner, and consulting, across areas from HPC and storage to distributed...
You probably have more than one cluster, and there is a decent chance you are using Argo CD. It is also quite likely that you have a few other variations of Kubernetes cluster lists. We posit that writing glue code to stitch these cluster lists together is not an awesome use of your time. Thankfully, the good folks in SIG-Multicluster built a super cool API for cluster lists: the ClusterProfile / Cluster Inventory API! We are going to show you how to use said fancy new list with Argo CD, along with other multi-cluster tools, across Kubernetes clusters hosted by different providers. There will be demos. Possibly mustaches. And a decent amount of awful puns. So come on down to bear witness to some sweet multi-cluster abstractions that will surely get your heart rate up.
Nick is currently the product manager for GKE Fleets & Teams, focusing on multi-cluster capabilities that streamline GCP customers’ experience while building platforms on GKE. He is also a Kubernetes contributor, participates in SIG-Multicluster, and has been part of the community since...
Platform Engineering enables developers to focus on business value-aligned tasks by providing internal developer platforms (IDPs) that automate non-essential tasks. Kubernetes is widely used as a foundation for IDPs thanks to its scalability and flexibility.
However, Kubernetes was designed as a general workload orchestrator, not a platform component. As a result, IDP builders must integrate additional Cloud Native technologies and customizations, which can create scalability bottlenecks. At LY Corporation, the speaker’s team has developed a Kubernetes-based, multi-tenant IDP running over 140K pods and has faced exactly these scalability challenges.
In this session, he will discuss scalability bottlenecks encountered in the IDP, in areas such as observability pipelines and access control. He will also explore scaling strategies for IDPs and how they address real-world scalability issues. By the end of this session, you will gain deeper insight into scalability challenges from a platform builder’s perspective.
Hiroshi is a lead engineer for Kubernetes-based application platforms in LY Corporation's Private Cloud Division. The company operates numerous large-scale applications on its Kubernetes-based platform, and he excels in ensuring stable operations at scale on Kubernetes and driving...
How can real-time event streaming platforms that handle millions of events and complex data processing maintain peak performance and reliability? Monitoring them has historically been complex. Recent changes to OpenTelemetry agents and the addition of new semantic conventions make OpenTelemetry well suited to monitoring highly distributed event-driven architectures (EDA) such as Kafka. In this session, we will discuss how these changes help standardize telemetry and explain how span links can capture the multiple traces that make up a single transaction in an EDA.
The talk will also cover how OTel enables automatic anomaly detection, which is particularly useful for identifying issues such as consumer lag, increased event-processing latency, and partition failures. By leveraging context propagation, OTel tracks end-to-end latency across the entire Kafka ecosystem, including producers, brokers, and consumers.
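Consumer lag is one of the issues named above. It is conventionally the gap between a partition's latest (log-end) offset and the consumer group's committed offset. The sketch below computes lag per partition and flags partitions above a threshold. The offsets and the threshold are hypothetical stand-ins for metrics an OpenTelemetry pipeline would collect. A fixed threshold is a deliberate simplification of the anomaly detection the talk describes.

```python
def consumer_lag(end_offsets: dict[int, int], committed: dict[int, int]) -> dict[int, int]:
    """Per-partition lag: log-end offset minus the group's committed offset."""
    # Hypothetical simplification: a partition with no committed offset is treated as starting at 0.
    return {p: max(0, end - committed.get(p, 0)) for p, end in end_offsets.items()}


def lagging_partitions(lag: dict[int, int], threshold: int) -> list[int]:
    """Partitions whose lag exceeds a fixed, hypothetical alert threshold."""
    return sorted(p for p, value in lag.items() if value > threshold)


# Hypothetical sample offsets for a three-partition topic.
end_offsets = {0: 15_000, 1: 12_400, 2: 9_800}
committed = {0: 14_950, 1: 7_100, 2: 9_800}

lag = consumer_lag(end_offsets, committed)
print(lag)                                       # {0: 50, 1: 5300, 2: 0}
print(lagging_partitions(lag, threshold=1_000))  # [1]
```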
The talk covers real-world examples from gaming platforms and data systems that have adopted OTel for Kafka monitoring.
Shivay Lamba is a software developer specializing in DevOps, machine learning, and full-stack development. He is an open source enthusiast, has mentored in programs such as Google Code-in and Google Summer of Code, and is currently an MLH Fellow. He has also worked...
Siddharth Vijay, AVP at Pokerbaazi and KubeCon India speaker, brings over 12 years of experience driving impactful projects in AI, security, and cloud. A firm advocate of open-source technologies, he has a proven track record of delivering practical solutions with real-world value...
OpenTelemetry has become the go-to framework for unifying observability signals across metrics, logs, and traces. However, implementing OpenTelemetry often comes with its own set of challenges: broken instrumentation, missing signals, and misaligned semantic conventions that undermine its effectiveness. Debugging these issues can be daunting, leaving teams stuck with incomplete or unreliable observability data.
In this session, Kasper will demystify the debugging process for OpenTelemetry. Attendees will learn how to identify and troubleshoot common issues, ensure signals are transferred correctly, and align instrumentation with semantic conventions for consistent insights. Through live demos, Kasper will showcase techniques for validating resource configurations, debugging signal pipelines, and building confidence in your observability setup. This session is designed for anyone looking to unlock the full potential of OpenTelemetry and create robust observability practices.
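The sketch below is a rough illustration of checking instrumentation against semantic conventions. It flags two common problems in a single span. The first is a missing or default `service.name` resource attribute. The second is a legacy HTTP attribute name that the stable HTTP semantic conventions renamed, for example `http.method` to `http.request.method`. The `check_signal` helper and its plain-dictionary inputs are hypothetical. They are not an OpenTelemetry SDK API or the technique demonstrated in the session.

```python
# Legacy HTTP attribute names mapped to their names in the stable HTTP semantic conventions.
RENAMED_HTTP_ATTRIBUTES = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
    "http.url": "url.full",
}


def check_signal(resource: dict[str, str], span_attributes: dict[str, object]) -> list[str]:
    """Return human-readable problems found in one span and its resource (hypothetical checker)."""
    problems = []
    # SDKs fall back to "unknown_service" (optionally ":<executable>") when service.name is unset.
    service = resource.get("service.name", "")
    if not service or service.startswith("unknown_service"):
        problems.append("resource is missing a meaningful service.name")
    for legacy, current in RENAMED_HTTP_ATTRIBUTES.items():
        if legacy in span_attributes:
            problems.append(f"span uses legacy '{legacy}'; expected '{current}'")
    return problems


issues = check_signal(
    {"service.name": "unknown_service:python"},
    {"http.method": "GET", "http.response.status_code": 200},
)
for issue in issues:
    print(issue)
```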
Kasper is a Co-Chair for KubeCon + CloudNativeCon EU/NA, a Kubestronaut, a CNCF Ambassador, and co-founder of the Nordic meetup alliance Cloud Native Nordics, where he also serves as Community Lead. He works in Developer Relations at Dash0 and was previously at Lunar, where he and his team built...
As AI/ML, high-performance, and telecom workloads progress on their cloud-native journey, the unique platform requirements inherent to their functionality are exposing the limitations of existing solutions such as Multus and device plugins. Dynamic Resource Allocation (DRA) offers a fresh approach to these challenges, with better resource management for heterogeneous platforms, topology-aware use cases, and beyond! By leveraging the latest Kubernetes features, DRA drivers are redefining network interface configuration and enhancing capabilities for multi-network deployments. This talk explores the evolving cloud-native networking landscape and the trade-offs between extending Kubernetes and relying on add-on components. We will delve into recent advancements, including network device status (KEP-4817), virtual device allocation (KEP-5075), and the role of the CNI-DRA-Driver in shaping the future of cloud-native networking infrastructure.
Lionel Jouin is a Software Engineer at Ericsson Software Technology, based in Stockholm, Sweden. He actively contributes to Kubernetes, focusing on bringing native support for secondary networks and their ecosystem, including services and policies. His contributions span SIG Network...
Sunyanan Choochotkaew is a staff research scientist at IBM Research, specializing in distributed computing and performance acceleration on cloud platforms. She is a maintainer of Kepler and has contributed to the Environmental Sustainability TAG, operator framework...
In this session, the speaker presents TopoLVM, a CSI plugin for local storage, and introduces an upcoming Kubernetes feature for local storage that he and his team are working on.
Local storage is promising for applications that require high I/O performance, such as Elasticsearch and MySQL. TopoLVM provides many features that make local storage easy to manage in Kubernetes, including raw block volumes, resizing, and dynamic provisioning. It also includes a capacity-aware pod scheduling feature that considers each node's local storage capacity.
Currently, this capacity-aware feature is implemented with a scheduler extender, which has two main issues:
1. Many admins don't have permission to install scheduler extenders.
2. The scheduler extender is TopoLVM-specific.
To address these issues, he will introduce "KEP-4049: Storage Capacity Scoring of Nodes for Dynamic Provisioning," which aims to make TopoLVM's scheduling logic available to all CSI drivers without using scheduler extenders.
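The sketch below is a rough illustration of capacity-aware scheduling. It first filters out nodes whose free local capacity cannot hold the requested volume. It then scores the remaining nodes from 0 to 100 in proportion to the capacity left after placement. This is a hypothetical, generic policy that prefers the emptiest node. It is not TopoLVM's extender formula and not the scoring defined in KEP-4049. Node names and sizes are made up.

```python
GIB = 1024 ** 3


def score_nodes(free_bytes: dict[str, int], request_bytes: int) -> dict[str, int]:
    """Filter out nodes that cannot fit the volume, then score 0-100 by capacity left over."""
    feasible = {node: free for node, free in free_bytes.items() if free >= request_bytes}
    if not feasible:
        return {}  # no node can host the volume; the pod would stay Pending
    max_left = max(free - request_bytes for free in feasible.values())
    if max_left == 0:
        return {node: 100 for node in feasible}  # every feasible node fits exactly
    # Linear scale: the node with the most space left after placement scores 100.
    return {node: (free - request_bytes) * 100 // max_left for node, free in feasible.items()}


free = {"node-a": 500 * GIB, "node-b": 120 * GIB, "node-c": 80 * GIB}
print(score_nodes(free, request_bytes=100 * GIB))  # {'node-a': 100, 'node-b': 5}
```

Inverting the score would bin-pack volumes onto the fullest feasible node instead. Which behavior is preferable depends on the workload.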
He works at Cybozu, Inc., where he spent four years operating and developing server infrastructure built on a custom VM-based system. For the past three years, he has focused on operating and developing storage for a new infrastructure using Kubernetes...