
eBPF Fellowship Update: Tutorials, Research, and Expanding eBPF into GPU and AI

April 27, 2026

By Yusheng Zheng

In October 2025, I was honored to join the inaugural cohort of the eBPF Foundation Community & Advocacy Fellows.

As a PhD student at UC Santa Cruz and maintainer of the open source project eunomia-bpf, I’ve spent the past six months enhancing the eBPF ecosystem through educational resources, technical articles, and community engagement.

Traditionally, eBPF has been well-known for networking, tracing, and security. Recently, its potential applications have broadened significantly to areas such as GPU observability and AI infrastructure. Despite this rapid expansion, educational resources, tools, and community discussions in these newer domains remain relatively underdeveloped. During my fellowship, I aimed to bridge this gap by creating accessible tutorials, writing detailed technical blogs, and fostering interactive community spaces.

Developing Tutorials

A core component of my fellowship has been enhancing the bpf-developer-tutorial, a comprehensive, practical guide that introduces developers to CO-RE eBPF step by step. Over the course of the fellowship, we added nine new tutorials covering recent Linux kernel features such as BPF Arena, workqueues, struct_ops, and dynptr. We also introduced content on accelerator monitoring, including GPU flamegraph profiling, GPU driver tracing for Intel, AMD, and Nouveau GPUs, and tracing Intel NPUs, along with practical tutorials on system-level applications such as HID-BPF and cgroup policy guards.

Beyond new content, I revised and improved documentation across more than 30 existing tutorials, merged seven community pull requests, and set up automated documentation generation along with a Rust-based continuous integration workflow.
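To give a flavor of the CO-RE style the tutorials teach, here is a minimal kernel-side sketch that attaches a kprobe to `do_unlinkat` and logs the file being removed. This is an illustrative fragment, not code from the tutorial series: it assumes a `vmlinux.h` generated with `bpftool btf dump file /sys/kernel/btf/vmlinux format c`, and it must be compiled with clang for the BPF target and loaded with a libbpf-based userspace program.

```c
// Illustrative CO-RE eBPF sketch (assumptions: generated vmlinux.h, libbpf toolchain).
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

char LICENSE[] SEC("license") = "Dual BSD/GPL";

SEC("kprobe/do_unlinkat")
int BPF_KPROBE(handle_unlinkat, int dfd, struct filename *name)
{
    pid_t pid = bpf_get_current_pid_tgid() >> 32;
    // BPF_CORE_READ relocates the field offset at load time,
    // so the same object file runs across kernel versions.
    const char *filename = BPF_CORE_READ(name, name);
    bpf_printk("unlink by pid %d: %s", pid, filename);
    return 0;
}
```

Because the field access is resolved through BTF relocations rather than hardcoded offsets, the compiled object ports across kernels without recompilation, which is the "compile once, run everywhere" property the tutorials build on.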

Exploring New Directions

While tutorials provide hands-on learning, understanding how eBPF applies to emerging domains requires deeper investigation. During the fellowship, I explored this through both research projects and complementary blog posts on the eunomia.dev blog, focusing on two areas: GPU systems and AI agents.

In GPU systems, my research examined how eBPF can extend beyond CPU-centric use cases into accelerator software stacks. gpu_ext models GPU drivers as programmable OS subsystems using eBPF, enabling dynamic policy control and achieving up to 4.8x throughput improvement across inference and training workloads. NCCLbpf applies eBPF to GPU collective communication, providing verified and composable policy execution within NCCL. SysOM-AI, deployed at Alibaba across more than 80,000 GPUs, uses cross-layer tracing built on eBPF to reduce AI training diagnosis time from days to minutes.

To complement these research efforts, I wrote a series of blog posts that examine the practical challenges behind these systems, including the GPU observability gap, a detailed analysis of NVIDIA’s open GPU kernel modules, and a walkthrough of iaprof for AI and GPU flame graph profiling. These posts provide concrete context for why extending eBPF into GPU environments is both necessary and challenging.

For AI agents, my research focused on using eBPF to address the system-level observability, resource-management, and safety problems that emerge when AI agents are deployed at scale. AgentSight introduces boundary-level tracing with eBPF to monitor agent execution. Building on this work, ACRFence identifies a new class of security risks in agent checkpoint-and-restore workflows, where improper state handling can lead to duplicate transactions or unintended credential reuse. In AgentCgroup, we showed that OS-level overhead accounts for 56 to 74 percent of end-to-end latency in agent workflows, highlighting the need for better system-level resource control for agents. In reverse engineering Claude Code's SSL traffic with eBPF, I demonstrated how eBPF can be used to analyze the encrypted traffic generated by AI coding tools. Our work on Schedcp, presented at MLforSystems 2025, explores how LLM agents can autonomously optimize Linux schedulers through sched_ext and eBPF.
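As a rough illustration of the technique behind the SSL-traffic analysis (the library path, buffer size, and map layout below are my assumptions, not the post's actual code): by attaching a uprobe to OpenSSL's `SSL_write` in the target process, eBPF can capture the plaintext buffer before it is encrypted, with no changes to the application.

```c
// Illustrative sketch: uprobe on SSL_write captures plaintext before encryption.
// The userspace loader would attach this to e.g. libssl.so.3:SSL_write (assumed path).
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "Dual BSD/GPL";

struct ssl_event {
    u32 pid;
    u32 len;
    u8 data[256];      // truncated capture; size is an arbitrary choice here
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 20);
} events SEC(".maps");

SEC("uprobe")
int BPF_UPROBE(probe_ssl_write, void *ssl, const void *buf, int num)
{
    struct ssl_event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0;
    e->pid = bpf_get_current_pid_tgid() >> 32;
    u32 len = (u32)num;
    if (len > sizeof(e->data))
        len = sizeof(e->data);
    e->len = len;
    // Read the caller's buffer from user memory before OpenSSL encrypts it.
    bpf_probe_read_user(e->data, len, buf);
    bpf_ringbuf_submit(e, 0);
    return 0;
}
```

The same idea applied to `SSL_read` covers the receive path, which is how encrypted API traffic from an AI coding tool can be observed without proxying or patching the binary.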

Community Discussions

Progress in the ecosystem depends on active discussion and shared exploration. During the fellowship, I contributed to this by engaging with the community through talks and workshops. At the Linux Plumbers Conference 2025, in the eBPF track, I presented bpftime, discussed why a userspace eBPF runtime is needed, and showed how existing eBPF tools can run on top of it without modification.

I also co-organized the inaugural AgenticOS 2026 workshop at ASPLOS 2026, which focused on operating system design for AI agent workloads, where eBPF was a major topic. At the workshop, I presented AgentCgroup and discussed emerging questions around abstractions, resource control, and observability for agent-based systems. We are currently planning a follow-up workshop at SOSP 2026 to continue these discussions.

Looking Ahead

I am grateful to the eBPF Foundation for supporting this fellowship. It has given me the opportunity to focus on improving the accessibility of eBPF and to explore its role in GPU systems and AI infrastructure. Going forward, I will continue expanding tutorials and technical content, especially in GPU- and AI-related areas, and work on lowering the barrier for community contributions. I hope the community finds our tutorials and technical articles useful.

Contributions, pull requests, and feedback are always welcome.