An exhaustively explained example of eBPF file operation tracking and logging
using lsm hooks.
This is an answer to a given 6-day challenge using eBPF that should be close to the following description:
- A Linux user-mode agent in Rust that captures and collects file access events with eBPF
- Attempts to log said events in a "structured manner" (e.g. JSON)
- Attempts to detect some sort of "suspicious activity" (e.g. multiple consecutive writes by the same PID)
- Be efficient
- Attempts "anti-tampering" with a self-integrity check ("periodically check the
binary") and attempts "anti-debugging" measures (detect ptrace or a debugger)
To me, this is an overdue venture into the kernel realm. Unlike what I usually publish, it's an example full of experiments and comments about the impractical and the "this is the way to do it." This is the absolute minimum in terms of functionality and actual "engineering," since I spent most of the week just getting to grips with eBPF (see Resources). Hence the implementation here just tries to satisfy the criteria and not much else.
The proposed "answer" here addresses each bullet above as follows:
- Each line of STDOUT logs the event as a JSON object; the rest of the logs are in STDERR in text format
- Detects multiple consecutive writes by the same PID
- Is fairly efficient
- Optionally kills the program on ptrace detection
See Details for more information.
The prerequisites are the same as stated for a normal aya project:
- Packages: llvm
- Toolchains: both stable and nightly
- Components: rust-src on nightly
- Binary crates: bpf-linker
Building and running are just as in any Cargo crate. Make sure you are privileged before running the program:
sudo -E cargo run
Given the forbid_ptrace feature flag as below, the program runs more eBPF code to warn
about ptrace usage, which currently disallows ALL ptrace requests (use
with caution, as this may block important operations depending on the
kernel settings).
sudo -E cargo run --features=forbid_ptrace
As of now there are no options, so this is basically a guide to shell scripting.
To only see application logs (usage warnings, suspicious activities, errors,
warnings and what it is doing), filter out STDOUT (the normal JSON output) and leave
STDERR:
sudo -E cargo run >/dev/null
# example output:
# [WARN bpf_file_monitor_challenge] "/home/USER/somefile" is suspiciously active
# [ERROR bpf_file_monitor_challenge] ptrace is forbidden on this build. closing...
# [WARN bpf_file_monitor_challenge] ptrace detected, requesting immediate termination
To only see JSON values (for example for a UI) and ignore usage warnings and
whatnot (filter out STDERR and capture STDOUT):
sudo -E cargo run 2>/dev/null
# { timestamp: 1757870000.681105, path: "anon_inode:[pidfd]", comm: "systemd", pid: 1, tgid: 1, mode: 67207171 }
# { timestamp: 1757870000.681161, path: "/proc/1/fdinfo/47", comm: "systemd", pid: 1, tgid: 1, mode: 67141633 }
# { timestamp: 1757870000.681223, path: "/proc/617/cgroup", comm: "systemd", pid: 1, tgid: 1, mode: 67141633 }
# { timestamp: 1757870000.681264, path: "/proc/1/fdinfo/47", comm: "systemd", pid: 1, tgid: 1, mode: 67141633 }
You can mix and match these commands; there are guides online on shell redirection.
Finally, instead of /dev/null you can give a path to send each channel to a
file for persistent logging. This is not advised, as the output is plentiful and
consumes a lot of space.
With the exception of eBPF code, bpf-file-monitor-challenge is distributed
under the terms of either the MIT license or the Apache License (version
2.0), at your option.
All eBPF code is distributed under either the terms of the GNU General Public License, Version 2 or the MIT license, at your option.
This section collects notes and other details. There is no inherent value in knowing or repeating these notes. Regardless, here they are.
As of now, this program does not have configurable settings. Whatever is available
to the user is at the start of main.rs. Look there for details about how
frequent and how many hits to the same file are "too much" for a suspect PID,
which files are totally ignored, and whether prints are in timestamp or boottime
(that's the gist of user options).
The rest of the gauges are also configurable but they are aimed at developers (see
stat.rs, time resolution types and constants).
This project follows a simple aya-template structure:
- The *-common postfix is shared between kernel and user space
- The *-ebpf postfix is the kernel space
- The crate with no postfix is the userspace loader and monitor, which has the ebpf binary embedded inside
In this project, common holds some custom utilities like the ringbuf! macro.
As evident in the docs (cargo doc), its purpose is to keep symbols and values in sync
between the kernel and user space, since aya only checks these early at runtime, which
proved too slow for such a short window.
That leaves two more crates. ebpf is literally one file which does the minimal
work of populating ringbuf maps. Note that only ringbuf maps
are used, since they are async and the macro was already written.
The main part of the program is the user agent, which basically does the following:
- Load eBPF (mostly aya boilerplate)
- Read values from maps
- Filter through it and check it with a very minimal list of hardcoded rules
- Match the quota of writing and monitoring
- Log and repeat by reading more from maps
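The filter-and-check steps above can be sketched in plain Rust. This is only a minimal illustration under assumptions: `Event`, `WriteStreak`, `MAX_CONSECUTIVE_WRITES` and `suspicious` are hypothetical names, not the ones used in main.rs, and the rule shown is the simplest reading of "multiple consecutive writes by the same PID."

```rust
/// Hypothetical event shape; the real project defines its own types.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub pid: u32,
    pub path: String,
    pub is_write: bool,
}

/// Hypothetical threshold: writes in a row by one PID that count as suspicious.
pub const MAX_CONSECUTIVE_WRITES: u32 = 3;

/// Tracks the current run of writes made by a single PID.
#[derive(Debug, Default)]
pub struct WriteStreak {
    last_pid: Option<u32>,
    count: u32,
}

impl WriteStreak {
    /// Feeds one event; returns true once the run reaches the threshold.
    pub fn observe(&mut self, ev: &Event) -> bool {
        if !ev.is_write {
            // Any non-write event breaks the run.
            self.last_pid = None;
            self.count = 0;
            return false;
        }
        if self.last_pid == Some(ev.pid) {
            self.count += 1;
        } else {
            // A different PID starts a new run.
            self.last_pid = Some(ev.pid);
            self.count = 1;
        }
        self.count >= MAX_CONSECUTIVE_WRITES
    }
}

/// Drops events under ignored path prefixes, then returns those that trip the rule.
pub fn suspicious<'a>(events: &'a [Event], ignored_prefixes: &[&str]) -> Vec<&'a Event> {
    let mut streak = WriteStreak::default();
    events
        .iter()
        .filter(|ev| !ignored_prefixes.iter().any(|p| ev.path.starts_with(*p)))
        .filter(|ev| streak.observe(ev))
        .collect()
}
```

In a real agent the slice would be replaced by whatever is drained from the ring buffer on each wake-up, and each hit would become a warning on STDERR.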
The implementation details are in the files alongside main.rs. Those files
hold the monitoring ("throttling," not really) policy and high level log
structures.
All in all, everything is kept primitive. Timestamp is usually a small integer
(the code is compatible with changing its size, but generics are avoided due to
the boilerplate and generic nightmares they induce; num wasn't used either). The
only other type is a "moving average" used for time-series data in an
almost efficient manner (space-wise, see the docs).
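To illustrate why a moving average can be cheap space-wise, here is one common constant-space variant, an exponential moving average over integers. It is a sketch only: `Ema` and its power-of-two smoothing are my assumptions for illustration, and the type documented in cargo doc may be defined differently.

```rust
/// Exponential moving average over integer samples, stored in constant space.
/// Hypothetical type; the project's own "moving average" may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Ema {
    value: u64,
    /// Smoothing as a power of two: each new sample has weight 1 / 2^shift.
    shift: u32,
    seeded: bool,
}

impl Ema {
    pub const fn new(shift: u32) -> Self {
        Self { value: 0, shift, seeded: false }
    }

    /// Folds a new sample in and returns the updated average.
    pub fn update(&mut self, sample: u64) -> u64 {
        if !self.seeded {
            // The first sample seeds the average directly.
            self.value = sample;
            self.seeded = true;
        } else if sample >= self.value {
            self.value += (sample - self.value) >> self.shift;
        } else {
            self.value -= (self.value - sample) >> self.shift;
        }
        self.value
    }

    /// Returns the current average without changing it.
    pub fn get(&self) -> u64 {
        self.value
    }
}
```

No history buffer is kept, so the state stays a few bytes no matter how many samples pass through; the price is that old samples fade gradually instead of dropping out of a fixed window.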
At first, tracepoint applications were chosen for the task; however, it proved
too complicated to track every tracepoint for files, convert the context
they take and especially canonicalize the paths without bpf_*d_path calls. That
is exactly what strace does. See src/linux/*/syscallent.h and search for the TRACE_FILE
alias, which resolves to TF and is used by %file when selected.
After some sketches they were abandoned in favor of lsm applications. Not only
do those not have that issue, they also allow for easy modification of
permissions simply by changing the return value (see Resources, and the comments on
the main function of the eBPF binary).
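The return-value convention is simple enough to show in plain Rust: an LSM hook returns 0 to let the operation proceed and a negative errno to deny it. The snippet below is not eBPF code and not the project's hook; `lsm_verdict` and `on_ptrace_request` are hypothetical helpers that only model the decision.

```rust
/// Value of EPERM on Linux ("Operation not permitted").
pub const EPERM: i32 = 1;

/// Hypothetical helper mirroring what an LSM hook's return value means:
/// 0 lets the operation proceed, a negative errno denies it.
pub const fn lsm_verdict(forbid: bool) -> i32 {
    if forbid {
        -EPERM
    } else {
        0
    }
}

/// Hypothetical stand-in for the body of a ptrace-related hook.
/// With the `forbid_ptrace` feature enabled, every request is denied.
pub fn on_ptrace_request() -> i32 {
    lsm_verdict(cfg!(feature = "forbid_ptrace"))
}
```

A tracepoint program has no such lever: it can observe the call but its return value does not change the outcome.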
Based on a quick look over the READMEs of the two projects, aya was selected
for this demonstration since it does not rely on the C
counterparts and other complications, and the APIs used here are pretty
primitive and already supported in aya.
There are also the rbpf (or ubpf) runtime and the bpftool command-line tool to base
a solution on, but they introduce additional layers and were hence avoided.
This program uses almost 2MB PSS and about 10MB RSS and does almost nothing, so naturally its CPU footprint is almost none. Since no "efficiency" measures were given beyond "xMB RAM and %n CPU usage," I presumed the given criterion of "Is fairly efficient" is met.
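For anyone who wants to reproduce numbers like these, a process can read its own PSS and RSS from procfs. The sketch below assumes a Linux kernel that exposes /proc/self/smaps_rollup; `field_kb` and `self_memory_kb` are hypothetical helpers and not part of this project.

```rust
use std::fs;

/// Extracts a "<key>: <number> kB" field from procfs-style text.
/// Lines such as "Pss_Anon:" do not match the key "Pss".
pub fn field_kb(text: &str, key: &str) -> Option<u64> {
    text.lines().find_map(|line| {
        let rest = line.strip_prefix(key)?.strip_prefix(':')?;
        rest.split_whitespace().next()?.parse::<u64>().ok()
    })
}

/// Returns (PSS, RSS) of the current process in kB, if procfs exposes them.
pub fn self_memory_kb() -> Option<(u64, u64)> {
    let rollup = fs::read_to_string("/proc/self/smaps_rollup").ok()?;
    let status = fs::read_to_string("/proc/self/status").ok()?;
    Some((field_kb(&rollup, "Pss")?, field_kb(&status, "VmRSS")?))
}
```

Calling `self_memory_kb()` once the agent has been running for a while gives a snapshot comparable to what tools like smem report.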
This was a deliberate shortcut taken not to analyze who is calling what and just terminate. This was the fastest way to check the box and move on.
Most of the tests are basic length checks written as compile-time errors for ease of use. There are one or two just to serve as placeholders for future development.
"Detection and parsing rules" are as basic as it gets. "Parsing" doesn't exist
basically and serialization is just a 10-line macro using format_args, so
there is not much to test. serde was not used for this project.
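To give an idea of how little is needed to emit one JSON object per line without serde, here is a hand-rolled sketch. It uses plain functions and write! rather than the project's macro, and `Record`, `escape` and `to_json_line` are hypothetical names; the field types are assumptions based on the sample output.

```rust
use std::fmt::Write;

/// Hypothetical log record; field names follow the sample output, types are assumed.
pub struct Record<'a> {
    /// Seconds since the epoch; assumed to be a finite value.
    pub timestamp: f64,
    pub path: &'a str,
    pub comm: &'a str,
    pub pid: u32,
}

/// Escapes the characters JSON requires to be escaped inside a string literal.
fn escape(s: &str, out: &mut String) {
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => {
                // Remaining control characters use the \u00XX form.
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
}

/// Renders one record as a single-line JSON object.
pub fn to_json_line(r: &Record) -> String {
    let mut out = String::new();
    let _ = write!(out, "{{\"timestamp\":{:.6},\"path\":\"", r.timestamp);
    escape(r.path, &mut out);
    out.push_str("\",\"comm\":\"");
    escape(r.comm, &mut out);
    let _ = write!(out, "\",\"pid\":{}}}", r.pid);
    out
}
```

Escaping is the only part that really deserves a test, since paths and comm values come from other processes and may contain quotes or control characters.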
Beyond compile-time checks in the form of const [_ = ()] {} trickery, no other
integration tests were taken into account as of writing this README at the
end of the allowed duration.
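For readers unfamiliar with compile-time checks of this kind, the general pattern looks like the sketch below. `RawEvent` is a hypothetical layout made up for this example, not the struct shared by the crates here; the point is only that a size or alignment mismatch becomes a build error instead of a runtime surprise.

```rust
use core::mem::{align_of, size_of};

/// Hypothetical fixed-layout event shared between kernel and user space.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct RawEvent {
    pub timestamp_ns: u64,
    pub pid: u32,
    pub tgid: u32,
    /// Same length as the kernel's TASK_COMM_LEN.
    pub comm: [u8; 16],
}

// Evaluated at compile time: the build fails if the layout ever drifts.
const _: () = assert!(size_of::<RawEvent>() == 32);
const _: () = assert!(align_of::<RawEvent>() == 8);
```

Because both sides of a ring buffer must agree on the byte layout, placing such assertions next to the shared type catches a forgotten field or padding change on either side.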
This was my first experience with the kernel (aside from the old kmod-style
experiments, years prior). Getting my hands on eBPF with all the different types
and C shenanigans needed multiple references and examples, which are not as common
as "CRUD in JS." What was used, and which questions each answered, is scattered in
the docs (course-project-style) and available via cargo doc, so details are
spared here:
- eBPF.io: the official (?) reference
- BPF and XDP Reference Guide: what is eBPF in details
- The Aya Book: a (very) brief introduction to aya
- Let's introduce eBPF and Aya by Joseph Ligier: a multi-language demonstration of using tracepoint and not using clippy
- eBPF File System Monitoring by theshoemaker: headspace of a C developer in the same position
- eBPF Tutorial by Example by eunomia: extensive details on eBPF development in C and eunomia (Example 4 is another tracepoint)
- Kernel eBPF examples: official eBPF examples of kernel
- BPF ring buffer docs (next-20250912): implementation details of ringbuf in English
- Learning eBPF by Liz Rice: for double-checking as a friendlier reference
- Linux Device Drivers by Jonathan Corbet et al.: just for the struct file (File Operations chapter)