Skip to main content
In this lesson we examine how to harden containers by applying sandboxing techniques that reduce the kernel attack surface and limit what containerized processes can do. We contrast container isolation with virtual machines, show practical examples (including PID namespaces), and present common sandboxing controls and advanced alternatives that provide stronger isolation.

Quick refresher: virtual machines vs containers

Virtual machines provide strong isolation because each VM runs a full guest operating system and its own kernel on top of a hypervisor. Containers, by contrast, share the host kernel and isolate processes using kernel mechanisms such as namespaces and cgroups. That architecture difference is critical when evaluating attack surfaces and escape risks.
A diagram titled "Container Sandboxing" comparing virtual machines and containers. It shows layered stacks (application, libs, deps, guest OS) with VMs using a hypervisor and separate guest OSes, while containers share Docker and the host OS over the hardware.
Isolation AspectVirtual MachinesContainers
Kernel per workloadYes — dedicated guest kernelNo — shared host kernel
Isolation mechanismHypervisor-basedNamespaces, cgroups, LSMs
Typical use caseStrong multi-tenant isolationLightweight microservices, higher density
Escape risk if kernel compromisedLower (guest kernel separation)Higher (shared kernel)

PID namespaces: a simple example

Containers map process IDs into a PID namespace. Inside the container, a process can appear as PID 1, while on the host it has a different PID. This demonstrates logical process isolation but also shows that the host can still observe and terminate the underlying host PID. Example (run a BusyBox container that sleeps for 1000 seconds):
# Run a container that sleeps
$ docker run -d --name sleeping-container busybox sleep 1000
e2fd5090c9a51eb7cc91a466cf2e18c5468871f84adbb55c2e6c1cf4ea0028a8

# Inside the container: PID 1 is the sleep process
$ docker exec -ti sleeping-container ps -ef
PID   USER     TIME  COMMAND
1     root     0:00  sleep 1000
11    root     0:00  ps -ef

# On the host you can also see the sleep process with a different PID
$ ps -ef | grep sleep | grep -vi grep
root     7902  7871  0 21:39 ?        00:00:00 sleep 1000
Because the host-level process exists, killing that host PID terminates the container process. This shows that namespaces provide isolation at the user-space level, but the shared kernel remains the ultimate control plane.
Containers share the host kernel. If the kernel has a vulnerability (for example, a local privilege escalation like Dirty COW), a compromised container process can potentially exploit the kernel and affect the host and other containers.

How containerized processes interact with the kernel

Applications (in containers or on bare OS) run in user space and make system calls to access hardware and privileged services. Since containers use the host kernel, restricting system call access and other kernel-visible actions is a key hardening strategy. Two widely used kernel-level sandboxing controls are seccomp and AppArmor (or SELinux).
  • Seccomp: restricts the set of system calls a process may invoke.
  • AppArmor: enforces path- and capability-based access controls for files and other resources.
Both tools reduce the risk of a kernel exploit being used from within a container by reducing what code inside the container can ask the kernel to do.

Example sandboxing configurations

Example seccomp profile (whitelist-style — deny by default, allow a small syscall set):
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_X86_64",
    "SCMP_ARCH_X86",
    "SCMP_ARCH_X32"
  ],
  "syscalls": [
    {
      "names": [
        "execve",
        "brk",
        "access",
        "capset",
        "clone"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
Example AppArmor profile snippet (blacklist-style rule denying writes to /proc):
profile apparmor-deny-write flags=(attach_disconnected) {
    #include <abstractions/base>

    # Allow typical reads and library access (adjust paths for your program)
    /usr/bin/your-binary ixr,
    /lib/** r,
    /usr/lib/** r,
    /etc/** r,

    # Deny all write access to /proc
    deny /proc/** w,
}

Whitelist vs blacklist approaches

PatternDescriptionStrengthsTrade-offs
Whitelist (seccomp)Default deny; explicitly allow required syscallsMinimal kernel surface, strong securityRequires profiling/application knowledge
Blacklist (AppArmor snippet)Default allow; block specific actions or pathsEasier to implement for diverse appsMay miss attack vectors; less strict
When feasible, prefer whitelist-based restrictions (e.g., seccomp profiles) to minimize the kernel functionality exposed to containerized applications. Use blacklists when you need broader compatibility and then complement them with other controls.

Practical guidance for production workloads

  • Small, homogeneous fleets: Create strict, minimal seccomp and AppArmor/SELinux profiles for each service (for example, many Nginx or MySQL instances). This gives strong protection with manageable maintenance.
  • Large, heterogeneous fleets: Use a layered approach — namespaces + cgroups + capability drops + seccomp + LSMs (AppArmor/SELinux) — and focus on automation to generate and roll out profiles.
  • Follow the principle of least privilege: drop Linux capabilities your process does not need and restrict filesystem and network access.
  • Monitor and iterate: use runtime observability and profiling to generate accurate whitelists and to find false positives/negatives before enforcing strict policies.

Advanced sandboxing / microVM alternatives

If the shared-kernel model is unacceptable for your threat model, consider technologies that provide stronger kernel isolation by running containers inside lightweight VMs or alternative kernels:
TechnologyDescriptionUse case
gVisorUser-space kernel that intercepts syscalls and emulates kernel behaviorImprove isolation without heavy VMs
Kata ContainersRuns container workloads inside lightweight VMs managed by a runtimeStronger isolation with VM-level boundaries
FirecrackerMicroVMs designed for minimal overhead and fast startupServerless and multi-tenant isolation at VM level
These projects trade some density and complexity for stronger separations between workloads and the host kernel.

Further reading and references

Choose the combination of sandboxing techniques that best fits your operational constraints and threat model. No single control is sufficient on its own — layering defenses increases resilience while balancing manageability.

Watch Video