Panic Recovery Around Plugin RunChecks() Calls¶

Problem¶

PluginRegistry.RunComplianceChecks in internal/audit/plugin.go called p.RunChecks(device) without any panic protection. The compliance.Plugin interface allows arbitrary implementations, including dynamically-loaded .so plugins. Any panic in RunChecks() propagated uncaught through the audit pipeline, killing the process and losing all other plugins' results.

Symptoms: If any compliance plugin panicked during RunChecks(), the entire audit crashed. Other healthy plugins' results were lost. No structured error reporting for the failure.

Root Cause¶

The compliance plugin interface (compliance.Plugin) is an extension point designed for both built-in and dynamically-loaded plugins. Dynamic plugins loaded via Go's plugin package can contain arbitrary code. A bare p.RunChecks(device) call provides no isolation boundary, so any panic in plugin code propagates up the call stack and terminates the audit.

Solution¶

Wrap the p.RunChecks(device) call in an immediately-invoked function with defer recover(). Add a *logging.Logger parameter to RunComplianceChecks for structured panic logging.

Code Example¶

// Before: bare call, panic propagates and kills the process
findings := p.RunChecks(device)

// After: panic recovery with structured logging and stack trace
var findings []compliance.Finding

func() {
    defer func() {
        if r := recover(); r != nil {
            logger.Error("plugin panicked during RunChecks",
                "plugin", pluginName,
                "panic", r,
                "stack", string(debug.Stack()),
            )
        }
    }()
    findings = p.RunChecks(device)
}()

The immediately-invoked function literal creates a deferred recovery scope isolated to the single plugin call. If RunChecks panics, the deferred function catches it, logs the plugin name and panic value, and execution continues with findings at its zero value (nil slice). The surrounding loop proceeds to process remaining plugins unaffected.

Key Design Decisions¶

Logger parameter uses *logging.Logger -- matches the project's charmbracelet/log-based logging, ensuring panic recovery logs respect the application's configured output format, level, and destination. A nil guard defaults to a fallback logger so callers can never trigger a secondary panic on the recovery path.
Stack trace included via runtime/debug.Stack() -- matches the pattern in internal/processor/processor.go, giving actionable debugging information for misbehaving plugins (especially dynamic .so files).
Panicked plugins retained in results with safe defaults -- a panicked flag gates the post-recovery path. When set, safe defaults (pluginName, Version: "unknown (panicked)", empty compliance map) are populated and the loop continues, skipping method calls (Name(), Description(), GetControls()) on the potentially corrupt plugin. Downstream consumers can see the plugin was requested and evaluated. See GOTCHAS.md SS2.2.
Post-recovery method calls avoided -- after a panic, the plugin's internal state may be corrupt. Calling p.Name(), p.GetControls(), etc. could trigger a secondary unrecovered panic. The recovery path uses only the pluginName string already in scope from the loop variable.

Files Changed¶

internal/audit/plugin.go -- core panic recovery wrapper with *logging.Logger and debug.Stack()
internal/audit/plugin_manager.go -- passes pm.logger (configured logger)
internal/audit/mode_controller.go -- passes mc.logger (configured logger)
internal/audit/plugin_global_test.go -- mockPanickingPlugin and isolation tests added
internal/audit/mode_controller_test.go -- call sites updated, LoadDynamicPlugins test enabled

Prevention¶

Pattern¶

When calling any method on an interface implementation that may originate from external or dynamic sources (plugin .so files, user-provided implementations), wrap the call in a deferred recover() that captures the panic value and logs it. The recovery boundary should be as narrow as possible -- one per plugin invocation, not one global recovery around the entire audit loop.

Where Else This Might Apply¶

Call site	Method	Status
`RunComplianceChecks`	`plugin.RunChecks()`	Protected (this fix)
`InitializePlugins`	`plugin.Name()`, `plugin.Version()`	Review needed -- dynamic plugin metadata accessors can panic
`LoadDynamicPlugins`	`plugin.Lookup("Plugin")` result casting	Review needed -- type assertion on `plugin.Symbol` can panic
`GetControls()` / `GetControlByID()`	Called in `deriveSeverityFromControl`	Review needed -- called during severity derivation

Testing Strategy¶

Use the mockPanickingPlugin pattern:

Embed mockCompliancePlugin, override RunChecks to panic
Table-driven tests with both panicking and healthy plugins
Assert: healthy findings preserved, panicked plugin present with zero findings, no error returned
Run just test-race to confirm no data races in recovery closures

Review Checklist¶

Plugin interface method calls on dynamic/external plugins are wrapped in defer recover()
Recovery boundary is per-plugin, not per-audit-run
Panicked plugins retained in results (not skipped) per GOTCHAS.md SS2.2
Tests include mockPanickingPlugin scenarios alongside healthy plugins
just test-race passes

GOTCHAS.md SS2.1 -- Registry Independence
GOTCHAS.md SS2.2 -- Panic Recovery Retains Plugins (invariant)
AGENTS.md SS8.2 -- Plugin Development standards
Plugin Development Guide
Architecture Issues: Pluggable DeviceParser Registry -- parallel registry pattern
GitHub Issue #309 -- original tracking issue
GitHub Issue #311 -- dynamic plugin load failures (related, still open)