Panic Recovery Around Plugin RunChecks() Calls¶
Problem¶
PluginRegistry.RunComplianceChecks in internal/audit/plugin.go called p.RunChecks(device) without any panic protection. The compliance.Plugin interface allows arbitrary implementations, including dynamically-loaded .so plugins. Any panic in RunChecks() propagated uncaught through the audit pipeline, killing the process and losing all other plugins' results.
Symptoms: If any compliance plugin panicked during RunChecks(), the entire audit crashed. Other healthy plugins' results were lost. No structured error reporting for the failure.
Root Cause¶
The compliance plugin interface (compliance.Plugin) is an extension point designed for both built-in and dynamically-loaded plugins. Dynamic plugins loaded via Go's plugin package can contain arbitrary code. A bare p.RunChecks(device) call provides no isolation boundary, so any panic in plugin code propagates up the call stack and terminates the audit.
Solution¶
Wrap the p.RunChecks(device) call in an immediately-invoked function with defer recover(). Add a *logging.Logger parameter to RunComplianceChecks for structured panic logging.
Code Example¶
// Before: bare call, panic propagates and kills the process
findings := p.RunChecks(device)
// After: panic recovery with structured logging and stack trace
var findings []compliance.Finding
func() {
defer func() {
if r := recover(); r != nil {
logger.Error("plugin panicked during RunChecks",
"plugin", pluginName,
"panic", r,
"stack", string(debug.Stack()),
)
}
}()
findings = p.RunChecks(device)
}()
The immediately-invoked function literal creates a deferred recovery scope isolated to the single plugin call. If RunChecks panics, the deferred function catches it, logs the plugin name and panic value, and execution continues with findings at its zero value (nil slice). The surrounding loop proceeds to process remaining plugins unaffected.
Key Design Decisions¶
- Logger parameter uses
*logging.Logger-- matches the project's charmbracelet/log-based logging, ensuring panic recovery logs respect the application's configured output format, level, and destination. A nil guard defaults to a fallback logger so callers can never trigger a secondary panic on the recovery path. - Stack trace included via
runtime/debug.Stack()-- matches the pattern ininternal/processor/processor.go, giving actionable debugging information for misbehaving plugins (especially dynamic.sofiles). - Panicked plugins retained in results with safe defaults -- a
panickedflag gates the post-recovery path. When set, safe defaults (pluginName,Version: "unknown (panicked)", empty compliance map) are populated and the loopcontinues, skipping method calls (Name(),Description(),GetControls()) on the potentially corrupt plugin. Downstream consumers can see the plugin was requested and evaluated. See GOTCHAS.md SS2.2. - Post-recovery method calls avoided -- after a panic, the plugin's internal state may be corrupt. Calling
p.Name(),p.GetControls(), etc. could trigger a secondary unrecovered panic. The recovery path uses only thepluginNamestring already in scope from the loop variable.
Files Changed¶
internal/audit/plugin.go-- core panic recovery wrapper with*logging.Loggeranddebug.Stack()internal/audit/plugin_manager.go-- passespm.logger(configured logger)internal/audit/mode_controller.go-- passesmc.logger(configured logger)internal/audit/plugin_global_test.go--mockPanickingPluginand isolation tests addedinternal/audit/mode_controller_test.go-- call sites updated,LoadDynamicPluginstest enabled
Prevention¶
Pattern¶
When calling any method on an interface implementation that may originate from external or dynamic sources (plugin .so files, user-provided implementations), wrap the call in a deferred recover() that captures the panic value and logs it. The recovery boundary should be as narrow as possible -- one per plugin invocation, not one global recovery around the entire audit loop.
Where Else This Might Apply¶
| Call site | Method | Status |
|---|---|---|
RunComplianceChecks |
plugin.RunChecks() |
Protected (this fix) |
InitializePlugins |
plugin.Name(), plugin.Version() |
Review needed -- dynamic plugin metadata accessors can panic |
LoadDynamicPlugins |
plugin.Lookup("Plugin") result casting |
Review needed -- type assertion on plugin.Symbol can panic |
GetControls() / GetControlByID() |
Called in deriveSeverityFromControl |
Review needed -- called during severity derivation |
Testing Strategy¶
Use the mockPanickingPlugin pattern:
- Embed
mockCompliancePlugin, overrideRunChecksto panic - Table-driven tests with both panicking and healthy plugins
- Assert: healthy findings preserved, panicked plugin present with zero findings, no error returned
- Run
just test-raceto confirm no data races in recovery closures
Review Checklist¶
- Plugin interface method calls on dynamic/external plugins are wrapped in
defer recover() - Recovery boundary is per-plugin, not per-audit-run
- Panicked plugins retained in results (not skipped) per GOTCHAS.md SS2.2
- Tests include
mockPanickingPluginscenarios alongside healthy plugins -
just test-racepasses
Related Documentation¶
- GOTCHAS.md SS2.1 -- Registry Independence
- GOTCHAS.md SS2.2 -- Panic Recovery Retains Plugins (invariant)
- AGENTS.md SS8.2 -- Plugin Development standards
- Plugin Development Guide
- Architecture Issues: Pluggable DeviceParser Registry -- parallel registry pattern
- GitHub Issue #309 -- original tracking issue
- GitHub Issue #311 -- dynamic plugin load failures (related, still open)