fix(metadata): break CGo import chain from comp/metadata/host/impl/utils#52901
fix(metadata): break CGo import chain from comp/metadata/host/impl/utils#52901songy23 wants to merge 1 commit into
Conversation
comp/metadata/host/impl/utils unconditionally imported pkg/collector/python, pulling in CGo rtloader bindings whenever the python build tag was active. This caused a typecheck failure when linting cmd/otel-agent (or any other target) without the rtloader headers installed, because discovery.go in the python package uses C.discover_config from datadog_agent_rtloader.h. The fix introduces two build-tagged helper files: - python_version.go (//go:build python) reads the Python info string from the shared agent cache using the same key that pkg/collector/python/version_python.go populates at runtime. No import of pkg/collector/python, no CGo. - python_version_nopy.go (//go:build !python) returns "n/a" stubs. All source and test files in the utils package now call the package-local getPythonInfo() / getPythonVersion() helpers instead of importing pkg/collector/python directly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 11608d2887
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| Os: osName, | ||
| AgentFlavor: flavor.GetFlavor(), | ||
| PythonVersion: python.GetPythonInfo(), | ||
| PythonVersion: getPythonVersion(), |
There was a problem hiding this comment.
Send full Python info in host metadata
When the Python cache contains the normal full interpreter string after rtloader initialization, this now sends only the first token in the top-level python field. Before this refactor that field used python.GetPythonInfo(), while systemStats.pythonV is the short-version field, so consumers of /metadata/v5 lose the build/compiler details that were previously reported; use getPythonInfo() here to preserve the payload shape.
Useful? React with 👍 / 👎.
| if x, found := cache.Cache.Get(pythonInfoCacheKey); found { | ||
| return x.(string) | ||
| } | ||
| return "n/a" |
There was a problem hiding this comment.
Preserve lazy-loaded Python sentinel
When the agent is built with the python tag but python_lazy_loading leaves rtloader uninitialized, the cache entry is absent. The function being replaced, pkg/collector/python.GetPythonInfo(), returns the documented unloaded sentinel "unused" in that case, but this helper now returns the no-Python-build value "n/a", so host metadata collected before the first Python check misreports a lazy-loaded runtime as unavailable; return "unused" on cache miss for this build-tagged helper.
Useful? React with 👍 / 👎.
Go Package Import DifferencesBaseline: 1d93f4e
|
|
🎯 Code Coverage (details) 🔗 Commit SHA: 11608d2 | Docs | Datadog PR Page | Give us feedback! |
Files inventory check summaryFile checks results against ancestor 1d93f4e9: Results for datadog-agent_7.82.0~devel.git.430.11608d2.pipeline.121442302-1_amd64.deb:No change detected |
Static quality checks✅ Please find below the results from static quality gates 33 successful checks with minimal change (< 2 KiB)
|
Regression DetectorRegression Detector ResultsMetrics dashboard Baseline: 1d93f4e Optimization Goals: ✅ No significant changes detected
|
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | quality_gate_security_mean_fs_load | memory utilization | +0.35 | [+0.31, +0.39] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_metrics_logs | memory utilization | +0.18 | [-0.07, +0.43] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_security_no_fs_load | memory utilization | +0.18 | [+0.08, +0.28] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_security_idle | memory utilization | -0.11 | [-0.17, -0.05] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_logs | % cpu utilization | -0.15 | [-1.26, +0.95] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_idle | memory utilization | -0.20 | [-0.25, -0.15] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_idle_all_features | memory utilization | -0.69 | [-0.72, -0.65] | 1 | Logs bounds checks dashboard |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | quality_gate_idle | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | 146.30MiB ≤ 154MiB | bounds checks dashboard |
| ✅ | quality_gate_idle | total_bytes_received | 10/10 | 577.66KiB ≤ 819.20KiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | 485.60MiB ≤ 495MiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | total_bytes_received | 10/10 | 0.89MiB ≤ 1.25MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | 179.84MiB ≤ 195MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_logs | total_bytes_received | 10/10 | 263.85MiB ≤ 292MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | 343.10 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | 3 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | 396.08MiB ≤ 430MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | total_bytes_received | 10/10 | 0.86GiB ≤ 1.04GiB | bounds checks dashboard |
| ✅ | quality_gate_security_idle | cpu_usage | 10/10 | 29.80 ≤ 40 | bounds checks dashboard |
| ✅ | quality_gate_security_idle | memory_usage | 10/10 | 296.22MiB ≤ 330MiB | bounds checks dashboard |
| ✅ | quality_gate_security_mean_fs_load | cpu_usage | 10/10 | 61.61 ≤ 80 | bounds checks dashboard |
| ✅ | quality_gate_security_mean_fs_load | memory_usage | 10/10 | 272.96MiB ≤ 310MiB | bounds checks dashboard |
| ✅ | quality_gate_security_no_fs_load | cpu_usage | 10/10 | 22.35 ≤ 40 | bounds checks dashboard |
| ✅ | quality_gate_security_no_fs_load | memory_usage | 10/10 | 279.70MiB ≤ 320MiB | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
Replicate Execution Details
We run multiple replicates for each experiment/variant. However, we allow replicates to be automatically retried if there are any failures, up to 8 times, at which point the replicate is marked dead and we are unable to run analysis for the entire experiment. We call each of these attempts at running replicates a replicate execution. This section lists all replicate executions that failed due to the target crashing or being oom killed.
Note: In the below tables we bucket failures by experiment, variant, and failure type. For each of these buckets we list out the replicate indexes that failed with an annotation signifying how many times said replicate failed with the given failure mode. In the below example the baseline variant of the experiment named experiment_with_failures had two replicates that failed by oom kills. Replicate 0, which failed 8 executions, and replicate 1 which failed 6 executions, all with the same failure mode.
| Experiment | Variant | Replicates | Failure | Logs | Debug Dashboard |
|---|---|---|---|---|---|
| experiment_with_failures | baseline | 0 (x8) 1 (x6) | Oom killed | Debug Dashboard |
The debug dashboard links will take you to a debugging dashboard specifically designed to investigate replicate execution failures.
❌ Retried Profiling Replicate Execution Failures (ddprof)
Note: Profiling replicas may still be executing. See the debug dashboard for up to date status.
| Experiment | Variant | Replicates | Failure | Debug Dashboard |
|---|---|---|---|---|
| quality_gate_idle | baseline | 10 | Oom killed | Debug Dashboard |
| quality_gate_idle | comparison | 10 | Oom killed | Debug Dashboard |
| quality_gate_idle_all_features | baseline | 10 | Oom killed | Debug Dashboard |
| quality_gate_idle_all_features | comparison | 10 | Oom killed | Debug Dashboard |
| quality_gate_logs | baseline | 10 | Oom killed | Debug Dashboard |
| quality_gate_logs | comparison | 10 | Oom killed | Debug Dashboard |
| quality_gate_metrics_logs | baseline | 10 | Oom killed | Debug Dashboard |
| quality_gate_metrics_logs | comparison | 10 | Oom killed | Debug Dashboard |
| quality_gate_security_idle | baseline | 10 | Oom killed | Debug Dashboard |
| quality_gate_security_idle | comparison | 10 | Oom killed | Debug Dashboard |
| quality_gate_security_no_fs_load | baseline | 10 | Oom killed | Debug Dashboard |
| quality_gate_security_no_fs_load | comparison | 10 | Oom killed | Debug Dashboard |
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_security_no_fs_load, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_no_fs_load, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_idle, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_security_mean_fs_load, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_mean_fs_load, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
Summary
comp/metadata/host/impl/utilsunconditionally importedpkg/collector/python, pulling in CGo rtloader bindings (discovery.go,embed_python.go) whenever thepythonbuild tag was activecmd/otel-agent(or any target using the default base lint flavor) without the rtloader headers installed, becausediscovery.goreferencesC.discover_configfromdatadog_agent_rtloader.hpkg/collector/pythonpopulates at runtime — with no CGo dependency of their ownChanges
comp/metadata/host/impl/utils/python_version.go(//go:build python): reads Python info from the agent cache without importingpkg/collector/pythoncomp/metadata/host/impl/utils/python_version_nopy.go(//go:build !python): returns"n/a"stubshost.go,host_nix.go,host_windows.go: droppkg/collector/pythonimport, call localgetPythonInfo()/getPythonVersion()Test plan
dda inv linter.go --targets=./cmd/otel-agent/...— 0 issues (was: typecheck failure ondiscovery.go)dda inv linter.go --targets=./comp/metadata/host/...— 0 issuesdda inv test --targets=./comp/metadata/host/...— 37 passed, 1 skipped (platform-specific)🤖 Generated with Claude Code