Building an IaC Security Scanner in Go

As a security engineer who’s implemented security controls across cloud environments, I’ve always been curious about how policy engines actually work under the hood. Sure, I could use existing tools like Checkov or tfsec, but there’s something to be said for understanding the fundamentals of how infrastructure security scanning really operates.

That curiosity led me down a rabbit hole that consumed weekends and resulted in PolicyGuard, an IaC security scanner built from scratch in Go. What started as a “let me just understand how this works” project turned into something that now catches real security issues in infrastructure code.

The Problem That Started It

Working in cloud security, you see the same patterns over and over. Developers spin up S3 buckets with public read-write access. EC2 instances get launched without encryption. Security groups are wide open to the internet. The tools exist to catch these issues, but I wanted to understand how they actually parse Terraform files and evaluate security policies.

The existing landscape felt fragmented. Some tools are great at parsing but terrible at custom policies. Others have powerful policy engines but can’t handle complex infrastructure configurations. I wanted to build something that could bridge that gap while being genuinely educational about the internals.

Why Go Made Sense

Coming from a background heavy in Python for enterprise tools, Go wasn’t my first instinct. But for a security tool that needs to parse configuration files, evaluate policies, and potentially run in CI/CD pipelines, Go’s characteristics became compelling.

The concurrency model meant I could parse multiple Terraform files simultaneously without the complexity of managing thread pools (that experience probably deserves a separate write-up of its own). The static typing caught errors at compile time that would have been runtime surprises in Python. Most importantly, single-binary deployment meant no dependency hell when installing the tool in different environments.
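To make the concurrency point concrete, here’s a minimal sketch of the fan-out pattern: one goroutine per file, with results collected over a channel. The parseFile function is a placeholder for the real HCL parsing step covered below.

```go
package main

import (
	"fmt"
	"sync"
)

type parseResult struct {
	path string
	err  error
}

// parseFile is a placeholder for the real HCL parsing step.
func parseFile(path string) error { return nil }

// parseAll fans out one goroutine per file and gathers the results.
func parseAll(paths []string) []parseResult {
	var wg sync.WaitGroup
	results := make(chan parseResult, len(paths))

	for _, p := range paths {
		wg.Add(1)
		go func(path string) {
			defer wg.Done()
			results <- parseResult{path: path, err: parseFile(path)}
		}(p)
	}
	wg.Wait()
	close(results)

	var out []parseResult
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	for _, r := range parseAll([]string{"main.tf", "vpc.tf"}) {
		fmt.Printf("%s: err=%v\n", r.path, r.err)
	}
}
```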

The learning curve was steeper than expected, though. Go’s interface system felt foreign coming from object-oriented languages. I spent more time than I’d like to admit getting comfortable with how interfaces work for dependency injection and testing.

Parsing Infrastructure as Code: Harder Than It Looks

The first major challenge was parsing Terraform files. HCL (HashiCorp Configuration Language) looks simple on the surface, but the reality is more complex. There are subtle differences between .tf files and .tf.json files. Variable interpolations can be deeply nested. Data sources reference other resources in ways that aren’t immediately obvious.

I initially tried to build a simple parser that would extract resource blocks and their attributes. That worked for basic cases but fell apart quickly when encountering real-world Terraform code with modules, complex variable references, and conditional resource creation.

Switching to HashiCorp’s own HCL parsing library made the job far easier. Instead of reinventing the wheel, I could lean on the same parsing logic that Terraform itself uses, which handled all the edge cases I was encountering.
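For reference, a minimal sketch of what that looks like with the hclparse package; the schema below only declares resource blocks, which is all this example extracts.

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/hcl/v2"
	"github.com/hashicorp/hcl/v2/hclparse"
)

func main() {
	parser := hclparse.NewParser()

	// ParseHCLFile handles native .tf syntax; ParseJSONFile covers
	// the .tf.json variant.
	file, diags := parser.ParseHCLFile("main.tf")
	if diags.HasErrors() {
		log.Fatalf("parse error: %s", diags.Error())
	}

	// Declare only the top-level blocks we care about. PartialContent
	// tolerates blocks the schema doesn't mention, unlike Content.
	schema := &hcl.BodySchema{
		Blocks: []hcl.BlockHeaderSchema{
			{Type: "resource", LabelNames: []string{"type", "name"}},
		},
	}
	content, _, diags := file.Body.PartialContent(schema)
	if diags.HasErrors() {
		log.Fatalf("schema error: %s", diags.Error())
	}

	for _, block := range content.Blocks {
		fmt.Printf("resource %q %q\n", block.Labels[0], block.Labels[1])
	}
}
```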

But then came OpenTofu support. When the Terraform fork emerged, I realized the tool should support both ecosystems. Fortunately, since OpenTofu maintains compatibility with HCL syntax, adding support was mostly seamless.

The Policy Engine Challenge

For policy evaluation, I chose OPA (Open Policy Agent) and its Rego language. This decision came from seeing OPA’s adoption in Kubernetes environments and wanting to understand how declarative policy languages work in practice.

Rego has a learning curve that’s definitely non-trivial. Coming from imperative programming languages, thinking in terms of rules and constraints rather than step-by-step instructions required a mental shift. The debugging experience is also quite different from traditional programming.

Writing security policies in Rego meant really understanding the structure of Terraform resources. For S3 bucket policies, I had to account for different ways encryption can be configured, various ACL settings, and the interplay between bucket policies and public access blocks. Each AWS service has its own quirks and security considerations that need to be encoded into policy rules.
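Here’s a minimal, self-contained sketch of that evaluation flow using OPA’s Go API. The rule is an illustrative ACL check, not one of PolicyGuard’s actual policies, and recent OPA releases may expect the newer rego.v1 syntax (deny contains msg if { ... }) instead of the classic form shown here.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/open-policy-agent/opa/rego"
)

// An illustrative rule: flag S3 buckets whose ACL grants public
// write access.
const module = `
package terraform

deny[msg] {
	input.resource_type == "aws_s3_bucket"
	input.values.acl == "public-read-write"
	msg := "S3 bucket ACL allows public read-write access"
}
`

func main() {
	ctx := context.Background()

	pq, err := rego.New(
		rego.Query("data.terraform.deny"),
		rego.Module("s3.rego", module),
	).PrepareForEval(ctx)
	if err != nil {
		log.Fatalf("compile: %v", err)
	}

	// Input shaped like a (simplified) parsed Terraform resource.
	input := map[string]any{
		"resource_type": "aws_s3_bucket",
		"values":        map[string]any{"acl": "public-read-write"},
	}
	rs, err := pq.Eval(ctx, rego.EvalInput(input))
	if err != nil {
		log.Fatalf("eval: %v", err)
	}
	for _, result := range rs {
		for _, expr := range result.Expressions {
			fmt.Println(expr.Value) // the set of deny messages
		}
	}
}
```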

The interesting part was discovering how to make policies extensible. Users should be able to drop in new .rego files without recompiling the entire tool. This required designing the policy loading system to dynamically discover and compile policy files at runtime.
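A sketch of that runtime loading, assuming policies live in a user-supplied directory and a query path of data.policyguard.deny (both illustrative): rego.Load discovers and parses every policy file under the given paths, and PrepareForEval compiles them once up front.

```go
package main

import (
	"context"

	"github.com/open-policy-agent/opa/rego"
)

// loadPolicies compiles every .rego file under policyDir at runtime,
// so users can drop in new rules without rebuilding the binary.
func loadPolicies(ctx context.Context, policyDir string) (rego.PreparedEvalQuery, error) {
	r := rego.New(
		rego.Query("data.policyguard.deny"),
		rego.Load([]string{policyDir}, nil),
	)
	return r.PrepareForEval(ctx)
}
```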

Testing Infrastructure Code

Testing a security scanner presented unique challenges. Unit tests were straightforward for individual components, but integration testing required actual Terraform files with known security issues, so I built a suite of fixtures with deliberately insecure infrastructure configurations.
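The fixture approach ends up looking something like this in practice, a sketch assuming a Scan function and a testdata/ directory of intentionally insecure files (both hypothetical names):

```go
package scanner

import "testing"

func TestInsecureFixtures(t *testing.T) {
	cases := []struct {
		fixture        string
		wantViolations int
	}{
		{"testdata/public_s3_bucket.tf", 1},
		{"testdata/open_security_group.tf", 2},
	}
	for _, tc := range cases {
		tc := tc
		t.Run(tc.fixture, func(t *testing.T) {
			// Scan is a stand-in for the scanner's real entry point.
			violations, err := Scan(tc.fixture)
			if err != nil {
				t.Fatalf("scan failed: %v", err)
			}
			if len(violations) != tc.wantViolations {
				t.Errorf("got %d violations, want %d",
					len(violations), tc.wantViolations)
			}
		})
	}
}
```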

The coverage metrics were initially disappointing. Testing policy evaluation meant creating extensive test cases for different resource configurations and ensuring that violations were detected correctly. I learned that achieving meaningful test coverage in security tooling requires thinking beyond code coverage to scenario coverage.

One particular headache was testing the pass rate calculation. The initial implementation was counting total violations instead of unique resources with violations, leading to confusing metrics like negative pass rates. Getting the math right required careful thinking about what actually constitutes a “passing” resource in the context of security policies, a question I admit deserves further exploration.
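The fix, sketched below with an assumed minimal Violation type, is to deduplicate violations by resource before doing the division:

```go
// Violation is a minimal stand-in for the scanner's real finding type.
type Violation struct {
	ResourceID string
	RuleID     string
}

// passRate computes the percentage of resources with no violations.
// Deduplicating by resource keeps the rate in [0, 100] even when one
// resource triggers many rules.
func passRate(totalResources int, violations []Violation) float64 {
	if totalResources == 0 {
		return 100.0
	}
	failed := make(map[string]struct{})
	for _, v := range violations {
		failed[v.ResourceID] = struct{}{}
	}
	passed := totalResources - len(failed)
	return float64(passed) / float64(totalResources) * 100.0
}
```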

CI/CD Integration

Building the tool was one thing, but making it useful in actual development workflows was another challenge entirely. Modern development teams expect tools that integrate seamlessly with their existing CI/CD pipelines.

Supporting multiple output formats became a priority. Security teams want SARIF format for GitHub Security tab integration. QA teams prefer JUnit XML for test-reporting dashboards. Developers want human-readable output they can act on immediately. Each format has different requirements and expectations for how violations should be represented.
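Internally this maps naturally onto a single interface with one implementation per format; a sketch, with type names that are illustrative rather than PolicyGuard’s actual API:

```go
package report

import (
	"fmt"
	"strings"
)

// Violation mirrors the finding type sketched earlier.
type Violation struct {
	ResourceID string
	RuleID     string
}

// Reporter renders a set of findings into one output format.
type Reporter interface {
	Report(violations []Violation) ([]byte, error)
}

// One implementation per target format, all behind the same interface.
type SARIFReporter struct{}   // GitHub Security tab
type JUnitReporter struct{}   // CI test dashboards
type ConsoleReporter struct{} // human-readable terminal output

// Report for the console path; the SARIF and JUnit versions would
// marshal the same data into their respective schemas.
func (ConsoleReporter) Report(violations []Violation) ([]byte, error) {
	var b strings.Builder
	for _, v := range violations {
		fmt.Fprintf(&b, "FAIL %s (%s)\n", v.ResourceID, v.RuleID)
	}
	return []byte(b.String()), nil
}
```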

The GitHub Actions integration revealed issues I hadn’t anticipated during local development. Path handling differs across operating systems. Environment variable handling has subtle differences between local shells and CI environments. Windows support required additional testing, since most of my development was on macOS.

Getting the automated release pipeline working was its own journey. I wanted the tool to be easily installable via “go install”, which meant publishing to GitHub Packages and ensuring proper semantic versioning. The number of edge cases in release automation workflows was humbling.

Performance Considerations

As the tool matured, performance became a consideration. Parsing large Terraform configurations can be memory-intensive, especially when dealing with generated files from tools like Terragrunt. Policy evaluation scales with the number of resources and the complexity of policies.

I implemented concurrent processing for parsing multiple files but had to be careful about resource contention when multiple goroutines were evaluating policies simultaneously. The OPA engine has its own performance characteristics that needed to be understood and worked around.
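OPA’s documentation notes that a prepared query can be shared across goroutines, so the pattern I settled on was: prepare once, evaluate concurrently, and bound the fan-out with a semaphore. A sketch:

```go
package scan

import (
	"context"
	"log"
	"runtime"
	"sync"

	"github.com/open-policy-agent/opa/rego"
)

// evalAll evaluates every resource against a single shared prepared
// query, capping concurrency at the CPU count to limit contention.
func evalAll(ctx context.Context, pq rego.PreparedEvalQuery, inputs []any) {
	sem := make(chan struct{}, runtime.NumCPU())
	var wg sync.WaitGroup
	for _, in := range inputs {
		wg.Add(1)
		go func(input any) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a worker slot
			defer func() { <-sem }() // release it
			if _, err := pq.Eval(ctx, rego.EvalInput(input)); err != nil {
				log.Printf("policy evaluation failed: %v", err)
			}
		}(in)
	}
	wg.Wait()
}
```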

Caching became important for developer workflows where the same files are scanned repeatedly during development cycles. But cache invalidation is tricky when policies can be updated independently of the infrastructure code being scanned.
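The approach that worked was making the policy set part of the cache key, so a policy update invalidates everything it could affect; a hypothetical sketch:

```go
package cache

import (
	"crypto/sha256"
	"encoding/hex"
	"os"
)

// cacheKey derives a key from both the Terraform file contents and a
// hash of the loaded policy bundle, so editing either one produces a
// new key and a cache miss.
func cacheKey(tfPath string, policyHash []byte) (string, error) {
	data, err := os.ReadFile(tfPath)
	if err != nil {
		return "", err
	}
	h := sha256.New()
	h.Write(data)
	h.Write(policyHash)
	return hex.EncodeToString(h.Sum(nil)), nil
}
```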

The Human Side of Security Tooling: A Developer’s POV

Building security tooling taught me that the technical challenges are often secondary to the human factors. Security violations need to be presented in a way that helps developers understand not just what’s wrong, but why it matters and how to fix it.

The remediation suggestions became a crucial feature. It’s not enough to say “S3 bucket has public access”; developers need to know exactly which configuration block to modify and what the secure configuration should look like. This required understanding not just the security implications but also the developer experience of fixing issues.
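Concretely, that meant every finding carries its fix alongside the failure; a sketch with illustrative field values, not PolicyGuard’s actual schema:

```go
// Finding pairs the violation with enough context to act on it.
type Finding struct {
	RuleID      string // e.g. "AWS-S3-001" (illustrative ID)
	ResourceID  string // e.g. "aws_s3_bucket.logs"
	Message     string // what is wrong
	Impact      string // why it matters
	Remediation string // which block to change, and to what
}

// The level of detail a remediation should reach:
var example = Finding{
	RuleID:     "AWS-S3-001",
	ResourceID: "aws_s3_bucket.logs",
	Message:    "bucket ACL allows public read-write access",
	Impact:     "anyone on the internet can read and modify objects",
	Remediation: "set acl = \"private\" and add an " +
		"aws_s3_bucket_public_access_block for this bucket",
}
```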

Error messages needed to be informative without being overwhelming. Stack traces are useful for debugging the tool itself, but developers scanning their infrastructure code need clear, actionable feedback about security issues.

Lessons Learned and What’s Next

Building PolicyGuard surfaced several insights about security tooling and Go development. The importance of good abstractions became clear when adding support for different IaC formats and output types. Interfaces in Go really shine when you need to support multiple implementations of similar functionality.

The value of comprehensive testing in security tools cannot be overstated. False positives undermine trust in automated security scanning. False negatives defeat the entire purpose. Getting the balance right requires extensive testing with real-world configurations.

Looking ahead, Kubernetes support is next on the roadmap. Rather than building another custom parser, I’m exploring integration with existing Kubernetes MCP servers that already understand cluster configurations and security best practices. This approach could provide better coverage while reducing the development effort.

Azure and GCP support will likely follow similar patterns to the AWS implementation, but each cloud provider has unique services and security models that will require careful consideration.

Building a security scanner from scratch was more educational than any course or tutorial could have been. Understanding how HCL parsing works, how policy engines evaluate rules, and how security violations should be presented to developers provided insights that are immediately applicable to other security tooling projects.

The Go ecosystem proved to be excellent for this type of system tool. The standard library handled most of the heavy lifting, and the third-party libraries filled gaps without creating dependency issues.

For anyone considering similar projects, my advice would be to start with the parsing and policy evaluation core, get that solid with proper testing, and then build the user experience and integrations around it. Security tooling is only as good as its adoption, and adoption depends heavily on developer experience.

The source code for PolicyGuard is available on GitHub for anyone interested in diving deeper into the implementation details, and I hope others can build on this foundation to make cloud security more accessible and effective.

IaC Terraform OPA Go