Software Supply Chain Security on AWS: Keyless Signing, SBOMs, and SLSA Provenance

An image scanner answers one question: does this image contain known vulnerabilities right now. It does not answer the question that a supply chain attack actually exploits, which is where the image came from. A malicious image built by an attacker who compromised your CI, pushed under a legitimate tag, and carrying no known CVEs still produces a clean scan result. The scanner was never looking at provenance.

That gap is not hypothetical. The CircleCI breach in 2023, the Codecov bash-uploader compromise in 2021, the SolarWinds build-system attack: all of them worked by getting malicious code into a trusted build or distribution path, not by shipping a CVE. A pipeline that only scans for vulnerabilities is blind to every one of them.

This project builds the part that closes that gap for a container image. GitHub Actions builds the image and pushes it to ECR. Amazon Inspector scans it. Syft produces an SBOM. Cosign signs the image with no private key anywhere. A SLSA provenance attestation records what built it. And before the image is considered deployable, a verification step checks the signature, the signing identity, and the provenance together. If any of that fails, the pipeline fails.

The Trust Model

The shift this project makes is easy to state: an image is not trusted because it is in your registry. It is trusted because it carries a signature proving which repository and workflow produced it, and that signature can be verified by anyone without a shared secret.

Keyless signing is how you get there without a key to manage or leak. When the workflow runs, it holds a short-lived OIDC identity token that encodes exactly what it is: repo:ToluGIT/aws-supply-chain-security:ref:refs/heads/main. Cosign generates an ephemeral key pair and presents that token to Sigstore’s Fulcio, which issues an X.509 certificate valid for ten minutes, bound to that identity. Cosign signs the image digest with the ephemeral key, records the signature in Rekor (Sigstore’s public append-only transparency log), and discards the key; the certificate expires shortly after. There is no long-lived private key to leak because no long-lived private key ever exists.

Verification then checks three things that all have to hold:

The signature is valid and came from a Fulcio-issued certificate, not a self-signed one.
The certificate identity matches the repository and workflow you expect, not some other project that also signs keyless.
A Rekor entry proves the signature was created while the short-lived certificate was still valid.

The second check is the one most setups get wrong by omitting it. Keyless signing on its own only proves “some GitHub workflow signed this.” Pinning the expected identity is what turns that into “the workflow in my repo, on my branch, signed this.” I prove exactly that failure mode in the adversarial tests later.

Why Cosign keyless, not AWS Signer

AWS has its own signing service, so the obvious question is why reach for an OSS tool.

Criterion	Cosign keyless	AWS Signer (Notation)
Key management	None, ephemeral Fulcio certs	Managed signing profile and key
Signing identity	The workflow’s OIDC identity, pinned at verify	An AWS signing profile
Transparency log	Rekor, public, anyone can audit	None public
Verifiable by	Anyone, no AWS credentials	Requires AWS-side trust setup
Ecosystem	The de facto standard for OCI signing; Kubernetes admission controllers speak it	AWS-native, narrower support

I picked Cosign keyless. The signature is bound to the exact repo, workflow, and ref rather than to an AWS profile that any number of pipelines could share, and the Rekor log makes it independently auditable. AWS Signer is a fine choice in an all-AWS shop that wants managed keys and Notation, but for a container image the OSS keyless pattern is stronger on identity and more portable.

The AWS side of this is deliberately thin: one ECR repository, one S3 bucket for Inspector’s SBOM exports, one IAM role for the keyless identity. Everything interesting happens in the pipeline.

The AWS Bootstrap

Two ECR settings carry most of the weight, and both are set at repository creation. Tag immutability means an attacker who gets push access cannot overwrite a known-good tag with a malicious image of the same name. Scan-on-push hands every new image to Inspector automatically.

ECR repository scs-prod-ecr-app showing immutable tags and continuous scan-on-push

The IAM role is where the keyless-to-AWS half lives, and the entire security of it comes down to one line in the trust policy: the sub condition.

"Condition": {
  "StringEquals": { "token.actions.githubusercontent.com:aud": "sts.amazonaws.com" },
  "StringLike": { "token.actions.githubusercontent.com:sub": "repo:ToluGIT/aws-supply-chain-security:*" }
}

The GitHub OIDC token’s sub claim encodes the repository and ref that requested it, in the form repo:OWNER/NAME:ref:refs/heads/BRANCH for a branch build. The StringLike on repo:ToluGIT/aws-supply-chain-security:* means only workflows in that one repository can assume this role; a workflow in any other repo presents a sub that does not match and AssumeRoleWithWebIdentity is denied. This is the control that stops the obvious lateral move: if another repo in the same GitHub org (or anyone’s repo anywhere) tries to assume the role using its own valid GitHub OIDC token, the sub mismatch blocks it. Getting this condition wrong widens the blast radius: scoping it to repo:ToluGIT/* would let every repository under that owner assume the role, and omitting it would let any repository on GitHub, including one the attacker controls, do the same. The trailing :* here is deliberately permissive on ref (any branch or tag in this repo can assume the role); tightening the condition to repo:ToluGIT/aws-supply-chain-security:ref:refs/heads/main would restrict it to the main branch, which is the stricter production choice.

IAM role trust policy federating GitHub OIDC, scoped with a sub condition to repo:ToluGIT/aws-supply-chain-security
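
The effect of the sub condition can be explored locally before touching IAM. The following is a minimal sketch, assuming that shell glob matching is a close enough stand-in for IAM's StringLike operator (both treat * as any run of characters). The helper sub_matches and the repository names other than this project's are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: approximates IAM StringLike with shell glob matching.
# $1 = pattern from the trust policy, $2 = sub claim presented by the token.
sub_matches() {
  case "$2" in
    $1) return 0 ;;   # unquoted on purpose so * keeps its wildcard meaning
    *)  return 1 ;;
  esac
}

PATTERN='repo:ToluGIT/aws-supply-chain-security:*'

# Illustrative sub claims; the last two are made-up repositories.
for sub in \
  'repo:ToluGIT/aws-supply-chain-security:ref:refs/heads/main' \
  'repo:ToluGIT/aws-supply-chain-security:ref:refs/tags/v1.0' \
  'repo:ToluGIT/some-other-repo:ref:refs/heads/main' \
  'repo:attacker/aws-supply-chain-security:ref:refs/heads/main'
do
  if sub_matches "$PATTERN" "$sub"; then
    echo "ALLOW $sub"
  else
    echo "DENY  $sub"
  fi
done
```

Swapping PATTERN for repo:ToluGIT/* in this sketch shows the third claim flipping to ALLOW, which is the over-broad scoping the trust policy is meant to avoid.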

There are no AWS access keys stored in GitHub anywhere. The workflow requests an OIDC token, calls AssumeRoleWithWebIdentity, and gets short-lived STS credentials for the run. This is the single most effective control against the most common CI breach: a long-lived key committed to a repo or left in a secrets store. There is nothing durable to leak.

SBOM exports go to a versioned, KMS-encrypted S3 bucket, and Inspector was already enabled for ECR scanning from an earlier project, so scan-on-push worked the moment the first image landed.

The Build, Scan, and Gate Pipeline

The workflow authenticates to AWS through OIDC with no stored keys, builds the image, tags it by commit (git-<sha>), pushes it, waits for Inspector, then gates on CVEs before anything is signed. The order matters: a vulnerable image must never reach the signing step.

Two things about the CVE gate turned out to be less obvious than expected.

First, ECR enhanced scanning never reports a status of COMPLETE. The basic-scanning model has a terminal COMPLETE status; enhanced scanning (which is Inspector) reports coverage status ACTIVE with the description “Continuous scan is selected” and keeps monitoring. My first gate polled for COMPLETE and would have waited forever. The fix is to poll Inspector’s coverage status for ACTIVE on the image, keyed on the digest, and then read findings.
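
The polling change can be sketched as a small retry loop. This is an illustration under stated assumptions, not the project's actual step: wait_for_status and coverage_status are hypothetical names, REPO_NAME is an assumed variable, and the sketch assumes the coverage entry's resourceId contains the image digest, which should be confirmed against real list-coverage output before relying on it.

```shell
# Hypothetical wrapper around Inspector's coverage API.
# Assumes REPO_NAME, DIGEST (sha256:...) and AWS_REGION are set by the workflow,
# and that resourceId for an ECR image includes its digest (an assumption).
coverage_status() {
  aws inspector2 list-coverage \
    --filter-criteria "{\"ecrRepositoryName\":[{\"comparison\":\"EQUALS\",\"value\":\"$REPO_NAME\"}]}" \
    --region "$AWS_REGION" \
    --query "coveredResources[?contains(resourceId, '$DIGEST')].scanStatus.statusCode" \
    --output text | awk 'NF { print $1; exit }'   # first non-empty value across pages
}

# wait_for_status WANT ATTEMPTS DELAY CMD...
# Re-runs CMD until it prints WANT. Returns non-zero on timeout so the
# caller fails closed instead of waiting forever.
wait_for_status() {
  want=$1; attempts=$2; delay=$3
  shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$("$@" 2>/dev/null) || status=""
    if [ "$status" = "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for status $want (last: ${status:-none})" >&2
  return 1
}

# Usage in the pipeline (not executed here):
#   wait_for_status ACTIVE 30 10 coverage_status || exit 1
```

Bounding the number of attempts is the design point: a gate that polls for a status that never arrives should end in a failed run, not a hung one.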

Second, and more interesting: the “clean” image is not CVE-free, and gating on all CRITICALs would block every build. The python:3.12-slim base ships a perl package with a CRITICAL CVE that has no upstream fix. You cannot patch what has no patch. If the gate fails on any CRITICAL, it fails on that base-image CVE forever, regardless of how clean your own dependencies are. So the gate fires only on findings that have a fix available:

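# Count findings of one severity ($1) on this image that have a fix available.
# list-findings paginates and --output text prints one count per page,
# so awk sums the pages into a single integer.
count_fixable() {
  aws inspector2 list-findings \
    --filter-criteria "{\"ecrImageHash\":[{\"comparison\":\"EQUALS\",\"value\":\"$DIGEST\"}],\"severity\":[{\"comparison\":\"EQUALS\",\"value\":\"$1\"}],\"fixAvailable\":[{\"comparison\":\"EQUALS\",\"value\":\"YES\"}]}" \
    --region "$AWS_REGION" --query "length(findings)" --output text \
    | awk '{ s += $1 } END { print s + 0 }'
}
FIX_CRITICAL=$(count_fixable CRITICAL)
FIX_HIGH=$(count_fixable HIGH)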

Gating on fixable findings is the defensible threshold: it blocks what you can act on and ignores what you cannot. The clean image has zero fixable CRITICAL or HIGH findings and passes; its only CRITICAL is the unpatchable base perl one.

After the gate, Syft generates a CycloneDX SBOM and uploads it as a build artifact.

GitHub Actions run showing a successful build with the CycloneDX SBOM produced as a hashed artifact

Signing, Provenance, and the Verify Gate

Once the image is past the CVE gate, the workflow installs Cosign and signs the image by digest. Signing the digest, never a tag, is deliberate: tags move, digests are content addresses. A signature bound to a tag proves nothing once the tag is repointed.

- name: Cosign keyless sign (by digest)
  run: cosign sign --yes "${ECR_URI}@${DIGEST}"
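
The digest-only rule can also be enforced before the signing command runs. A minimal sketch, assuming a hypothetical helper named image_ref that refuses anything that is not a sha256 content address; REPO_NAME and the describe-images lookup in the usage comment are assumptions about how the workflow obtains DIGEST.

```shell
# image_ref REGISTRY_URI DIGEST
# Prints REGISTRY_URI@DIGEST, or fails if DIGEST is not a sha256 content
# address (for example, if a tag was passed by mistake).
image_ref() {
  if printf '%s\n' "$2" | grep -Eq '^sha256:[0-9a-f]{64}$'; then
    printf '%s@%s\n' "$1" "$2"
  else
    echo "refusing to sign: '$2' is not a sha256 digest" >&2
    return 1
  fi
}

# Usage in the pipeline (not executed here; variable names are assumptions):
#   DIGEST=$(aws ecr describe-images --repository-name "$REPO_NAME" \
#     --image-ids imageTag="git-$GITHUB_SHA" \
#     --query 'imageDetails[0].imageDigest' --output text)
#   REF=$(image_ref "$ECR_URI" "$DIGEST") || exit 1
#   cosign sign --yes "$REF"
```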

Keyless is the default in Cosign 2.x. The step generates ephemeral keys, gets a Fulcio certificate for the workflow’s OIDC identity, signs, and writes a transparency log entry.

Cosign keyless signing step: ephemeral keys, Fulcio certificate, Rekor tlog entry created

A SLSA provenance attestation follows, describing the build: the source repo, commit, workflow, and builder. I should be precise about the level here. This is SLSA Build L2: the provenance is generated in the same job that builds the image. Genuine L3 requires an isolated, non-forgeable builder (the slsa-github-generator reusable workflow runs provenance generation in a separate job the build cannot tamper with). L2 is the honest claim for a single-job cosign attest predicate, and overclaiming L3 would be exactly the kind of unverifiable provenance this project argues against.
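
For illustration, a single-job provenance step might look roughly like the sketch below. It is not the project's actual predicate: write_provenance is a hypothetical helper, the buildType URI is a placeholder, and the field layout follows the SLSA v0.2 provenance shape as an assumption about which predicate version is in use. The GITHUB_* variables are the ones GitHub Actions sets on every run.

```shell
# write_provenance FILE
# Emits a minimal SLSA v0.2-style predicate describing source, commit,
# workflow, and builder. The buildType value is a placeholder.
write_provenance() {
  cat > "$1" <<EOF
{
  "builder": { "id": "${GITHUB_SERVER_URL}/${GITHUB_WORKFLOW_REF}" },
  "buildType": "https://example.com/buildtypes/docker-build@v1",
  "invocation": {
    "configSource": {
      "uri": "git+${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}@${GITHUB_REF}",
      "digest": { "sha1": "${GITHUB_SHA}" },
      "entryPoint": "${GITHUB_WORKFLOW_REF}"
    }
  },
  "metadata": { "buildInvocationId": "${GITHUB_RUN_ID}" },
  "materials": [
    {
      "uri": "git+${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}",
      "digest": { "sha1": "${GITHUB_SHA}" }
    }
  ]
}
EOF
}

# Usage in the pipeline (not executed here):
#   write_provenance provenance.json
#   cosign attest --yes --type slsaprovenance \
#     --predicate provenance.json "${ECR_URI}@${DIGEST}"
```

Because the same job both builds the image and writes this file, a compromised job could write anything it likes here, which is exactly why this arrangement cannot claim L3.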

Then the verification gate runs, in the same pipeline, and fails the run if anything does not check out. This is the payoff step. The verification confirms the signature is valid, the certificate subject is the exact workflow file in this repo on this ref, the issuer is GitHub’s OIDC provider, and the SLSA provenance attestation verifies against the same identity. The final log line is the whole point: the image is signed, attested, and verified, therefore deployable.

Verification gate: certificate subject is the exact workflow/repo/ref, issuer is GitHub OIDC, provenance verified, image declared deployable
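
A verification gate of this shape can be sketched in a few lines. The function names expected_identity and verify_gate, and the wording of the final log line, are hypothetical; the identity format matches the certificate subject shown in the verification output, and the sketch assumes the attestation was created with the slsaprovenance predicate type.

```shell
OIDC_ISSUER='https://token.actions.githubusercontent.com'

# expected_identity REPO WORKFLOW_FILE REF
# Builds the certificate subject expected for a GitHub Actions workflow run.
expected_identity() {
  printf 'https://github.com/%s/.github/workflows/%s@%s\n' "$1" "$2" "$3"
}

# verify_gate IMAGE_REF IDENTITY
# Both the signature and the provenance attestation must verify against the
# same pinned identity and issuer; any failure fails the gate.
verify_gate() {
  cosign verify \
    --certificate-identity "$2" \
    --certificate-oidc-issuer "$OIDC_ISSUER" \
    "$1" >/dev/null || return 1
  cosign verify-attestation \
    --type slsaprovenance \
    --certificate-identity "$2" \
    --certificate-oidc-issuer "$OIDC_ISSUER" \
    "$1" >/dev/null || return 1
  echo "signed, attested, verified: $1"
}

# Usage in the pipeline (not executed here):
#   ID=$(expected_identity ToluGIT/aws-supply-chain-security supply-chain.yml refs/heads/main)
#   verify_gate "${ECR_URI}@${DIGEST}" "$ID" || exit 1
```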

The signature is not only in ECR. It is in Rekor, Sigstore’s public transparency log, queryable by anyone with the log index. This is what makes the signature independently auditable: you do not have to trust my pipeline’s word that it signed the image; you can look it up.

Rekor transparency log entry for the signature, queried by log index

Proving It: The Adversarial Tests

Having built the controls, the next step is to prove they work by attacking them. A pipeline that only ever shows green proves nothing; anyone can make a demo pass. What matters is running the attacks the controls are supposed to stop and watching each one fail closed.

A vulnerable image is blocked before it can be signed

The pipeline has a second app variant with dependencies pinned to old versions carrying real, recent CVEs: PyYAML 5.3.1 (CVE-2020-14343, arbitrary code execution), certifi 2022.12.7, Werkzeug 2.0.3, and others. Inspector flags them, and unlike the base-image perl CVE, these have fixes available.

Inspector findings for the vulnerable image: fixable and exploitable CRITICAL CVEs including PyYAML CVE-2020-14343

Running the pipeline against this variant, the CVE gate counts three fixable CRITICAL and seventeen fixable HIGH findings and exits non-zero. Every step after it, the SBOM, the Cosign signing, the provenance attestation, the verify gate, is skipped. The vulnerable image sits in ECR unsigned, which means no downstream verification can ever pass it.

Getting this gate to actually work took three tries, and the failures are more instructive than the success. The first version used a shell eval to build the finding counts and crashed with exit 127; the image was blocked, but by a shell error rather than a gate decision.

The second version was worse: it failed open. inspector2 list-findings paginates, and length(findings) with text output prints one number per page joined by a newline, so the count came back as "3\n0". The numeric comparison [ "3\n0" -gt 0 ] threw “integer expression expected” and returned a non-zero status, the if treated that error as false, and the gate printed “passed” and signed a vulnerable image carrying three fixable CRITICAL CVEs. A gate that fails open is worse than no gate, because it produces a signed, apparently-verified artifact that is actually dangerous. Only re-running the adversarial test caught it: the log said “passed” while the counts printed right above it said three critical.

The fix was to sum the paginated counts into a single integer with awk before comparing. The lesson is that a security control is not proven by writing it; it is proven by watching it reject the thing it is supposed to reject.

CVE gate step failing on fixable CRITICAL findings, with all signing and attestation steps skipped
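
The fail-open bug and its fix are easy to reproduce in isolation. The sketch below is one way to write the comparison so that malformed input fails the gate instead of passing it; sum_counts and gate_on are hypothetical names, and the messages are illustrative rather than the pipeline's real log lines.

```shell
# sum_counts: collapse one-number-per-page output into a single integer.
sum_counts() {
  awk '{ s += $1 } END { print s + 0 }'
}

# gate_on CRITICAL_COUNT HIGH_COUNT
# Passes only when both inputs are plain non-negative integers equal to zero.
# Anything that does not parse as an integer fails the gate (fail closed).
gate_on() {
  for n in "$1" "$2"; do
    case "$n" in
      ''|*[!0-9]*)
        echo "gate error: non-integer count '$n'" >&2
        return 1
        ;;
    esac
  done
  if [ "$1" -gt 0 ] || [ "$2" -gt 0 ]; then
    echo "gate failed: $1 fixable CRITICAL, $2 fixable HIGH" >&2
    return 1
  fi
  echo "gate passed"
}

# Reproducing the paginated output from the failed run:
#   printf '3\n0\n' | sum_counts      prints a single integer
#   gate_on "$(printf '3\n0')" 0      is rejected as a non-integer count
```

One caveat with summing: an empty response also sums to 0, so the exit status of the aws call itself still needs checking, for example with set -o pipefail in bash.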

A tampered image has no valid signature

The realistic tampering attack: an adversary with push access takes the legitimate image, adds a malicious layer, and pushes it under a plausible tag like v1.0-hotfix. The layer I used is an IMDS credential-exfiltration stub, the same primitive behind the TeamTNT cryptojacking campaigns and the Capital One breach: read the instance metadata credentials and beacon them out. The exfil endpoint is a placeholder, so it demonstrates the technique without stealing anything.

Because Cosign signs the digest, the tampered image has a different digest and there is no signature for it. cosign verify returns “no signatures found.” The malicious image is in the registry under a legitimate-looking tag, but it cannot be verified, so a deploy gate that runs cosign verify blocks it.

cosign verify returning "no signatures found" on the tampered tag, then passing on the genuine signed digest

To be precise about the property this proves: the protection is not that Cosign detected modified bytes inside a signed image (it cannot, and does not try). It is that the signature is bound to the original digest, so any change produces a new digest with no signature at all. The tampered image is rejected not because it was caught being altered, but because it was never signed.

A valid signature from the wrong identity is rejected

This is the sharpest result. Take the genuinely signed, legitimate image, and verify it while pinning the wrong expected repository. The signature is valid. The image is authentic. Verification still fails, because the certificate identity does not match:

Error: no matching signatures: none of the expected identities matched
what was in the certificate, got subjects
[https://github.com/ToluGIT/aws-supply-chain-security/.github/workflows/supply-chain.yml@refs/heads/main]

cosign verify rejecting a valid signature when the expected identity is wrong, then passing with the correct identity

The same image, the same signature, verified twice, differing only in the expected identity: one rejected, one passed. This proves the point from the trust model. Keyless signing without an identity pin reduces to “someone signed this,” which is nearly worthless. The --certificate-identity and --certificate-oidc-issuer flags are the difference between that and “my workflow signed this.”
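
Negative tests like this one are worth scripting so they run on every change. A minimal sketch, assuming a hypothetical helper named expect_failure; it also refuses to count a missing binary as a rejection, since a check that fails because of a shell error has not shown that the control works. The wrong-repository identity in the usage comment is made up for illustration.

```shell
# expect_failure CMD...
# Succeeds only if CMD exists and exits non-zero. A missing command is
# reported as a test failure, not as a successful rejection.
expect_failure() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "FAIL: '$1' not found, so nothing was actually tested" >&2
    return 1
  fi
  if "$@" >/dev/null 2>&1; then
    echo "FAIL: command unexpectedly succeeded: $*" >&2
    return 1
  fi
  echo "ok: rejected as expected"
}

# Usage (not executed here; the wrong identity is a made-up repository):
#   expect_failure cosign verify \
#     --certificate-identity "https://github.com/ToluGIT/some-other-repo/.github/workflows/supply-chain.yml@refs/heads/main" \
#     --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
#     "${ECR_URI}@${DIGEST}"
```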

Where This Has Limits

The provenance is SLSA Build L2, not L3, as discussed. For a threat model that includes a compromised build job forging its own provenance, you need the isolated builder.

The CVE gate depends on Inspector having a fix-availability signal, which it does not always have immediately for the newest CVEs. A brand-new fixable CRITICAL with fixAvailable not yet populated would slip the gate until Inspector catches up. Gating additionally on an Inspector score threshold, regardless of fix availability, would tighten this.

The most important limitation is where verification runs, and it deserves more than a footnote. This pipeline verifies the image inside itself, at build time, which proves the artifact is signed and attested at the moment it is produced. It does nothing to stop someone from deploying a different, unsigned image later. The signature only becomes a deploy-time control when something refuses to run an image that fails cosign verify at the point of deployment. On EKS that is an admission controller: Sigstore’s Policy Controller or a Kyverno verifyImages policy, either of which rejects a pod whose image is not signed by the expected identity, enforcing the same certificate identity and OIDC issuer that the pipeline pins with --certificate-identity and --certificate-oidc-issuer. On ECS or Lambda the equivalent is a deploy-time gate in the release pipeline that runs cosign verify before the image is promoted. Without one of these, the signature is a claim nobody checks: it stops mattering the moment an attacker deploys around it. Building that admission-time enforcement is the natural next project, and it is where the signing work in this one actually pays off.