Intent driven
In this chapter we show how to refactor to remove our test’s technical debt:
- Our policy is tightly tied to CloudFormation
- It’s not robust evidence: we are not checking the contents of the tag
- It’s a compliance checkbox, “did you do it?”, not “did you need to?”
Our first task is to remove the direct linkage to CloudFormation in our policy.
A higher vista
Let’s take a higher vista and look at the architecture at a higher level of abstraction.
The WA2 Framework provides the core namespace, which defines these key elements:
// Architectural node types
enum Node { Store, Run, Move }
struct Workload {
nodes: Node[]
}
struct Evidence {
value: String
}
The foundations of the core namespace (and indeed WA2) are these:
- We reason about Nodes, which have three possible variations: Store data, Run code, and Move information
- We arrange a set of Nodes in our graph into a Workload
- We use Evidence to enrich the graph
Projecting into our vista
As we saw in the previous chapter, the intent language allows us to write
queries at the AWS CloudFormation level:
query(aws:cfn:Resource)
This is critical to be able to create evidence at a Vendor level, but we want to reason about architecture, not implementation.
The WA2 Framework provides the aws:cfn namespace, which projects from
CloudFormation into the core:Node type. For example, in this snippet
we can see how it maps aws:type into Node:Store:
derive stores {
let cfn_stores = query(aws:cfn:Resource[aws:type in (
"AWS::S3::Bucket",
"AWS::EC2::Volume",
"AWS::EFS::FileSystem"
⋮
)])
for s in cfn_stores {
let node = add(_, wa2:type, core:Store)
add(node, core:source, s)
add(core:workload, wa2:contains, node)
}
}
This means that if you add
use core
use aws:cfn
to your wa2 intent file, you automatically get these projections. This allows us to rewrite our policy rule without reference to AWS.
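As a conceptual sketch (in Python, not the WA2 language itself), the projection can be thought of as mapping vendor resource types onto architectural node types; the function name and dictionary shapes here are illustrative assumptions:

```python
# Illustrative sketch of the aws:cfn -> core projection (not the WA2
# runtime): vendor resource types are mapped onto vendor-neutral Store
# nodes. The type set mirrors the derive snippet above.

STORE_TYPES = {"AWS::S3::Bucket", "AWS::EC2::Volume", "AWS::EFS::FileSystem"}

def project_stores(resources):
    """Map CloudFormation resources onto core:Store-like nodes."""
    workload = []
    for name, resource in resources.items():
        if resource.get("Type") in STORE_TYPES:
            # each node keeps a back-reference to its vendor-level
            # source, playing the role of core:source
            workload.append({"type": "Store", "source": name})
    return workload
```

The point of the sketch is the direction of the dependency: policy code only ever sees the Store nodes, never the vendor type strings.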
Policy independent of vendor
Now that we can work at a higher level, we can write policy that is vendor neutral. In the last chapter we were checking all CloudFormation Resources for data classification, which makes no sense for an AWS IP address (for example). Now we can start by querying only stores:
// we require everything is given a classification
policy require_classification {
must all_stores_must_be_classified
}
// every store must have a criticality classification
rule all_stores_must_be_classified {
let scope = query(core:Store)
for store in scope {
// reference the source of this store (will be a cfn resource)
let source = query(store/core:source)
must query(store/core:Evidence/data:Criticality) {
subject: source,
area: data:Criticality,
message: "Stores need to have criticality classification"
}
}
}
We use core:source to refer back to the source of the Store; in a
CloudFormation-based workload, that will be the Resource. Also note how we
now use core:Evidence to standardize where we keep evidence facts.
So we derive the evidence from the CloudFormation level, and can build a rule
on top of the evidence, not the CloudFormation implementation detail.
// a derive creates derived information
derive evidence_of_criticality_from_cfn_rx_tagging {
let stores = query(core:Store[core:source/aws:cfn:Resource])
for store in stores {
let source = query(store/core:source)
let dc_tag = query(source/aws:Tags/*[aws:Key = "DataCriticality"])
should dc_tag {
subject: source,
area: data:Criticality,
message: "Add a DataCriticality tag to this Resource"
}
let evidence = add(_, wa2:type, core:Evidence)
add(store, wa2:contains, evidence)
let fact = add(_, wa2:type, data:Criticality)
add(evidence, wa2:contains, fact)
}
}
Note again that we place facts under core:Evidence to meet our rule expectations.
Instead of using an if statement to check the existence of the tag, we now use a should modal.
The should (like the must) will stop the derive execution, preventing evidence from being added,
but instead of a fatal error, it will raise a warning.
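The difference between the two modals can be sketched in Python (with hypothetical helper names, not the WA2 runtime): must records a fatal finding, while should records a warning and merely stops the current derive:

```python
# Sketch of should/must semantics as described above (assumed names,
# not the WA2 runtime): both stop execution, but at different severities.

class FatalFinding(Exception):
    """Raised by `must`: a fatal architectural error."""

class SkipDerive(Exception):
    """Raised by `should`: stops this derive with only a warning."""

findings = []

def must(condition, message):
    if not condition:
        findings.append(("error", message))
        raise FatalFinding(message)

def should(condition, message):
    if not condition:
        findings.append(("warning", message))
        raise SkipDerive(message)

def run_derive(derive):
    try:
        derive()
    except SkipDerive:
        pass  # the derive stops early; its evidence is simply not added
```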
Tip
Using a should in a derive provides guidance to an engineer that is relevant at the implementation level. The rule will signal a fatal architectural error about the lack of classification, but the derive can tell the engineer what needs to be fixed at the CloudFormation level.
Ensure all tests continue to pass
So now we can run again to ensure our refactoring has not broken anything:
Let’s check the target again:
intent check --profile example --target tagged.yaml --entry unvendor.wa2
PREPARE
-------
✓ Read target tagged.yaml
• Schedule CloudFormation validation
Validation will run concurrently and report after results.
✓ Initialise kernel
✓ Parse intent entry unvendor.wa2
✓ Select profile example
✓ Run analysis
RESULTS
-------
✓ Profile: example [1/1]
VALIDATION
----------
✓ Validate CloudFormation against specification
So we have fixed our first piece of debt: policy tied to implementation detail. Now, as WA2 adds new ways to ingest targets (APIs etc.) and new vendors (Azure, GCP), we won’t have to change our policy; we will just add new derives to gather the evidence we need.
Enforcing a taxonomy
Currently the tags against a Resource could contain any value, so we want to make sure they follow our Data Classification Taxonomy. Everyone has their own, so let’s define ours and then make sure it’s being used.
We can add an enum that lists all possible values, just like core did for Node.
enum DataCriticality {
Disposable,
NonCritical,
Important,
BusinessCritical,
MissionCritical
}
Now we can write a should query, using the as() function to convert the
Value of the AWS Tag into our DataCriticality enum:
// a derive creates derived information
derive evidence_of_criticality_from_cfn_rx_tagging {
let stores = query(core:Store[core:source/aws:cfn:Resource])
for store in stores {
let source = query(store/core:source)
let dc_tag = query(source/aws:Tags/*[aws:Key = "DataCriticality"])
// is there a dc tag?
should dc_tag {
subject: source,
area: DataCriticality,
message: "Add a DataCriticality tag (aws:Tags/aws:Key = 'DataCriticality') to this Resource"
}
// is the dc tag value valid in taxonomy?
should query(dc_tag/aws:Value) as(DataCriticality) {
subject: source,
area: DataCriticality,
message: "DataCriticality tag must be a value from DataCriticality taxonomy"
}
let evidence = add(_, wa2:type, core:Evidence)
add(store, wa2:contains, evidence)
let fact = add(_, wa2:type, data:Criticality)
add(evidence, wa2:contains, fact)
}
}
Now we only derive evidence of Criticality if the tagging follows our taxonomy. In theory this also allows different projects to use different taxonomies, and our policy would still work.
Note
The [modal] [value] as([name]) syntax is truthy. For our example, if the value is not in the list of valid values in name, it evaluates to false. So, since we used should, a non-valid value stops us adding evidence.
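A minimal Python sketch of this truthy conversion (the helper name as_enum is an assumption, not WA2 syntax): invalid values come back as None, which is falsy:

```python
# Sketch of the as(DataCriticality) conversion: a raw tag value either
# maps onto the taxonomy or evaluates to a falsy result (None).

from enum import Enum

class DataCriticality(Enum):
    Disposable = 1
    NonCritical = 2
    Important = 3
    BusinessCritical = 4
    MissionCritical = 5

def as_enum(value, enum_cls):
    try:
        return enum_cls[value]  # look up the member by name
    except KeyError:
        return None  # not in the taxonomy -> falsy
```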
Ensure all tests continue to pass
Let’s check the target again:
intent check --profile example --target tagged.yaml --entry taxonomy.wa2
PREPARE
-------
✓ Read target tagged.yaml
• Schedule CloudFormation validation
Validation will run concurrently and report after results.
✓ Initialise kernel
✓ Parse intent entry taxonomy.wa2
✓ Select profile example
✓ Run analysis
RESULTS
-------
✓ Profile: example [1/1]
VALIDATION
----------
✓ Validate CloudFormation against specification
Acting on Intent
So now we can step away from broad compliance tickboxes, and instead use our intent to decide what must be done. First we need another rule in our policy set:
// protect critical data, which we know through classification
policy protect_stores_based_on_classification {
must all_stores_must_be_classified
must ensure_critical_stores_are_protected
}
Critical stores should be resilient
We write the new rule that says that all critical stores must be resilient:
rule ensure_critical_stores_are_protected {
let scope = query(core:Store[core:Evidence/data:isCritical])
for store in scope {
let source = query(store/core:source)
must query(store/core:Evidence/data:isResilient) {
subject: source,
area: data:isResilient,
message: "Critical stores need to be protected from loss"
}
}
}
Identify which stores are Critical
We are going to add to our tagging logic to identify if a store is critical or not based on our taxonomy:
// a derive creates derived information
derive evidence_of_criticality_from_cfn_rx_tagging {
let stores = query(core:Store[core:source/aws:cfn:Resource])
for store in stores {
let source = query(store/core:source)
let dc_tag = query(source/aws:Tags/*[aws:Key = "DataCriticality"])
// is there a dc tag?
should dc_tag {
subject: source,
area: DataCriticality,
message: "Add a DataCriticality tag (aws:Tags/aws:Key = 'DataCriticality') to this Resource"
}
// is the dc tag value valid in taxonomy?
let criticality = query(dc_tag/aws:Value) as(DataCriticality)
should criticality {
subject: source,
area: DataCriticality,
message: "DataCriticality tag must be a value from DataCriticality taxonomy"
}
let evidence = add(_, wa2:type, core:Evidence)
add(store, wa2:contains, evidence)
let fact = add(_, wa2:type, data:Criticality)
add(evidence, wa2:contains, fact)
// do we consider it critical? non-named are assumed critical
let is_critical = match criticality {
Disposable, NonCritical, Important => false,
else => true
}
// mark it critical
if is_critical {
let crit_fact = add(_, wa2:type, data:isCritical)
add(evidence, wa2:contains, crit_fact)
}
}
}
Tip
We use the match keyword to return different values based on the enum. Note how we flipped the logic, so that when we add a new value to the enum in the future, the rule will defensively protect us by assuming it is critical.
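The defensive default can be sketched in Python (an assumed helper, not WA2 code): only values we explicitly know to be low criticality are non-critical, and everything else, including future additions to the taxonomy, is assumed critical:

```python
# Sketch of the flipped match logic: default to critical, so new
# taxonomy values are protected until we explicitly decide otherwise.

NON_CRITICAL = {"Disposable", "NonCritical", "Important"}

def is_critical(criticality):
    """Unknown or newly added values are assumed critical."""
    return criticality not in NON_CRITICAL
```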
Gather evidence from implementation
Finally we gather evidence of resilience, in this example we just look for S3 buckets with replication setup:
derive store_resilience_from_s3_replication {
// https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication-requirements.html
let replicated_stores = query(aws:cfn:Resource[aws:type = "AWS::S3::Bucket"][
aws:VersioningConfiguration/aws:Status = "Enabled"
][
aws:ReplicationConfiguration/aws:Role
][
aws:ReplicationConfiguration/aws:Rules/*/aws:Status = "Enabled"
]/core:Store)
for store in replicated_stores {
let evidence = add(_, wa2:type, core:Evidence)
add(store, wa2:contains, evidence)
let fact = add(_, wa2:type, data:isResilient)
add(evidence, wa2:contains, fact)
}
}
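In plain Python, the same check over a parsed template (a dict, as yaml.safe_load would produce) might look like the following sketch; the property names follow the CloudFormation spec, while the helper itself is an assumption:

```python
# Sketch of the replication evidence check over a parsed CloudFormation
# template: an S3 bucket counts as resilient when versioning is enabled,
# a replication role is set, and at least one replication rule is enabled.

def resilient_s3_buckets(template):
    resilient = []
    for name, res in template.get("Resources", {}).items():
        if res.get("Type") != "AWS::S3::Bucket":
            continue
        props = res.get("Properties", {})
        versioned = (
            props.get("VersioningConfiguration", {}).get("Status") == "Enabled"
        )
        repl = props.get("ReplicationConfiguration", {})
        has_role = bool(repl.get("Role"))
        rules_enabled = any(
            rule.get("Status") == "Enabled" for rule in repl.get("Rules", [])
        )
        if versioned and has_role and rules_enabled:
            resilient.append(name)
    return resilient
```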
We need to update our target to mark this store as critical for our example:
AWSTemplateFormatVersion: "2010-09-09"
Resources:
DataBucket:
Type: AWS::S3::Bucket
Properties:
Tags:
- Key: DataCriticality
Value: MissionCritical
Let’s check the target again:
intent check --profile example --target protect.yaml --entry protect.wa2
PREPARE
-------
✓ Read target protect.yaml
• Schedule CloudFormation validation
Validation will run concurrently and report after results.
✓ Initialise kernel
✓ Parse intent entry protect.wa2
✓ Select profile example
✓ Run analysis
RESULTS
-------
✗ Profile: example [0/1]
└─ ✗ Policy: protect:protect_stores_based_on_classification [1/2]
└─ ✗ must protect:ensure_critical_stores_are_protected (1 finding)
└─ ✗ DataBucket
Location: protect.yaml: line 4
Area: data:isResilient
Message: Critical stores need to be protected from loss
VALIDATION
----------
✓ Validate CloudFormation against specification
So the result is telling us that there are critical stores that should be resilient but are not. That would be a very expensive mistake to make in production.
We need to update our target to make this store resilient. Getting
this right is not simple (and in this example is not complete!), so
this check would be ideal to include in your standard
governance set of derives (more on this later):
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
DataBucketName:
Type: String
DestinationBucketArn:
Type: String
DestinationAccountId:
Type: String
ReplicationRoleName:
Type: String
Default: s3-replication-role
Resources:
# IAM role assumed by S3 to perform cross-account replication
# kept minimal and service-scoped to avoid broader IAM surface
ReplicationRole:
Type: AWS::IAM::Role
Properties:
RoleName: !Ref ReplicationRoleName
# allow the S3 service to assume this role
# no human or workload access
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Principal:
Service: s3.amazonaws.com
Action: sts:AssumeRole
# Managed policy attached to the replication role
# split out to avoid inline IAM policies (guard requirement)
# permissions are tightly scoped to:
# - read versioned data from the source bucket
# - write replicated objects + deletes to the destination bucket
ReplicationPolicy:
Type: AWS::IAM::ManagedPolicy
Properties:
ManagedPolicyName: !Sub "${AWS::StackName}-s3-replication"
# attached only to the replication role
Roles:
- !Ref ReplicationRole
PolicyDocument:
Version: "2012-10-17"
Statement:
# allow S3 to read replication config and list source bucket
- Effect: Allow
Action:
- s3:GetReplicationConfiguration
- s3:ListBucket
Resource: !Sub "arn:${AWS::Partition}:s3:::${DataBucketName}"
# allow S3 to read all required object metadata + versions
# needed for correct replication of versioned + protected objects
- Effect: Allow
Action:
- s3:GetObjectVersionForReplication
- s3:GetObjectVersionAcl
- s3:GetObjectVersionTagging
- s3:GetObjectRetention
- s3:GetObjectLegalHold
Resource: !Sub "arn:${AWS::Partition}:s3:::${DataBucketName}/*"
# allow S3 to write replicated objects, deletes, and tags
# into the destination account bucket
- Effect: Allow
Action:
- s3:ReplicateObject
- s3:ReplicateDelete
- s3:ReplicateTags
- s3:ObjectOwnerOverrideToBucketOwner
Resource: !Sub "${DestinationBucketArn}/*"
DataBucket:
Type: AWS::S3::Bucket
Properties:
BucketName: !Ref DataBucketName
# provide native undo for delete/overwrites, but ^cost
VersioningConfiguration:
Status: Enabled
# replicate the data bucket to another account
ReplicationConfiguration:
Role: !GetAtt ReplicationRole.Arn
Rules:
- Id: ReplicateAllToBackupAccount
Status: Enabled
DeleteMarkerReplication:
Status: Enabled
Destination:
Bucket: !Ref DestinationBucketArn
Account: !Ref DestinationAccountId
AccessControlTranslation:
Owner: Destination
Tags:
- Key: DataCriticality
Value: MissionCritical # major impact if we lose
Results
Now when we check the target, we see our intent is satisfied:
intent check --profile example --target resilient.yaml --entry protect.wa2 --verbose
PREPARE
-------
✓ Read target resilient.yaml
• Schedule CloudFormation validation
Validation will run concurrently and report after results.
✓ Initialise kernel
✓ Parse intent entry protect.wa2
✓ Select profile example
✓ Run analysis
RESULTS
-------
✓ Profile: example [1/1]
└─ ✓ Policy: protect:protect_stores_based_on_classification [2/2]
├─ ✓ must protect:all_stores_must_be_classified
└─ ✓ must protect:ensure_critical_stores_are_protected
VALIDATION
----------
✓ Validate CloudFormation against specification
Tip
We used the --verbose flag to show what’s been evaluated in this check.
We now have intent code:
// protect critical data, which we know through classification
policy protect_stores_based_on_classification {
must all_stores_must_be_classified
must ensure_critical_stores_are_protected
}
creating a policy that checks:
- Are data stores classified according to our criticality taxonomy?
- Are our critical stores protected from data loss?
with the benefits of:
- not writing policy against a vendor-specific implementation
- not imposing overly broad, sweeping compliance requirements that are overkill
- no noisy false alarms for resources that don’t need that level of protection
- not losing sight of the architectural policy we are trying to encourage
- being written in one small language, not a polyglot of JSON, YAML, Python etc.
We have ~115 lines of intent code, but most of this would be standard across any target system you built, and later we will show how you can package up common elements into your own namespace.
But first, let’s bring this capability into the home of engineers: our IDE.