Intent driven
In this chapter we show how to refactor to remove our test’s technical debt:
- Our policy is tightly tied to CloudFormation
- It’s not robust evidence: we are not checking the contents of the tag
- It’s a compliance checkbox, “did you do it?”, not “did you need to?”
Our first task is to remove the direct linkage to CloudFormation in our policy.
A higher vista
Let’s take a higher vista and look at the architecture at a higher level of abstraction.
The WA2 Framework provides the core namespace, which defines these key elements:
// Architectural node types
enum Node { Store, Run, Move }
struct Workload {
nodes: Node[]
}
struct Evidence {
value: String
}
The foundations of the core namespace (and indeed WA2) are these:
- We reason about Nodes, which have three possible variations: Store data, Run code, and Move information
- We arrange a set of Nodes in our graph into a Workload
- We use Evidence to enrich the graph
Projecting into our vista
As we saw in the previous chapter, the intent language allows us to write
queries at the AWS CloudFormation level:
query(aws:cfn:Resource)
This is critical to be able to create evidence at a Vendor level, but we want to reason about architecture, not implementation.
The WA2 Framework provides the aws:cfn namespace, which projects from
CloudFormation into the core:Node type. For example, in this snippet
we can see how it maps aws:type into Node:Store:
derive stores {
let cfn_stores = query(aws:cfn:Resource[aws:type in (
"AWS::S3::Bucket",
"AWS::EC2::Volume",
"AWS::EFS::FileSystem"
⋮
)])
for s in cfn_stores {
let node = add(_, wa2:type, core:Store)
add(node, core:source, s)
add(core:workload, wa2:contains, node)
}
}
This means that if you add
use core
use aws:cfn
to your wa2 intent file, you automatically get these projections. This allows us to rewrite our policy rule without reference to AWS.
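As a conceptual sketch (in Python, not the WA2 language itself), the projection can be thought of as mapping vendor resource types onto architectural node types; the function name and dictionary shapes here are illustrative assumptions:

```python
# Illustrative sketch of the aws:cfn -> core projection (not the WA2
# runtime): vendor resource types are mapped onto vendor-neutral Store
# nodes. The type set mirrors the derive snippet above.

STORE_TYPES = {"AWS::S3::Bucket", "AWS::EC2::Volume", "AWS::EFS::FileSystem"}

def project_stores(resources):
    """Map CloudFormation resources onto core:Store-like nodes."""
    workload = []
    for name, resource in resources.items():
        if resource.get("Type") in STORE_TYPES:
            # each node keeps a back-reference to its vendor-level
            # source, playing the role of core:source
            workload.append({"type": "Store", "source": name})
    return workload
```

The point of the sketch is the direction of the dependency: policy code only ever sees the Store nodes, never the vendor type strings.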
Policy independent of vendor
Now that we can work at a higher level, we can write policy that is vendor neutral. In the last chapter we were checking all CloudFormation Resources for data classification, which makes no sense for an AWS IP address (for example). Now we can start by querying only stores:
// we require everything is given a classification
policy require_classification {
must all_stores_must_be_classified
}
// every store must have a criticality classification
rule all_stores_must_be_classified {
let scope = query(core:Store)
for store in scope {
// reference the source of this store (will be a cfn resource)
let source = query(store/core:source)
must query(store/core:Evidence/data:Criticality) {
subject: source,
area: data:Criticality,
message: "Stores need to have criticality classification"
}
}
}
We use core:source to refer back to the source of the Store; in a
CloudFormation-based workload, that will be the Resource. Also note how we
now use core:Evidence to standardize where we keep evidence facts.
So we derive the evidence from the CloudFormation level, and can build a rule
on top of the evidence, not the CloudFormation implementation detail.
// a derive creates derived information
derive evidence_of_criticality_from_cfn_rx_tagging {
let stores = query(core:Store[core:source/aws:cfn:Resource])
for store in stores {
let source = query(store/core:source)
let dc_tag = query(source/aws:Tags/*[aws:Key = "DataCriticality"])
should dc_tag {
subject: source,
area: data:Criticality,
message: "Add a DataCriticality tag to this Resource"
}
let evidence = add(_, wa2:type, core:Evidence)
add(store, wa2:contains, evidence)
let fact = add(_, wa2:type, data:Criticality)
add(evidence, wa2:contains, fact)
}
}
Note again that we place facts under core:Evidence to meet our rule expectations.
Instead of using an if statement to check the existence of the tag, we now use a should modal.
The should (like the must) will stop the derive execution, preventing evidence from being added,
but instead of a fatal error, it will raise a warning.
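The difference between the two modals can be sketched in Python (with hypothetical helper names, not the WA2 runtime): must records a fatal finding, while should records a warning and merely stops the current derive:

```python
# Sketch of should/must semantics as described above (assumed names,
# not the WA2 runtime): both stop execution, but at different severities.

class FatalFinding(Exception):
    """Raised by `must`: a fatal architectural error."""

class SkipDerive(Exception):
    """Raised by `should`: stops this derive with only a warning."""

findings = []

def must(condition, message):
    if not condition:
        findings.append(("error", message))
        raise FatalFinding(message)

def should(condition, message):
    if not condition:
        findings.append(("warning", message))
        raise SkipDerive(message)

def run_derive(derive):
    try:
        derive()
    except SkipDerive:
        pass  # the derive stops early; its evidence is simply not added
```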
Tip
Using a should in a derive provides guidance to an engineer that is relevant at the implementation level. The rule will signal a fatal architectural error about the lack of classification, but the derive can tell the engineer what needs to be fixed at the CloudFormation level.
Ensure all tests continue to pass
So now we can run again to ensure our refactoring has not broken anything:
Let’s check the target again:
intent check --profile example --target tagged.yaml --entry unvendor.wa2
PREPARE
-------
✓ Read target tagged.yaml
• Schedule CloudFormation validation
Validation will run concurrently and report after results.
✓ Initialise kernel
✓ Parse intent entry unvendor.wa2
✓ Select profile example
✓ Run analysis
RESULTS
-------
✓ Profile: example [1/1]
VALIDATION
----------
✓ Validate CloudFormation against specification
So we have fixed our first piece of debt: policy tied to implementation detail. Now, as WA2 adds new ways to ingest targets (APIs etc.) and new vendors (Azure, GCP), we won’t have to change our policy; we will just add new derives to gather the evidence we need.
Enforcing a taxonomy
Currently the tags against a Resource could contain any value, so we want to make sure they follow our Data Classification Taxonomy. Everyone has their own, so let’s define ours and then make sure it’s being used.
We can add an enum that lists all possible values, just like core did for Node.
enum DataCriticality {
Disposable,
NonCritical,
Important,
BusinessCritical,
MissionCritical
}
Now we can write a should query, using the as() function to convert the
Value of the AWS Tag into our DataCriticality enum:
// a derive creates derived information
derive evidence_of_criticality_from_cfn_rx_tagging {
let stores = query(core:Store[core:source/aws:cfn:Resource])
for store in stores {
let source = query(store/core:source)
let dc_tag = query(source/aws:Tags/*[aws:Key = "DataCriticality"])
// is there a dc tag?
should dc_tag {
subject: source,
area: DataCriticality,
message: "Add a DataCriticality tag (aws:Tags/aws:Key = 'DataCriticality') to this Resource"
}
// is the dc tag value valid in taxonomy?
should query(dc_tag/aws:Value) as(DataCriticality) {
subject: source,
area: DataCriticality,
message: "DataCriticality tag must be a value from DataCriticality taxonomy"
}
let evidence = add(_, wa2:type, core:Evidence)
add(store, wa2:contains, evidence)
let fact = add(_, wa2:type, data:Criticality)
add(evidence, wa2:contains, fact)
}
}
Now we only derive evidence of Criticality if the tagging follows our taxonomy. In theory this also allows different projects to use different taxonomies, and our policy would still work.
Note
The [modal] [value] as([name]) syntax is truthy. For our example, if the value is not in the list of valid values in name, it evaluates to false. So, since we used should, a non-valid value stops us adding evidence.
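A minimal Python sketch of this truthy conversion (the helper name as_enum is an assumption, not WA2 syntax): invalid values come back as None, which is falsy:

```python
# Sketch of the as(DataCriticality) conversion: a raw tag value either
# maps onto the taxonomy or evaluates to a falsy result (None).

from enum import Enum

class DataCriticality(Enum):
    Disposable = 1
    NonCritical = 2
    Important = 3
    BusinessCritical = 4
    MissionCritical = 5

def as_enum(value, enum_cls):
    try:
        return enum_cls[value]  # look up the member by name
    except KeyError:
        return None  # not in the taxonomy -> falsy
```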
Ensure all tests continue to pass
Let’s check the target again:
intent check --profile example --target tagged.yaml --entry taxonomy.wa2
PREPARE
-------
✓ Read target tagged.yaml
• Schedule CloudFormation validation
Validation will run concurrently and report after results.
✓ Initialise kernel
✓ Parse intent entry taxonomy.wa2
✓ Select profile example
✓ Run analysis
RESULTS
-------
✓ Profile: example [1/1]
VALIDATION
----------
✓ Validate CloudFormation against specification
Acting on Intent
So now we can step away from broad compliance tickboxes, and instead use our intent to decide what must be done. First we need another rule in our policy set:
// protect critical data, which we know through classification
policy protect_stores_based_on_classification {
must all_stores_must_be_classified
must ensure_critical_stores_are_protected
}
Critical stores should be resilient
We write the new rule that says that all critical stores must be resilient:
rule ensure_critical_stores_are_protected {
let scope = query(core:Store[core:Evidence/data:isCritical])
for store in scope {
let source = query(store/core:source)
must query(store/core:Evidence/data:isResilient) {
subject: source,
area: data:isResilient,
message: "Critical stores need to be protected from loss"
}
}
}
Identify which stores are Critical
We are going to add to our tagging logic to identify if a store is critical or not based on our taxonomy:
// a derive creates derived information
derive evidence_of_criticality_from_cfn_rx_tagging {
let stores = query(core:Store[core:source/aws:cfn:Resource])
for store in stores {
let source = query(store/core:source)
let dc_tag = query(source/aws:Tags/*[aws:Key = "DataCriticality"])
// is there a dc tag?
should dc_tag {
subject: source,
area: DataCriticality,
message: "Add a DataCriticality tag (aws:Tags/aws:Key = 'DataCriticality') to this Resource"
}
// is the dc tag value valid in taxonomy?
let criticality = query(dc_tag/aws:Value) as(DataCriticality)
should criticality {
subject: source,
area: DataCriticality,
message: "DataCriticality tag must be a value from DataCriticality taxonomy"
}
let evidence = add(_, wa2:type, core:Evidence)
add(store, wa2:contains, evidence)
let fact = add(_, wa2:type, data:Criticality)
add(evidence, wa2:contains, fact)
// do we consider it critical? non-named are assumed critical
let is_critical = match criticality {
Disposable, NonCritical, Important => false,
else => true
}
// mark it critical
if is_critical {
let crit_fact = add(_, wa2:type, data:isCritical)
add(evidence, wa2:contains, crit_fact)
}
}
}
Tip
We use the match keyword to return different values based on the enum. Note how we flipped the logic, so that when we add a new value to the enum in the future, the rule will defensively protect us by assuming it is critical.
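The defensive default can be sketched in Python (an assumed helper, not WA2 code): only values we explicitly know to be low criticality are non-critical, and everything else, including future additions to the taxonomy, is assumed critical:

```python
# Sketch of the flipped match logic: default to critical, so new
# taxonomy values are protected until we explicitly decide otherwise.

NON_CRITICAL = {"Disposable", "NonCritical", "Important"}

def is_critical(criticality):
    """Unknown or newly added values are assumed critical."""
    return criticality not in NON_CRITICAL
```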
Gather evidence from implementation
Finally we gather evidence of resilience, in this example we just look for S3 buckets with replication setup:
derive store_resilience_from_s3_replication {
// https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication-requirements.html
let replicated_stores = query(aws:cfn:Resource[aws:type = "AWS::S3::Bucket"][
aws:VersioningConfiguration/aws:Status = "Enabled"
][
aws:ReplicationConfiguration/aws:Role
][
aws:ReplicationConfiguration/aws:Rules/*/aws:Status = "Enabled"
]/core:Store)
for store in replicated_stores {
let evidence = add(_, wa2:type, core:Evidence)
add(store, wa2:contains, evidence)
let fact = add(_, wa2:type, data:isResilient)
add(evidence, wa2:contains, fact)
}
}
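In plain Python, the same check over a parsed template (a dict, as yaml.safe_load would produce) might look like the following sketch; the property names follow the CloudFormation spec, while the helper itself is an assumption:

```python
# Sketch of the replication evidence check over a parsed CloudFormation
# template: an S3 bucket counts as resilient when versioning is enabled,
# a replication role is set, and at least one replication rule is enabled.

def resilient_s3_buckets(template):
    resilient = []
    for name, res in template.get("Resources", {}).items():
        if res.get("Type") != "AWS::S3::Bucket":
            continue
        props = res.get("Properties", {})
        versioned = (
            props.get("VersioningConfiguration", {}).get("Status") == "Enabled"
        )
        repl = props.get("ReplicationConfiguration", {})
        has_role = bool(repl.get("Role"))
        rules_enabled = any(
            rule.get("Status") == "Enabled" for rule in repl.get("Rules", [])
        )
        if versioned and has_role and rules_enabled:
            resilient.append(name)
    return resilient
```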
We need to update our target to mark this store as critical for our example:
AWSTemplateFormatVersion: "2010-09-09"
Resources:
DataBucket:
Type: AWS::S3::Bucket
Properties:
Tags:
- Key: DataCriticality
Value: MissionCritical
Let’s check the target again:
intent check --profile example --target protect.yaml --entry protect.wa2
PREPARE
-------
✓ Read target protect.yaml
• Schedule CloudFormation validation
Validation will run concurrently and report after results.
✓ Initialise kernel
✓ Parse intent entry protect.wa2
✓ Select profile example
✓ Run analysis
RESULTS
-------
✗ Profile: example [0/1]
└─ ✗ Policy: protect:protect_stores_based_on_classification [1/2]
└─ ✗ must protect:ensure_critical_stores_are_protected (1 finding)
└─ ✗ DataBucket
Location: protect.yaml: line 4
Area: data:isResilient
Message: Critical stores need to be protected from loss
VALIDATION
----------
✓ Validate CloudFormation against specification
So the result is telling us that there are critical stores that should be resilient but are not. That would be a very expensive mistake to make in production.
We need to update our target to make this store resilient. Getting
this right is not simple (and in this example is not complete!), so
this check would be ideal to include in your standard
governance set of derives (more on this later):
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
DataBucketName:
Type: String
DestinationBucketArn:
Type: String
DestinationAccountId:
Type: String
ReplicationRoleName:
Type: String
Default: s3-replication-role
Resources:
# IAM role assumed by S3 to perform cross-account replication
# kept minimal and service-scoped to avoid broader IAM surface
ReplicationRole:
Type: AWS::IAM::Role
Properties:
RoleName: !Ref ReplicationRoleName
# allow the S3 service to assume this role
# no human or workload access
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Principal:
Service: s3.amazonaws.com
Action: sts:AssumeRole
# Managed policy attached to the replication role
# split out to avoid inline IAM policies (guard requirement)
# permissions are tightly scoped to:
# - read versioned data from the source bucket
# - write replicated objects + deletes to the destination bucket
ReplicationPolicy:
Type: AWS::IAM::ManagedPolicy
Properties:
ManagedPolicyName: !Sub "${AWS::StackName}-s3-replication"
# attached only to the replication role
Roles:
- !Ref ReplicationRole
PolicyDocument:
Version: "2012-10-17"
Statement:
# allow S3 to read replication config and list source bucket
- Effect: Allow
Action:
- s3:GetReplicationConfiguration
- s3:ListBucket
Resource: !Sub "arn:${AWS::Partition}:s3:::${DataBucketName}"
# allow S3 to read all required object metadata + versions
# needed for correct replication of versioned + protected objects
- Effect: Allow
Action:
- s3:GetObjectVersionForReplication
- s3:GetObjectVersionAcl
- s3:GetObjectVersionTagging
- s3:GetObjectRetention
- s3:GetObjectLegalHold
Resource: !Sub "arn:${AWS::Partition}:s3:::${DataBucketName}/*"
# allow S3 to write replicated objects, deletes, and tags
# into the destination account bucket
- Effect: Allow
Action:
- s3:ReplicateObject
- s3:ReplicateDelete
- s3:ReplicateTags
- s3:ObjectOwnerOverrideToBucketOwner
Resource: !Sub "${DestinationBucketArn}/*"
DataBucket:
Type: AWS::S3::Bucket
Properties:
BucketName: !Ref DataBucketName
# provide native undo for delete/overwrites, but ^cost
VersioningConfiguration:
Status: Enabled
# replicate the data bucket to another account
ReplicationConfiguration:
Role: !GetAtt ReplicationRole.Arn
Rules:
- Id: ReplicateAllToBackupAccount
Status: Enabled
DeleteMarkerReplication:
Status: Enabled
Destination:
Bucket: !Ref DestinationBucketArn
Account: !Ref DestinationAccountId
AccessControlTranslation:
Owner: Destination
Tags:
- Key: DataCriticality
Value: MissionCritical # major impact if we lose
Results
Now when we check the target, we see our intent is satisfied:
intent check --profile example --target resilient.yaml --entry protect.wa2 --verbose
PREPARE
-------
✓ Read target resilient.yaml
• Schedule CloudFormation validation
Validation will run concurrently and report after results.
✓ Initialise kernel
✓ Parse intent entry protect.wa2
✓ Select profile example
✓ Run analysis
RESULTS
-------
✓ Profile: example [1/1]
└─ ✓ Policy: protect:protect_stores_based_on_classification [2/2]
├─ ✓ must protect:all_stores_must_be_classified
└─ ✓ must protect:ensure_critical_stores_are_protected
VALIDATION
----------
✓ Validate CloudFormation against specification
Tip
We used the --verbose flag to show what’s been evaluated in this check.
We now have intent code:
// protect critical data, which we know through classification
policy protect_stores_based_on_classification {
must all_stores_must_be_classified
must ensure_critical_stores_are_protected
}
creating a policy that checks:
- Are data stores classified according to our criticality taxonomy?
- Are our critical stores protected from data loss?
with the benefits of:
- not writing policy against a vendor-specific implementation
- not imposing overly broad, sweeping compliance requirements that are overkill
- no noisy false alarms for resources that don’t need that level of protection
- not losing sight of the architectural policy we are trying to encourage
- being written in one small language, not a polyglot of JSON, YAML, Python etc.
We have ~115 lines of intent code, but most of this would be standard across any target system you built, and later we will show how you can package up common elements into your own namespace.
But first, let’s bring this capability into the home of engineers: our IDE.