Under the shared responsibility model, development teams are accountable for the data they store, but streamlining how that data is classified benefits everyone involved. Enter Amazon Macie, a machine learning-powered service that automatically identifies and labels sensitive data in S3 buckets. The service comes at a cost, however: $0.10 per bucket per month and $1.00 per scanned GB. For large organizations, these expenses can add up quickly.
A self-service Amazon Macie solution addresses this with an easy-to-use setup that targets only the buckets likely to hold sensitive data, which makes it a practical option for managing data security in a complex AWS environment. Organizations keep costs down while still benefiting from automated data classification, and development teams can maintain compliance without significant overhead.
In this post, we'll explain why automatically identifying where sensitive information is stored, and what kind of information it is, is crucial: not only for company policies and GDPR compliance, but also for maintaining the trust of the individuals whose information is stored. We'll cover how to leverage Amazon Macie for this purpose and provide strategies for effectively managing sensitive data. Knowing where sensitive data lives makes it easier to delete files and to track them as they are processed, which supports robust data governance and security.
Amazon Macie is a fully managed AWS service that uses machine learning to find and label sensitive information. While the service primarily works with S3 buckets, it can also be used indirectly for other data stores such as DynamoDB, for example by uploading the data to an S3 bucket and scanning it there.
The primary reason for building a self-service Amazon Macie solution is central management, which is particularly advantageous in larger environments with multiple workload accounts and development teams. Centralization streamlines operations and improves scalability, so teams can use the service without extensive setup or maintenance knowledge.
The first step, central management, involves configuring a delegated Macie administrator within the AWS organization. This administrator, typically a designated Landing Zone account, is granted permissions to centrally manage Macie in the other accounts.
A small snippet of part of the custom resource that configures the delegated Macie administrator:
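As a minimal sketch of what such a custom resource might call, assuming it runs in the organization's management account and receives a Boto3 `macie2` client: the function name `register_macie_admin` is hypothetical, while `list_organization_admin_accounts` and `enable_organization_admin_account` are existing Macie API operations.

```python
def register_macie_admin(macie_client, admin_account_id):
    """Register admin_account_id as the delegated Macie administrator.

    macie_client is expected to behave like boto3.client("macie2").
    Returns True if the account was registered by this call and False
    if it was already registered.
    """
    # Look at the currently registered administrators first, so the
    # custom resource can be re-run without failing on a duplicate.
    response = macie_client.list_organization_admin_accounts()
    for admin in response.get("adminAccounts", []):
        if admin.get("accountId") == admin_account_id:
            return False

    macie_client.enable_organization_admin_account(
        adminAccountId=admin_account_id
    )
    return True
```

Passing the client in as an argument keeps the function easy to test with a stub and leaves credential handling to the caller.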
The second step involves gathering events from the workload accounts, which is essential for activating Macie in the right workload account and scanning the correct S3 buckets. This is handled by an event bus, which should be deployed in all workload accounts. The rollout can be streamlined by using stack sets that target the organizational units containing the workload accounts.
The events we want to gather are those generated by the 'PutBucketTagging' and 'DeleteBucket' API calls. A 'PutBucketTagging' event is created when an S3 bucket is deployed with a tag or when its tags are changed. A 'DeleteBucket' event signals that Macie may need to be deactivated in an account, because deleting a bucket does not generate a 'PutBucketTagging' event.
Example of an event pattern:
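A pattern matching these two API calls could look roughly like the following, expressed as a Python dictionary. This is a sketch that assumes the calls reach the event bus as CloudTrail management events (which requires a trail in the workload account); the names `BUCKET_EVENT_PATTERN` and `event_pattern_json` are illustrative.

```python
import json

# Matches S3 API calls recorded by CloudTrail, limited to the two
# operations the self-service solution reacts to.
BUCKET_EVENT_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["PutBucketTagging", "DeleteBucket"],
    },
}


def event_pattern_json():
    """Serialize the pattern to the JSON string that event rules expect."""
    return json.dumps(BUCKET_EVENT_PATTERN)
```

The JSON string is what would be supplied as the `EventPattern` of a rule, whether the rule is created through a template or through an API call.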
The events are captured using event patterns and must be forwarded to the event bus of the delegated Macie administrator account. The event bus of that account will then receive and process the events.
Example of an event bus:
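One way to set up the receiving event bus in the delegated Macie administrator account is sketched below. It assumes a Boto3 `events` client and grants `events:PutEvents` to every account in the organization; the function name, bus name, and organization ID are placeholders.

```python
def create_central_bus(events_client, bus_name, organization_id):
    """Create an event bus that accepts events from the whole organization.

    events_client is expected to behave like boto3.client("events").
    Returns the ARN of the new event bus.
    """
    bus = events_client.create_event_bus(Name=bus_name)

    # Allow any principal that belongs to the organization to put
    # events on this bus; accounts outside the organization are refused.
    events_client.put_permission(
        EventBusName=bus_name,
        StatementId="AllowOrganizationPutEvents",
        Action="events:PutEvents",
        Principal="*",
        Condition={
            "Type": "StringEquals",
            "Key": "aws:PrincipalOrgID",
            "Value": organization_id,
        },
    )
    return bus["EventBusArn"]
```

The returned ARN is what the rules in the workload accounts would use as their target when forwarding events.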
The third step involves catching and processing the events, which in turn triggers the activation and scanning of specific S3 buckets in the workload account(s). This is achieved using an event rule designed to capture PutBucketTagging and DeleteBucket events. When the rule captures these events, it triggers a Lambda Function responsible for processing the event.
Example of an event rule:
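A rule of this kind could be created as in the sketch below, assuming a Boto3 `events` client in the delegated administrator account. The function name and the target ID are hypothetical, and the pattern is passed in as a dictionary so the same function works for any event selection.

```python
import json


def create_lambda_rule(events_client, bus_name, rule_name, pattern, lambda_arn):
    """Create a rule on bus_name that sends matching events to a Lambda function.

    events_client is expected to behave like boto3.client("events").
    Returns the ARN of the rule.
    """
    rule = events_client.put_rule(
        Name=rule_name,
        EventBusName=bus_name,
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )
    # Attach the Lambda function as the only target of the rule.
    events_client.put_targets(
        Rule=rule_name,
        EventBusName=bus_name,
        Targets=[{"Id": "macie-self-service-lambda", "Arn": lambda_arn}],
    )
    return rule["RuleArn"]
```

Note that the Lambda function also needs a resource-based permission that allows the rule to invoke it; that part is left out of this sketch.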
The fourth step entails processing the events and activating the scanning of S3 buckets with specific tags. Within the Lambda function triggered by the event rule, several steps are taken to streamline the process effectively.
Snippet of the Lambda handler you can refer to:
First, check which event triggered the Lambda function, PutBucketTagging or DeleteBucket:
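A handler performing this first check might look like the sketch below. It assumes the event arrives in the usual EventBridge envelope for CloudTrail API calls, with the originating account in `account` and the bucket name in `detail.requestParameters.bucketName`; the action labels in the return value are illustrative and stand in for the later processing steps.

```python
def lambda_handler(event, context):
    """Route a forwarded S3 event to the matching processing path."""
    detail = event.get("detail", {})
    event_name = detail.get("eventName")
    account_id = event.get("account")
    # requestParameters can be null in CloudTrail records, hence the "or {}".
    bucket_name = (detail.get("requestParameters") or {}).get("bucketName")

    if event_name == "PutBucketTagging":
        action = "evaluate_tags"          # continue with the tag check
    elif event_name == "DeleteBucket":
        action = "evaluate_deactivation"  # continue with the deletion path
    else:
        action = "ignore"                 # not an event this solution handles

    return {
        "action": action,
        "account_id": account_id,
        "bucket_name": bucket_name,
    }
```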
The second check, macie_enabled_check, determines whether the PutBucketTagging event contains a “Macie” tag and returns the value of that tag, True or False:
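A possible shape for this check is sketched here. It assumes the tags appear under `detail.requestParameters.Tagging.TagSet.Tag`, either as a single object or as a list, and it compares the tag key and value case-insensitively; both are assumptions about the event layout rather than confirmed details of the original function.

```python
def macie_enabled_check(event):
    """Return True if the event carries a Macie tag set to true, else False.

    A missing Macie tag is treated the same as a tag set to false.
    """
    params = event.get("detail", {}).get("requestParameters") or {}
    tagging = params.get("Tagging") or {}
    tag_set = tagging.get("TagSet") or {}
    tags = tag_set.get("Tag") or []

    # A single tag may be delivered as one object instead of a list.
    if isinstance(tags, dict):
        tags = [tags]

    for tag in tags:
        if str(tag.get("Key", "")).lower() == "macie":
            return str(tag.get("Value", "")).lower() == "true"
    return False
```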
The third check involves verifying if the workload account already has an active Macie instance. This is crucial because while you can remove a Macie tag from an S3 bucket, there might still be another S3 bucket in that account with the tag. Therefore, you want to avoid deactivating Macie in such workload accounts:
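One way to look up the current state from the delegated administrator account is sketched below, assuming a Boto3 `macie2` client. The helper names are hypothetical; `list_members` and the `relationshipStatus` field come from the Macie API, where an active member is reported as `Enabled` and a suspended one as `Paused`.

```python
def get_member_status(macie_client, account_id):
    """Return the Macie relationship status of account_id, or None.

    macie_client is expected to behave like boto3.client("macie2")
    created in the delegated administrator account.
    """
    # onlyAssociated="false" also returns accounts that are known to
    # the administrator but not currently associated with it.
    kwargs = {"onlyAssociated": "false"}
    while True:
        page = macie_client.list_members(**kwargs)
        for member in page.get("members", []):
            if member.get("accountId") == account_id:
                return member.get("relationshipStatus")
        token = page.get("nextToken")
        if not token:
            return None
        kwargs["nextToken"] = token


def macie_is_active(macie_client, account_id):
    """Return True only if the account is an enabled Macie member."""
    return get_member_status(macie_client, account_id) == "Enabled"
```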
If a Macie tag is detected and Macie is found to be deactivated or suspended, the workload account will be designated as a Macie member. The creation of the Macie member proceeds as follows:
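The activation could be sketched as follows, assuming a Boto3 `macie2` client in the delegated administrator account and a `current_status` argument holding the member's relationship status as reported by Macie (or `None` when the account is not yet a member). The function name and return labels are illustrative, and statuses other than `Enabled` and `Paused` may need extra handling in practice.

```python
def activate_macie_member(macie_client, account_id, email, current_status):
    """Make account_id an active Macie member and report what was done."""
    if current_status == "Enabled":
        # Nothing to do: Macie is already running in this account.
        return "already_enabled"

    if current_status == "Paused":
        # The account is a suspended member, so re-enable its session.
        macie_client.update_member_session(id=account_id, status="ENABLED")
        return "resumed"

    # The account is not a member yet: associate it with the administrator.
    macie_client.create_member(
        account={"accountId": account_id, "email": email}
    )
    return "created"
```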
Now that the workload account is a Macie member, Macie is automatically activated in that account. However, no S3 bucket is scanned yet. Scanning is handled by a daily job that only scans the buckets with a macie:true tag:
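A daily job restricted to tagged buckets could be defined roughly as below. The sketch assumes a Boto3 `macie2` client in the delegated administrator account and uses bucket criteria to select buckets by account ID and tag; the helper names and the job name are hypothetical.

```python
import uuid


def build_daily_job_request(account_id, tag_key="macie", tag_value="true"):
    """Build the arguments for a daily job that scans tagged buckets only."""
    return {
        "clientToken": str(uuid.uuid4()),  # idempotency token for the request
        "jobType": "SCHEDULED",
        "name": f"macie-daily-scan-{account_id}",
        "scheduleFrequency": {"dailySchedule": {}},
        "s3JobDefinition": {
            "bucketCriteria": {
                "includes": {
                    "and": [
                        # Only buckets owned by the workload account ...
                        {"simpleCriterion": {
                            "comparator": "EQ",
                            "key": "ACCOUNT_ID",
                            "values": [account_id],
                        }},
                        # ... that carry the expected tag.
                        {"tagCriterion": {
                            "comparator": "EQ",
                            "tagValues": [{"key": tag_key, "value": tag_value}],
                        }},
                    ]
                }
            }
        },
    }


def create_daily_scan_job(macie_client, account_id):
    """Create the scheduled classification job and return its job ID."""
    request = build_daily_job_request(account_id)
    response = macie_client.create_classification_job(**request)
    return response["jobId"]
```

Because the criteria are evaluated each time the job runs, buckets that receive the tag later are picked up by the next daily run without changing the job.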
At this point Macie is enabled in the workload account, the account is a member of the delegated Macie administrator, and daily scans run on the buckets tagged with 'macie:true'. The findings of the scans and the bucket status can be displayed in Security Hub.
Next is the deactivation of Macie members when an S3 bucket is deleted, or a Macie tag is removed or set to False.
Similar to the activation step, the first check in the deletion process is to verify if the workload account is already a Macie member. If it is, the process proceeds to check_other_macie_enabled_buckets. This step ensures there are no other S3 buckets with the 'macie:true' tag before suspending the Macie member:
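The bucket check could be implemented along these lines. This sketch assumes a Boto3 `s3` client created with credentials for the workload account (for example through an assumed role) and reads the tags directly from S3; how the original function obtains the tags is not shown in the post, so treat this as one possible approach.

```python
def check_other_macie_enabled_buckets(s3_client, tag_key="macie"):
    """Return the names of buckets that still carry a macie:true tag.

    s3_client is expected to behave like boto3.client("s3") for the
    workload account. An empty list means Macie can be suspended.
    """
    tagged = []
    for bucket in s3_client.list_buckets().get("Buckets", []):
        name = bucket["Name"]
        try:
            tag_set = s3_client.get_bucket_tagging(Bucket=name).get("TagSet", [])
        except Exception as error:
            # Buckets without tags raise an error with code NoSuchTagSet;
            # skip those and re-raise anything else.
            code = getattr(error, "response", {}).get("Error", {}).get("Code")
            if code == "NoSuchTagSet":
                continue
            raise
        for tag in tag_set:
            if tag.get("Key", "").lower() == tag_key and tag.get("Value", "").lower() == "true":
                tagged.append(name)
                break
    return tagged
```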
If there are no other S3 buckets with the 'macie:true' tag, the Macie service in the workload account can be suspended with suspend_macie_member. This means that while the account retains its existing Macie results, no further costs are incurred because the service stops running in the account:
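A minimal sketch of the suspension, assuming a Boto3 `macie2` client in the delegated administrator account and a list of remaining tagged buckets supplied by the caller. The `remaining_buckets` parameter is an illustrative addition; `update_member_session` with status `PAUSED` is the Macie API operation for suspending a member.

```python
def suspend_macie_member(macie_client, account_id, remaining_buckets=()):
    """Suspend Macie for account_id unless tagged buckets remain.

    Returns True if the member was suspended and False if it was left
    running because at least one tagged bucket still exists.
    """
    if remaining_buckets:
        # Another bucket still needs scanning, so keep Macie active.
        return False

    # PAUSED suspends Macie for the member without removing the
    # association with the administrator account.
    macie_client.update_member_session(id=account_id, status="PAUSED")
    return True
```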
The steps I have outlined represent the foundational aspects of solving the event-driven puzzle. It is crucial to remember that beyond these steps, setting up roles with the correct policies is essential, not only within a single account but also across multiple accounts with assumed roles. Furthermore, the logic within the Lambda Function is built entirely on API calls via Boto3. For insights into this approach, refer to the Boto3 documentation for a detailed understanding of its capabilities and usage: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
This was just one example of a self-service solution for Amazon Macie. The same kind of self-service solution can be built around a central Lambda function for other services as well, and not only for AWS services but also for third-party solutions.