Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Amazon Announces the Improvement of ML Models to Better Identify Sensitive Data on Amazon Macie

Amazon has announced a new capability in Amazon Macie: allow lists. An allow list specifies text or text patterns that Macie should not report as sensitive data. Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect sensitive data in AWS.
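
As a rough illustration of what a regex-based allow list could look like, the sketch below builds a request body using the parameter names of Macie's CreateAllowList operation (`clientToken`, `name`, `description`, `criteria.regex`) as understood here. The list name, the pattern, and the helper `build_allow_list_request` are hypothetical, and the code makes no AWS call.

```python
import re
import uuid


def build_allow_list_request(name: str, pattern: str, description: str = "") -> dict:
    """Build a request body for a regex-based allow list.

    The key names mirror the CreateAllowList operation as assumed here;
    verify them against the current Macie API reference before use.
    """
    # Local sanity check only: Macie's regex dialect may differ from Python's.
    re.compile(pattern)
    return {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "name": name,
        "description": description,
        "criteria": {"regex": pattern},
    }


# Hypothetical example: internal ticket IDs that resemble account numbers.
request = build_allow_list_request(
    name="internal-ticket-ids",
    pattern=r"TCK-\d{12}",
    description="Ticket IDs that should not be reported as sensitive data",
)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("macie2").create_allow_list(**request)
```

Keeping the payload construction separate from the API call makes it possible to review or unit-test the pattern before anything is created in an account.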

According to Amazon, Macie has improved the machine learning models used by its managed data identifiers so that they produce more precise and useful results when evaluating JSON data in Amazon S3 buckets. The models now extract additional information from surrounding fields in JSON and JSON Lines files, which improves accuracy further. The enhancement also speeds up the processing of certain kinds of files, so sensitive data discovery jobs complete faster.
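
To make the idea of using surrounding fields concrete, here is a toy sketch. It is not Macie's actual model: it scans JSON Lines records for 16-digit values that pass a Luhn checksum and rates a hit higher when the field name suggests a payment card. The keyword list, the function names, and the sample records are all assumptions made for illustration.

```python
import json
import re

CARD_PATTERN = re.compile(r"\b\d{16}\b")
CONTEXT_KEYWORDS = ("card", "payment")  # hypothetical context keywords


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def scan_json_lines(text: str) -> list:
    """Scan top-level fields of each JSON Lines record for card-like values."""
    detections = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        for field, value in record.items():
            for candidate in CARD_PATTERN.findall(str(value)):
                if not luhn_valid(candidate):
                    continue  # 16 digits, but not a plausible card number
                has_context = any(k in field.lower() for k in CONTEXT_KEYWORDS)
                detections.append({
                    "line": line_no,
                    "field": field,
                    "confidence": "high" if has_context else "low",
                })
    return detections


sample = "\n".join([
    json.dumps({"card_number": "4111111111111111", "note": "order 1234567890123456"}),
    json.dumps({"tracking_id": "4111111111111111"}),
])
detections = scan_json_lines(sample)
```

In this sample, the same digit string is rated "high" under `card_number` and "low" under `tracking_id`, while the order number in `note` is dropped because it fails the checksum.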

Macie applies machine learning and pattern matching techniques to selected buckets to identify sensitive data, such as names, addresses, credit card numbers, or credential materials, and to alert on it. Identifying sensitive data in S3 can help with compliance with regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR).

Once enabled, Macie automatically compiles a complete inventory of S3 buckets and evaluates each bucket for public access, lack of encryption, and sharing or replication with AWS accounts outside the customer's organization. If Macie detects sensitive data or potential security or privacy issues, it creates detailed findings that can be reviewed and remediated as necessary.
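
The bucket-level checks described above can be sketched as a small function over bucket metadata records. The field names (`publicAccess.effectivePermission`, `serverSideEncryption.type`, `sharedAccess`, `replicationDetails.replicatedExternally`) follow the shape of Macie's DescribeBuckets response as understood here and should be treated as assumptions. The function `bucket_risk_flags` and the sample records are hypothetical.

```python
def bucket_risk_flags(bucket: dict) -> list:
    """Return human-readable risk flags for one bucket metadata record."""
    flags = []
    if bucket.get("publicAccess", {}).get("effectivePermission") == "PUBLIC":
        flags.append("publicly accessible")
    if bucket.get("serverSideEncryption", {}).get("type") == "NONE":
        flags.append("no default encryption")
    if bucket.get("sharedAccess") == "EXTERNAL":
        flags.append("shared outside the organization")
    if bucket.get("replicationDetails", {}).get("replicatedExternally"):
        flags.append("replicated to an external account")
    return flags


# Hypothetical records shaped like the assumed response fields.
buckets = [
    {
        "bucketName": "example-public-logs",
        "publicAccess": {"effectivePermission": "PUBLIC"},
        "serverSideEncryption": {"type": "NONE"},
        "sharedAccess": "NOT_SHARED",
        "replicationDetails": {"replicatedExternally": False},
    },
    {
        "bucketName": "example-internal-data",
        "publicAccess": {"effectivePermission": "NOT_PUBLIC"},
        "serverSideEncryption": {"type": "aws:kms"},
        "sharedAccess": "INTERNAL",
        "replicationDetails": {"replicatedExternally": False},
    },
]

report = {b["bucketName"]: bucket_risk_flags(b) for b in buckets}
```

A report like this only summarizes posture; the detailed findings themselves come from Macie.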

Data analysis promises valuable insights for data scientists, business managers, and artificial intelligence algorithms. Governance and security teams must also ensure that this data meets the same protection and monitoring requirements as any other part of the enterprise.

Tools that identify such information are useful after a ransomware attack: they help determine quickly what information could have been compromised and clarify the scope of potential security concerns and fallout. To further secure data, workloads, and applications, it is also recommended to combine Amazon Macie with AWS Security Hub and Amazon GuardDuty. Other security tools to consider are OpenSSL, Let's Encrypt, and Ensighten.
