
ADR-009 Use AWS SageMaker for analytical tooling

Status

🤔 Proposed

Context

Our users want analytical features that are not available on our existing platform. The types of tools and the underlying compute are changing rapidly. SageMaker provides a managed service for these tools, along with instances with more resources and GPUs to speed up and support research.

There is also a lot of interest in using LLMs, e.g. for semantic search of free text. SageMaker in VPC isolation mode makes sure sensitive workloads are secured and stay within the instance. We can further secure data using a private VPC and PrivateLink.
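
As a rough illustration of the VPC-only setup described above, the sketch below builds the parameters that could be passed to the SageMaker `CreateDomain` API (for example through boto3's `create_domain`). The function name `build_vpc_only_domain_request` is hypothetical, and the domain name, VPC ID, subnet IDs, security group IDs and role ARN are placeholders rather than values from our environment. It only assembles the request and does not call AWS.

```python
from typing import Any, Dict, List


def build_vpc_only_domain_request(
    domain_name: str,
    vpc_id: str,
    subnet_ids: List[str],
    security_group_ids: List[str],
    execution_role_arn: str,
) -> Dict[str, Any]:
    """Build keyword arguments for a SageMaker Studio domain without direct internet access.

    Hypothetical helper: all argument values are supplied by the caller and
    are assumptions, not settings taken from this ADR.
    """
    if not subnet_ids:
        raise ValueError("at least one private subnet is required")
    return {
        "DomainName": domain_name,
        "AuthMode": "IAM",
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        # "VpcOnly" sends Studio traffic through our own VPC instead of
        # the SageMaker-managed internet route ("PublicInternetOnly").
        "AppNetworkAccessType": "VpcOnly",
        "DefaultUserSettings": {
            "ExecutionRole": execution_role_arn,
            # Security groups scope which network paths Studio apps may use.
            "SecurityGroups": list(security_group_ids),
        },
    }


# Placeholder identifiers, for illustration only.
request = build_vpc_only_domain_request(
    domain_name="analytics-pilot",
    vpc_id="vpc-0123456789abcdef0",
    subnet_ids=["subnet-0aaa", "subnet-0bbb"],
    security_group_ids=["sg-0ccc"],
    execution_role_arn="arn:aws:iam::111122223333:role/example-sagemaker-role",
)
```

With boto3 installed and credentials configured, this could be submitted as `boto3.client("sagemaker").create_domain(**request)`. In a VPC-only domain, access to other AWS services would typically go through VPC interface endpoints (PrivateLink), which is the additional layer mentioned above.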

Decision

Proposal General Consequences

  • SageMaker costs are usage-based and can vary significantly from month to month depending on how much it is used and on which instance types. We can provide proactive notifications that will, for the first time, allow our users to understand the cost of the work that they are doing
  • Reduced operational cost and complexity
  • Agility and change readiness: additional analytical services can be offered as they become available, without the considerable effort (extensive development of front-end and backend services, e.g. Control Panel) that has led to users having to work elsewhere
  • Better cost transparency: we will understand our tooling compute costs, which are currently very difficult to calculate
  • For Foundation Models, SageMaker JumpStart does not download models from a public model zoo, so it can be used in fully locked-down environments, e.g. with no internet access
  • Network access can be limited and scoped down for SageMaker JumpStart models, which helps teams improve the security posture of the environment
  • Due to the VPC boundaries, access to the endpoint can also be limited via subnets and security groups, which adds an extra layer of security
  • Leverage managed services like SageMaker Studio
  • If successful, we can close down our current tooling EKS cluster and get on with more useful tasks. EKS, although managed, still requires a considerable amount of effort to run because of the endless upgrades
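
The usage-based costing and proactive notifications mentioned in the list above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a proposed implementation: the `UsageRecord` type, the function names, the hourly rates and the threshold are all made up for the example, and real figures would come from AWS billing data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class UsageRecord:
    """One block of instance usage attributed to a user (hypothetical shape)."""
    user: str
    instance_type: str
    hours: float


def estimate_costs(records: List[UsageRecord], hourly_rates: Dict[str, float]) -> Dict[str, float]:
    """Return the estimated cost per user as the sum of hours x hourly rate."""
    totals: Dict[str, float] = {}
    for record in records:
        # Raises KeyError if no rate is known for the instance type.
        rate = hourly_rates[record.instance_type]
        totals[record.user] = totals.get(record.user, 0.0) + record.hours * rate
    return totals


def users_to_notify(totals: Dict[str, float], threshold: float) -> List[str]:
    """Return users whose estimated cost has reached the threshold, sorted by name."""
    return sorted(user for user, cost in totals.items() if cost >= threshold)


# Made-up hourly rates and usage, for illustration only.
rates = {"ml.t3.medium": 0.05, "ml.g4dn.xlarge": 0.75}
records = [
    UsageRecord("alice", "ml.t3.medium", 40.0),
    UsageRecord("alice", "ml.g4dn.xlarge", 10.0),
    UsageRecord("bob", "ml.t3.medium", 20.0),
]
totals = estimate_costs(records, rates)
over_budget = users_to_notify(totals, threshold=5.0)
```

Attributing usage per user like this is what would make the month-to-month variation visible, since a few hours on a GPU instance can outweigh many hours on a small one.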

Disadvantages

  • RStudio on Amazon SageMaker is a paid product and requires that each user is appropriately licensed. As part of the pilot we will need to understand our users' need for RStudio
This page was last reviewed on 17 January 2024. It needs to be reviewed again on 17 July 2024 by the page owner #data-platform-notifications.