Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

This post is co-written with Marc Neumann, Amor Steinberg and Marinus Krommenhoek from BMW Group.

The BMW Group, headquartered in Munich, Germany, is driven by 149,000 employees worldwide and manufactures in over 30 production and assembly facilities across 15 countries. Today, the BMW Group is the world’s leading manufacturer of premium automobiles and motorcycles, and provider of premium financial and mobility services. The BMW Group sets trends in production technology and sustainability as an innovation leader with an intelligent material mix, a technological shift towards digitalization, and resource-efficient production.

In an increasingly digital and rapidly changing world, BMW Group’s business and product development strategies rely heavily on data-driven decision-making. With that, the need for data scientists and machine learning (ML) engineers has grown significantly. These skilled professionals are tasked with building and deploying models that improve the quality and efficiency of BMW’s business processes and enable informed leadership decisions.

Data scientists and ML engineers require capable tooling and sufficient compute for their work. Therefore, BMW established a centralized ML/deep learning infrastructure on premises several years ago and continuously upgraded it. To pave the way for the growth of AI, BMW Group needed to make a leap regarding scalability and elasticity while reducing operational overhead, software licensing, and hardware management.

In this post, we will talk about how BMW Group, in collaboration with AWS Professional Services, built its Jupyter Managed (JuMa) service to address these challenges. JuMa is a service of BMW Group’s AI platform for its data analysts, ML engineers, and data scientists that provides a user-friendly workspace with an integrated development environment (IDE). It is powered by Amazon SageMaker Studio and provides JupyterLab for Python and Posit Workbench for R.
This offering enables BMW ML engineers to perform code-centric data analytics and ML, increases developer productivity by providing self-service capability and infrastructure automation, and tightly integrates with BMW’s centralized IT tooling landscape.

JuMa is now available to all data scientists, ML engineers, and data analysts at BMW Group. The service streamlines ML development and production workflows (MLOps) across BMW by providing a cost-efficient and scalable development environment that facilitates seamless collaboration between data science and engineering teams worldwide. This results in faster experimentation and shorter idea validation cycles. Moreover, the JuMa infrastructure, which is based on AWS serverless and managed services, helps reduce operational overhead for DevOps teams and allows them to focus on enabling use cases and accelerating AI innovation at BMW Group.

Challenges of growing an on-premises AI platform

Prior to introducing the JuMa service, BMW teams worldwide were using two on-premises platforms that provided JupyterHub and RStudio environments. These platforms were too limited in CPU, GPU, and memory to allow AI to scale at BMW Group. Scaling them would have meant managing more on-premises hardware, more software licenses, and higher support fees, which would have required significant up-front investment and high maintenance effort. In addition, only limited self-service capabilities were available, which caused high operational effort for the DevOps teams. More importantly, the use of these platforms was misaligned with BMW Group’s cloud-first IT strategy. For example, teams using these platforms lacked an easy path for migrating their AI/ML prototypes to an industrialized solution running on AWS.
In contrast, the data science and analytics teams already using AWS directly for experimentation also had to build and operate their own AWS infrastructure while ensuring compliance with BMW Group’s internal policies, local laws, and regulations. This included a range of configuration and governance activities, from ordering AWS accounts and limiting internet access to using allow-listed packages and keeping their Docker images up to date.

Overview of solution

JuMa is a fully managed, multi-tenant, security-hardened AI platform service built on AWS with SageMaker Studio at the core. By relying on AWS serverless and managed services as the main building blocks of the infrastructure, the JuMa DevOps team doesn’t need to worry about patching servers, upgrading storage, or managing any other infrastructure components. The service handles all those processes automatically, providing a powerful technical platform that is generally up to date and ready to use.

JuMa users can order a workspace via a self-service portal to create a secure and isolated development and experimentation environment for their teams. After a JuMa workspace is provisioned, the users can launch JupyterLab or Posit Workbench environments in SageMaker Studio with just a few clicks and start development immediately, using the tools and frameworks they are most familiar with.

JuMa is tightly integrated with a range of BMW Central IT services, including identity and access management, roles and rights management, BMW Cloud Data Hub (BMW’s data lake on AWS), and on-premises databases. The data integrations help AI/ML teams access the data they need, provided they are authorized to do so, without having to build data pipelines. Furthermore, the notebooks can be integrated into the corporate Git repositories so that teams can collaborate using version control.
The solution abstracts away all technical complexities associated with AWS account management, configuration, and customization for AI/ML teams, allowing them to fully focus on AI innovation. The platform ensures that the workspace configuration meets BMW’s security and compliance requirements out of the box.

The following diagram describes the high-level context view of the architecture.

User journey

BMW AI/ML team members can order their JuMa workspace using BMW’s standard catalog service. After approval by the line manager, the platform provisions the ordered JuMa workspace automatically. The workspace provisioning workflow includes the following steps (as numbered in the architecture diagram).

1. A data science team orders a new JuMa workspace in BMW’s Catalog.
2. JuMa automatically provisions a new AWS account for the workspace. This ensures full isolation between the workspaces, following the federated model account structure mentioned in SageMaker Studio Administration Best Practices.
3. JuMa configures a workspace (which is a SageMaker domain) that only allows the predefined Amazon SageMaker features required for experimentation and development, specific custom kernels, and lifecycle configurations. It also sets up the required subnets and security groups that ensure the notebooks run in a secure environment.
4. After the workspaces are provisioned, the authorized users log in to the JuMa portal and access the SageMaker Studio IDE within their workspace using a SageMaker pre-signed URL. Users can choose between opening a SageMaker Studio private space or a shared space. Shared spaces encourage collaboration between team members, who can work in parallel on the same notebooks, whereas private spaces provide a development environment for solitary workloads.
5. Using the BMW data portal, users can request access to on-premises databases or data stored in BMW’s Cloud Data Hub, making it available in their workspace for development and experimentation, from data preparation and analysis to model training and validation.
6. After an AI model is developed and validated in JuMa, AI teams can use the MLOps service of the BMW AI platform to deploy it to production quickly. This service provides users with a production-grade ML infrastructure and pipelines on AWS using SageMaker, which can be set up in minutes with just a few clicks. Users simply need to host their model on the provisioned infrastructure and customize the pipeline to meet their specific use case needs.

In this way, the AI platform covers the entire AI lifecycle at BMW Group.

JuMa features

Following best practices for architecting on AWS, the JuMa service was designed and implemented according to the AWS Well-Architected Framework. The architectural decisions for each Well-Architected pillar are described in detail in the following sections.

Security and compliance

To ensure full isolation between the tenants, each workspace receives its own AWS account, where the authorized users can jointly collaborate on analytics tasks as well as on developing and experimenting with AI/ML models. The JuMa portal itself enforces isolation at runtime using policy-based isolation with AWS Identity and Access Management (IAM) and the JuMa user’s context. For more information about this strategy, refer to Run-time, policy-based isolation with IAM.

Data scientists can only access their domain through the BMW network via pre-signed URLs generated by the portal. Direct internet access is disabled within their domain. Their SageMaker domain privileges are built using Amazon SageMaker Role Manager personas to ensure least privilege access to the AWS services needed for development, such as SageMaker, Amazon Athena, Amazon Simple Storage Service (Amazon S3), and AWS Glue.
This role implements ML guardrails (such as those described in Governance and control), including…
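
The access path described in this section (a SageMaker domain without direct internet access, reachable only through pre-signed URLs that the portal generates) might be sketched with the SageMaker API roughly as follows. This is a minimal illustration under stated assumptions, not JuMa's actual implementation, which the post does not show. The function names `domain_request`, `presigned_url_request`, and `open_studio`, the `juma-` name prefix, and the 60-second URL lifetime are hypothetical. The request fields are parameters of the SageMaker `CreateDomain` and `CreatePresignedDomainUrl` operations.

```python
from typing import Any


def domain_request(workspace: str, vpc_id: str, subnet_ids: list[str],
                   security_group_ids: list[str],
                   execution_role_arn: str) -> dict[str, Any]:
    """Build keyword arguments for a SageMaker CreateDomain call."""
    return {
        "DomainName": f"juma-{workspace}",  # hypothetical naming scheme
        "AuthMode": "IAM",
        "VpcId": vpc_id,
        "SubnetIds": subnet_ids,
        # VpcOnly sends Studio traffic through the given subnets instead of
        # the SageMaker-managed internet access path.
        "AppNetworkAccessType": "VpcOnly",
        "DefaultUserSettings": {
            "ExecutionRole": execution_role_arn,
            "SecurityGroups": security_group_ids,
        },
    }


def presigned_url_request(domain_id: str, user_profile: str,
                          expires_in_seconds: int = 60) -> dict[str, Any]:
    """Build keyword arguments for a CreatePresignedDomainUrl call."""
    return {
        "DomainId": domain_id,
        "UserProfileName": user_profile,
        # Assumed short lifetime: the URL must be opened before it expires.
        "ExpiresInSeconds": expires_in_seconds,
    }


def open_studio(sagemaker_client: Any, domain_id: str, user_profile: str) -> str:
    """Return a pre-signed Studio URL for one user profile.

    `sagemaker_client` is expected to behave like boto3.client("sagemaker").
    """
    response = sagemaker_client.create_presigned_domain_url(
        **presigned_url_request(domain_id, user_profile)
    )
    return response["AuthorizedUrl"]
```

With boto3 installed and credentials configured, a portal backend could pass `boto3.client("sagemaker")` to `open_studio` and call `create_domain(**domain_request(...))` on the same client. Keeping the request construction in plain functions makes the network and expiry settings easy to review and test without AWS access.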
