Operationalize your Amazon SageMaker Studio notebooks as scheduled notebook jobs

Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. In addition to the interactive ML experience, data workers also seek solutions to run notebooks as ephemeral jobs without the need to refactor code as Python modules or learn DevOps tools and best practices to automate their deployment infrastructure. Some common use cases for doing this include:

  • Regularly running model inference to generate reports
  • Scaling up a feature engineering step after having tested in Studio against a subset of data on a small instance
  • Retraining and deploying models on a set cadence
  • Analyzing your team’s Amazon SageMaker usage on a regular cadence

Previously, when data scientists wanted to take the code they built interactively in notebooks and run it as batch jobs, they faced a steep learning curve with Amazon SageMaker Pipelines, AWS Lambda, Amazon EventBridge, or other solutions that are difficult to set up, use, and manage.

With SageMaker notebook jobs, you can now run your notebooks as is or in a parameterized fashion with just a few clicks from the SageMaker Studio or SageMaker Studio Lab interface. You can run these notebooks on a schedule or immediately. There’s no need for the end-user to modify their existing notebook code. When the job is complete, you can view the populated notebook cells, including any visualizations!

In this post, we share how to operationalize your SageMaker Studio notebooks as scheduled notebook jobs.

Solution overview

The following diagram illustrates our solution architecture. We utilize the pre-installed SageMaker extension to run notebooks as a job immediately or on a schedule.

In the following sections, we walk through the steps to create a notebook, parameterize cells, customize additional options, and schedule your job. We also include a sample use case.

Prerequisites

To use SageMaker notebook jobs, you need to be running a JupyterLab 3 JupyterServer app within Studio. For more information on how to upgrade to JupyterLab 3, refer to View and update the JupyterLab version of an app from the console. Be sure to Shut down and Update SageMaker Studio in order to pick up the latest updates.

To define job definitions that run notebooks on a schedule, you may need to add additional permissions to your SageMaker execution role.

First, add a trust relationship to your SageMaker execution role that allows events.amazonaws.com to assume your role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "sagemaker.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "events.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
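If your role already has a trust policy, the second statement can be merged in programmatically rather than pasted by hand. The following is a minimal sketch of that merge in Python. The helper name add_trusted_service is hypothetical, and the commented-out boto3 call assumes you have IAM permissions and substitute your own role name.

```python
import copy
import json


def add_trusted_service(trust_policy, service):
    """Return a copy of trust_policy that also lets `service` assume the role."""
    policy = copy.deepcopy(trust_policy)
    for statement in policy.get("Statement", []):
        principal = statement.get("Principal")
        if not isinstance(principal, dict):
            continue  # e.g. a wildcard principal; not what we are looking for
        trusted = principal.get("Service", [])
        trusted = [trusted] if isinstance(trusted, str) else list(trusted)
        if (
            statement.get("Effect") == "Allow"
            and statement.get("Action") == "sts:AssumeRole"
            and service in trusted
        ):
            return policy  # the service is already trusted; nothing to add
    policy.setdefault("Statement", []).append(
        {
            "Effect": "Allow",
            "Principal": {"Service": service},
            "Action": "sts:AssumeRole",
        }
    )
    return policy


existing = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
updated = add_trusted_service(existing, "events.amazonaws.com")
print(json.dumps(updated, indent=4))

# To apply it (assumption: placeholder role name, credentials configured):
# import boto3
# boto3.client("iam").update_assume_role_policy(
#     RoleName="<your-execution-role-name>",
#     PolicyDocument=json.dumps(updated),
# )
```

The helper leaves the input untouched and is idempotent, so running it against a role that already trusts events.amazonaws.com changes nothing.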

Additionally, you may need to create and attach an inline policy to your execution role. The following policy is supplementary to the very permissive AmazonSageMakerFullAccess policy. For a complete and minimal set of permissions, see Install Policies and Permissions.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "events:TagResource",
                "events:DeleteRule",
                "events:PutTargets",
                "events:DescribeRule",
                "events:PutRule",
                "events:RemoveTargets",
                "events:DisableRule",
                "events:EnableRule"
            ],
            "Resource": "*",
            "Condition": {
              "StringEquals": {
                "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
              }
            }
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::*:role/*",
            "Condition": {
                "StringLike": {
                    "iam:PassedToService": "events.amazonaws.com"
                }
            }
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "sagemaker:ListTags",
            "Resource": "arn:aws:sagemaker:*:*:user-profile/*/*"
        }
    ]
}
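One way to attach a document like this is the IAM put_role_policy call. The sketch below wraps that call with a basic shape check and runs it against a recording stand-in so that nothing touches AWS. The role name, policy name, and DryRunIamClient class are illustrative assumptions, and the policy is abbreviated to a single statement to keep the example short.

```python
import json


def attach_inline_policy(iam_client, role_name, policy_name, policy):
    """Check each statement has the basic keys, then attach the inline policy."""
    for index, statement in enumerate(policy["Statement"]):
        missing = {"Effect", "Action", "Resource"} - set(statement)
        if missing:
            raise ValueError(f"Statement {index} is missing {sorted(missing)}")
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),
    )


class DryRunIamClient:
    """Stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_role_policy(self, **kwargs):
        self.calls.append(kwargs)


# Abbreviated to one statement here; in practice pass the full document above.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "sagemaker:ListTags",
            "Resource": "arn:aws:sagemaker:*:*:user-profile/*/*",
        }
    ],
}

client = DryRunIamClient()  # swap for boto3.client("iam") to apply for real
attach_inline_policy(
    client, "my-sagemaker-execution-role", "notebook-job-scheduling", policy
)
print(client.calls[0]["PolicyName"])
```

Passing the client in as an argument keeps the helper easy to dry-run before pointing it at a real account.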

Create a notebook job

To operationalize your notebook as a SageMaker notebook job, choose the Create a notebook job icon.

Alternatively, you can choose (right-click) your notebook in the file system and choose Create Notebook Job.

In the Create Job section, choose the right instance type for your scheduled job based on your workload: standard instances, compute optimized instances, or accelerated computing instances that contain GPUs. You can choose any of the instances available for SageMaker training jobs. For the complete list of instances available, refer to Amazon SageMaker Pricing.

When a job is complete, you can view the output notebook file with its populated cells, as well as the underlying logs from the job runs.
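Because a notebook file is plain JSON, the output notebook can also be checked programmatically once you have downloaded it. The following sketch counts code cells, cells that produced output, and error outputs, based on the standard Jupyter notebook format (version 4). The sample notebook is built inline for illustration, and the file name in the comment is a placeholder.

```python
import json


def summarize_notebook(notebook):
    """Count code cells, cells that produced output, and error outputs."""
    code_cells = [
        cell for cell in notebook.get("cells", []) if cell.get("cell_type") == "code"
    ]
    cells_with_output = [cell for cell in code_cells if cell.get("outputs")]
    errors = [
        output
        for cell in code_cells
        for output in cell.get("outputs", [])
        if output.get("output_type") == "error"
    ]
    return {
        "code_cells": len(code_cells),
        "cells_with_output": len(cells_with_output),
        "errors": len(errors),
    }


# A tiny in-memory notebook standing in for a downloaded output file.
sample = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Report"},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": "print('hello')",
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "hello\n"}],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 2,
            "source": "x = 1",
            "outputs": [],
        },
    ],
}
summary = summarize_notebook(sample)
print(summary)

# For a real file (placeholder name):
# with open("output.ipynb") as handle:
#     print(summarize_notebook(json.load(handle)))
```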

Parameterize cells

When moving a notebook to a production workflow, it’s important to be able to reuse the same notebook with different sets of parameters for modularity. For example, you may want to parameterize the dataset location or the hyperparameters of your model so that you can reuse the same notebook for many distinct model trainings. SageMaker notebook jobs support this through cell tags. Choose the double gear icon in the right pane and choose Add tag. Then label the tag as parameters.
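In the Jupyter notebook format, cell tags are stored under each cell's metadata in a tags list. As a quick check that the tag landed on the cell you intended, the following sketch scans a notebook dictionary for tagged cells. The helper name and the sample notebook are illustrative.

```python
def find_tagged_cells(notebook, tag="parameters"):
    """Return the indexes of cells whose metadata carries the given tag."""
    return [
        index
        for index, cell in enumerate(notebook.get("cells", []))
        if tag in cell.get("metadata", {}).get("tags", [])
    ]


# Minimal stand-in for a notebook loaded with json.load().
sample = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Churn report"},
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": "number_rf_estimators = 100",
            "outputs": [],
        },
        {"cell_type": "code", "metadata": {}, "source": "print('train')", "outputs": []},
    ]
}
tagged = find_tagged_cells(sample)
print(tagged)
```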

By default, the notebook job run uses the parameter values specified in the notebook, but alternatively, you can modify these as a configuration for your notebook job.
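A parameters cell is ordinary Python: a handful of assignments that act as defaults. The example later in this post wraps number_rf_estimators in int(), which suggests that overridden values may arrive as strings. That is an inference rather than something this post states, so the sketch below casts defensively. The test_size parameter, the override values, and the coerce_like helper are hypothetical.

```python
# Contents of a cell tagged "parameters": plain assignments used as defaults.
number_rf_estimators = 100
test_size = 0.25


def coerce_like(default, value):
    """Cast an override to the type of the default it replaces."""
    if isinstance(default, bool):
        return str(value).strip().lower() in {"1", "true", "yes"}
    return type(default)(value)


# Hypothetical overrides as they might be typed into the job form.
overrides = {"number_rf_estimators": "250"}

defaults = {"number_rf_estimators": number_rf_estimators, "test_size": test_size}
effective = {
    name: coerce_like(default, overrides.get(name, default))
    for name, default in defaults.items()
}
print(effective)
```

Casting to the type of the default keeps downstream code the same whether the notebook runs interactively or as a job.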

Configure additional options

When creating a notebook job, you can expand the Additional options section to customize your job definition. Studio automatically detects the image or kernel you’re using in your notebook and pre-selects it for you. Make sure to validate this selection.

You can also specify environment variables or startup scripts to customize your notebook run environment. For the full list of configurations, see Additional options.
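Inside the notebook, environment variables set on the job can be read with the standard library. The following is a small sketch with fallbacks so the same notebook still runs interactively. The variable names REPORT_BUCKET and LOOKBACK_DAYS and their default values are made up for illustration.

```python
import os


def read_job_config(env=os.environ):
    """Read job settings from environment variables, falling back to defaults."""
    return {
        "bucket": env.get("REPORT_BUCKET", "my-default-bucket"),
        "lookback_days": int(env.get("LOOKBACK_DAYS", "14")),
    }


# Interactive run: nothing set, so the defaults apply.
print(read_job_config(env={}))

# Job run: values as they might be set in the job definition.
job_config = read_job_config(
    env={"REPORT_BUCKET": "team-reports", "LOOKBACK_DAYS": "7"}
)
print(job_config)
```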

Schedule your job

To schedule your job, choose Run on a schedule and set an appropriate interval and time. Then you can choose the Notebook Jobs tab that is visible after choosing the home icon. After the notebook is loaded, choose the Notebook Job Definitions tab to pause or remove your schedule.
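The permissions listed earlier include EventBridge rule actions, which suggests the schedule is backed by an EventBridge rule, although this post does not spell that out. For readers curious what such a schedule looks like, the sketch below builds expressions in the six-field EventBridge cron format (minutes, hours, day of month, month, day of week, year), evaluated in UTC. The helper names are hypothetical, and the Studio form builds the schedule for you, so this is purely illustrative.

```python
def _check_time(hour, minute):
    """Reject hours and minutes outside the valid clock range."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")


def daily_cron(hour, minute=0):
    """EventBridge-style cron expression for every day at hour:minute UTC."""
    _check_time(hour, minute)
    return f"cron({minute} {hour} * * ? *)"


def weekday_cron(hour, minute=0):
    """EventBridge-style cron expression for Monday to Friday at hour:minute UTC."""
    _check_time(hour, minute)
    return f"cron({minute} {hour} ? * MON-FRI *)"


print(daily_cron(8, 30))
print(weekday_cron(6))
```

In this format, either the day-of-month or the day-of-week field must be a question mark, which is why the two helpers place it differently.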

Example use case

For our example, we showcase an end-to-end ML workflow that prepares data from a ground truth source, trains a refreshed model from that time period, and then runs inference on the most recent data to generate actionable insights. In practice, you might run a complete end-to-end workflow, or simply operationalize one step of your workflow. You can schedule an AWS Glue interactive session for daily data preparation, or run a batch inference job that generates graphical results directly in your output notebook.

The full notebook for this example can be found in our SageMaker Examples GitHub repository. The use case assumes that we’re a telecommunications company that is looking to schedule a notebook that predicts probable customer churn based on a model trained with the most recent data we have available.

To start, we gather the most recently available customer data and perform some preprocessing on it:

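import pandas as pd
from synthetic_data import generate_data

# Labeled data from the previous two weeks is used for training
previous_two_weeks_data = generate_data(5000, label_known=True)
# Today's unlabeled data is what we want predictions for
todays_data = generate_data(300, label_known=False)

# process_data is a preprocessing helper that is not shown in this excerpt;
# see the full notebook for its definition
processed_prior_data = process_data(previous_two_weeks_data, label_known=True)
processed_todays_data = process_data(todays_data, label_known=False)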

We train our refreshed model on this updated training data in order to make accurate predictions on todays_data:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, confusion_matrix, ConfusionMatrixDisplay

# Separate the Churn label from the feature columns
y = np.ravel(processed_prior_data[["Churn"]])
x = processed_prior_data.drop(["Churn"], axis=1)

# Hold out 25% of the labeled data for validation
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

# number_rf_estimators is expected to be set in the cell tagged parameters
clf = RandomForestClassifier(n_estimators=int(number_rf_estimators), criterion="gini")
clf.fit(x_train, y_train)

Because we’re going to schedule this notebook as a daily report, we want to capture how well our refreshed model performed on our validation set so that we can be confident in its future predictions. The results in the following screenshot are from our scheduled inference report.
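As a rough, self-contained sketch of that kind of validation report, the following computes an F1 score and a confusion matrix with scikit-learn on a held-out split. The synthetic dataset from make_classification and the fixed parameter values are stand-ins, not the churn data or settings used in this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the processed churn data.
x, y = make_classification(n_samples=500, n_features=8, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

clf = RandomForestClassifier(n_estimators=50, criterion="gini", random_state=0)
clf.fit(x_train, y_train)

# Score the refreshed model on the held-out validation split.
y_pred = clf.predict(x_test)
f1 = f1_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print(f"F1 on held-out split: {f1:.3f}")
print(cm)  # rows are true classes, columns are predicted classes
```

Printing these values in a notebook cell means they are saved in the output notebook of each scheduled run.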

Lastly, you want to capture the predicted results of today’s data into a database so that actions can be taken based on the results of this model.
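This post does not say which database is used, so the sketch below uses the standard library's sqlite3 module with an in-memory database purely as a stand-in. The table name, column names, and customer IDs are hypothetical.

```python
import sqlite3
from datetime import date


def save_predictions(conn, run_date, predictions):
    """Append (customer_id, churn_probability) pairs for one scheduled run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS churn_predictions "
        "(run_date TEXT, customer_id TEXT, churn_probability REAL)"
    )
    conn.executemany(
        "INSERT INTO churn_predictions VALUES (?, ?, ?)",
        [(run_date, customer_id, float(prob)) for customer_id, prob in predictions],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real database connection
save_predictions(
    conn,
    date.today().isoformat(),
    [("cust-001", 0.82), ("cust-002", 0.10)],
)

# Downstream consumers can then pull the customers most likely to churn.
rows = conn.execute(
    "SELECT customer_id, churn_probability FROM churn_predictions "
    "WHERE churn_probability > 0.5"
).fetchall()
print(rows)
```

Storing the run date with each row lets repeated scheduled runs append to the same table without overwriting earlier results.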

Once you understand the notebook, feel free to run it as an ephemeral job using the Run now option described earlier, or test out the scheduling functionality.

Clean up

If you followed along with our example, be sure to pause or delete your notebook job’s schedule to avoid incurring ongoing charges.

Conclusion

Bringing notebooks to production with SageMaker notebook jobs vastly simplifies the undifferentiated heavy lifting required by data workers. Whether you’re scheduling end-to-end ML workflows or a piece of the puzzle, we encourage you to put some notebooks in production using SageMaker Studio or SageMaker Studio Lab! To learn more, see Notebook-based Workflows.


About the authors

Sean Morgan is a Senior ML Solutions Architect at AWS. He has experience in the semiconductor and academic research fields, and uses his experience to help customers reach their goals on AWS. In his free time, Sean is an active open-source contributor and maintainer, and is the special interest group lead for TensorFlow Addons.

Sumedha Swamy is a Principal Product Manager at Amazon Web Services. He leads the SageMaker Studio team to build it into the IDE of choice for interactive data science and data engineering workflows. He has spent the past 15 years building customer-obsessed consumer and enterprise products using machine learning. In his free time he likes photographing the amazing geology of the American Southwest.

Edward Sun is a Senior SDE working for SageMaker Studio at Amazon Web Services. He is focused on building interactive ML solutions and simplifying the customer experience to integrate SageMaker Studio with popular technologies in the data engineering and ML ecosystem. In his spare time, Edward is a big fan of camping, hiking, and fishing, and enjoys spending time with his family.
