Operationalize your Amazon SageMaker Studio notebooks as scheduled notebook jobs

Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. In addition to the interactive ML experience, data workers also seek solutions to run notebooks as ephemeral jobs without the need to refactor code as Python modules or learn DevOps tools and best practices to automate their deployment infrastructure. Some common use cases for doing this include:

  • Regularly running model inference to generate reports
  • Scaling up a feature engineering step after having tested in Studio against a subset of data on a small instance
  • Retraining and deploying models on some cadence
  • Analyzing your team’s Amazon SageMaker usage on a regular cadence

Previously, when data scientists wanted to take the code they built interactively in notebooks and run it as batch jobs, they faced a steep learning curve using Amazon SageMaker Pipelines, AWS Lambda, Amazon EventBridge, or other solutions that are difficult to set up, use, and manage.

With SageMaker notebook jobs, you can now run your notebooks as is or in a parameterized fashion with just a few simple clicks from the SageMaker Studio or SageMaker Studio Lab interface. You can run these notebooks on a schedule or immediately. There’s no need for the end-user to modify their existing notebook code. When the job is complete, you can view the populated notebook cells, including any visualizations!

In this post, we share how to operationalize your SageMaker Studio notebooks as scheduled notebook jobs.

Solution overview

The following diagram illustrates our solution architecture. We utilize the pre-installed SageMaker extension to run notebooks as a job immediately or on a schedule.

In the following sections, we walk through the steps to create a notebook, parameterize cells, customize additional options, and schedule your job. We also include a sample use case.

Prerequisites

To use SageMaker notebook jobs, you need to be running a JupyterLab 3 JupyterServer app within Studio. For more information on how to upgrade to JupyterLab 3, refer to View and update the JupyterLab version of an app from the console. Be sure to Shut down and Update SageMaker Studio in order to pick up the latest updates.

To define job definitions that run notebooks on a schedule, you may need to add additional permissions to your SageMaker execution role.

First, add a trust relationship to your SageMaker execution role that allows events.amazonaws.com to assume your role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "sagemaker.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "events.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
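
If you would rather apply this change from code than from the IAM console, the following is a minimal sketch. It assumes boto3 is installed and that your credentials allow iam:GetRole and iam:UpdateAssumeRolePolicy. The helper names add_trusted_service and update_role_trust are hypothetical; get_role and update_assume_role_policy are boto3 IAM client calls.

```python
import copy
import json
from urllib.parse import unquote


def add_trusted_service(trust_policy: dict, service: str) -> dict:
    """Return a copy of trust_policy that also lets `service` assume the role."""
    policy = copy.deepcopy(trust_policy)
    for stmt in policy.get("Statement", []):
        principal = stmt.get("Principal", {})
        services = principal.get("Service", []) if isinstance(principal, dict) else []
        if isinstance(services, str):
            services = [services]
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        already_trusted = (
            stmt.get("Effect") == "Allow"
            and service in services
            and "sts:AssumeRole" in actions
        )
        if already_trusted:
            return policy  # nothing to add
    policy.setdefault("Statement", []).append(
        {
            "Effect": "Allow",
            "Principal": {"Service": service},
            "Action": "sts:AssumeRole",
        }
    )
    return policy


def update_role_trust(iam_client, role_name: str, service: str = "events.amazonaws.com") -> dict:
    """Read the role's current trust policy, add `service`, and write it back."""
    current = iam_client.get_role(RoleName=role_name)["Role"]["AssumeRolePolicyDocument"]
    if isinstance(current, str):  # defensive: decode if the document arrives URL-encoded
        current = json.loads(unquote(current))
    updated = add_trusted_service(current, service)
    iam_client.update_assume_role_policy(
        RoleName=role_name, PolicyDocument=json.dumps(updated)
    )
    return updated


# Usage (assumption: "MySageMakerExecutionRole" is a placeholder role name):
#   import boto3
#   update_role_trust(boto3.client("iam"), "MySageMakerExecutionRole")
```

Merging into the existing policy, rather than overwriting it, keeps the sagemaker.amazonaws.com statement intact.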

Additionally, you may need to create and attach an inline policy to your execution role. The below policy is supplementary to the very permissive AmazonSageMakerFullAccess policy. For a complete and minimal set of permissions see Install Policies and Permissions.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "events:TagResource",
                "events:DeleteRule",
                "events:PutTargets",
                "events:DescribeRule",
                "events:PutRule",
                "events:RemoveTargets",
                "events:DisableRule",
                "events:EnableRule"
            ],
            "Resource": "*",
            "Condition": {
              "StringEquals": {
                "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
              }
            }
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::*:role/*",
            "Condition": {
                "StringLike": {
                    "iam:PassedToService": "events.amazonaws.com"
                }
            }
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "sagemaker:ListTags",
            "Resource": "arn:aws:sagemaker:*:*:user-profile/*/*"
        }
    ]
}
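
The inline policy can also be attached from code. The sketch below is one possible approach, not the only one: it first checks that the document grants every action listed in the policy above, then calls the boto3 IAM client's put_role_policy. The names SCHEDULING_ACTIONS, allowed_actions, and attach_inline_policy are hypothetical, as are the role and policy names in the usage comment.

```python
import json

# The actions granted by the supplementary policy shown above.
SCHEDULING_ACTIONS = {
    "events:TagResource",
    "events:DeleteRule",
    "events:PutTargets",
    "events:DescribeRule",
    "events:PutRule",
    "events:RemoveTargets",
    "events:DisableRule",
    "events:EnableRule",
    "iam:PassRole",
    "sagemaker:ListTags",
}


def allowed_actions(policy_document: dict) -> set:
    """Collect every action named in the document's Allow statements."""
    actions = set()
    for stmt in policy_document.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        action = stmt.get("Action", [])
        actions.update([action] if isinstance(action, str) else action)
    return actions


def attach_inline_policy(iam_client, role_name: str, policy_name: str, policy_document: dict) -> None:
    """Attach policy_document to the role as an inline policy after a sanity check."""
    missing = SCHEDULING_ACTIONS - allowed_actions(policy_document)
    if missing:
        raise ValueError(f"Policy is missing actions: {sorted(missing)}")
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document),
    )


# Usage (assumption: placeholder names, policy saved locally as policy.json):
#   import boto3
#   with open("policy.json") as f:
#       attach_inline_policy(boto3.client("iam"), "MySageMakerExecutionRole",
#                            "NotebookJobScheduling", json.load(f))
```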

Create a notebook job

To operationalize your notebook as a SageMaker notebook job, choose the Create a notebook job icon.

Alternatively, you can choose (right-click) your notebook on the file system and choose Create Notebook Job.

In the Create Job section, choose the right instance type for your scheduled job based on your workload: standard instances, compute optimized instances, or accelerated computing instances that contain GPUs. You can choose any of the instances available for SageMaker training jobs. For the complete list of instances available, refer to Amazon SageMaker Pricing.

When a job is complete, you can view the output notebook file with its populated cells, as well as the underlying logs from the job runs.

Parameterize cells

When moving a notebook to a production workflow, it’s important to be able to reuse the same notebook with different sets of parameters for modularity. For example, you may want to parameterize the dataset location or the hyperparameters of your model so that you can reuse the same notebook for many distinct model trainings. SageMaker notebook jobs support this through cell tags. Simply choose the double gear icon in the right pane and choose Add Tag. Then label the tag as parameters.
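
As a rough illustration, a cell tagged parameters might hold nothing but default assignments. The name number_rf_estimators is used by the training code later in this post; the default of 100 and the second parameter, validation_fraction, are assumptions made for this sketch.

```python
# Cell tagged "parameters": plain assignments that act as defaults.
# A notebook job can override these values at run time.
number_rf_estimators = 100   # assumed default; used by the training cell
validation_fraction = 0.25   # hypothetical extra parameter for illustration
```

Keeping this cell free of imports and logic makes it easier to see at a glance which values a job is allowed to change.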

By default, the notebook job run uses the parameter values specified in the notebook, but alternatively, you can modify these as a configuration for your notebook job.
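
The training code later in this post wraps its parameter in int(...), which suggests that an overridden value may arrive as a string. Under that assumption, a notebook could normalize its parameters with a small helper like the one below. The function names coerce_like and resolve_parameters are hypothetical and are not part of any SageMaker API.

```python
def coerce_like(default, override):
    """Convert a string override to the type of the notebook's default value."""
    if not isinstance(override, str):
        return override
    if isinstance(default, bool):  # check bool first: bool is a subclass of int
        return override.strip().lower() in {"1", "true", "yes"}
    if isinstance(default, (int, float)):
        return type(default)(override)
    return override


def resolve_parameters(defaults: dict, overrides: dict) -> dict:
    """Apply overrides on top of defaults, rejecting names the notebook does not define."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown parameters: {sorted(unknown)}")
    return {
        name: coerce_like(value, overrides[name]) if name in overrides else value
        for name, value in defaults.items()
    }


# Example: the job overrides one parameter and leaves the other at its default.
params = resolve_parameters(
    {"number_rf_estimators": 100, "test_size": 0.25},
    {"number_rf_estimators": "250"},
)
print(params)
```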

Configure additional options

When creating a notebook job, you can expand the Additional options section in order to customize your job definition. Studio will automatically detect the image or kernel you’re using in your notebook and pre-select it for you. Be sure to validate this selection.

You can also specify environment variables or startup scripts to customize your notebook run environment. For the full list of configurations, see Additional Options.
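
Inside the notebook, environment variables set on the job can be read with the standard library. The sketch below falls back to a default so that the same notebook still runs interactively when the variable is not set. The helper get_setting, the variable name DATA_BUCKET, and the bucket name are all hypothetical.

```python
import os


def get_setting(name: str, default: str, environ=None) -> str:
    """Read a setting from the environment, falling back to a default when unset or blank."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    return value or default


# Hypothetical variable; set it in the job's environment variables to override.
DATA_BUCKET = get_setting("DATA_BUCKET", "my-example-bucket")
print(f"Reading data from bucket: {DATA_BUCKET}")
```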

Schedule your job

To schedule your job, choose Run on a schedule and set an appropriate interval and time. Then you can choose the Notebook Jobs tab that is visible after choosing the home icon. After the notebook is loaded, choose the Notebook Job Definitions tab to pause or remove your schedule.
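
The Studio interface builds the schedule for you, so no code is needed here. Purely as an illustration of what "an interval and time" can translate to, the sketch below formats Amazon EventBridge schedule expressions, whose cron syntax has six fields (minutes, hours, day of month, month, day of week, year). The permissions above reference EventBridge rules, but whether Studio uses exactly this expression format internally is an assumption; the helper names are hypothetical.

```python
def _check_time(hour: int, minute: int) -> None:
    """Validate a 24-hour clock time."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")


def daily_cron(hour: int, minute: int = 0) -> str:
    """EventBridge cron expression for every day at hour:minute (UTC)."""
    _check_time(hour, minute)
    return f"cron({minute} {hour} * * ? *)"


def weekday_cron(hour: int, minute: int = 0) -> str:
    """EventBridge cron expression for Monday through Friday at hour:minute (UTC)."""
    _check_time(hour, minute)
    return f"cron({minute} {hour} ? * MON-FRI *)"


print(daily_cron(8, 30))
print(weekday_cron(6))
```

In this syntax, either the day-of-month or the day-of-week field must be a question mark, which is why the two helpers place it differently.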

Example use case

For our example, we showcase an end-to-end ML workflow that prepares data from a ground truth source, trains a refreshed model from that time period, and then runs inference on the most recent data to generate actionable insights. In practice, you might run a complete end-to-end workflow, or simply operationalize one step of your workflow. You can schedule an AWS Glue interactive session for daily data preparation, or run a batch inference job that generates graphical results directly in your output notebook.

The full notebook for this example can be found in our SageMaker Examples GitHub repository. The use case assumes that we’re a telecommunications company that is looking to schedule a notebook that predicts probable customer churn based on a model trained with the most recent data we have available.

To start, we gather the most recently available customer data and perform some preprocessing on it:

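import pandas as pd
from synthetic_data import generate_data

# Labeled history for training, plus unlabeled records to score today.
previous_two_weeks_data = generate_data(5000, label_known=True)
todays_data = generate_data(300, label_known=False)

# process_data is not imported here; it is assumed to be defined in an earlier notebook cell.
processed_prior_data = process_data(previous_two_weeks_data, label_known=True)
processed_todays_data = process_data(todays_data, label_known=False)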

We train our refreshed model on this updated training data in order to make accurate predictions on todays_data:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, confusion_matrix, ConfusionMatrixDisplay

# Separate the label column from the features.
y = np.ravel(processed_prior_data[["Churn"]])
x = processed_prior_data.drop(["Churn"], axis=1)

# Hold out 25% of the labeled data for validation.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

# number_rf_estimators is expected to come from the cell tagged "parameters";
# int() guards against the value arriving as a string.
clf = RandomForestClassifier(n_estimators=int(number_rf_estimators), criterion="gini")
clf.fit(x_train, y_train)

Because we’re going to schedule this notebook as a daily report, we want to capture how well our refreshed model performed on our validation set so that we can be confident in its future predictions. The results in the following screenshot are from our scheduled inference report.
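
The evaluation cell itself is not shown in this post. As a hedged sketch of what such a step could look like, the block below computes an F1 score and a confusion matrix with scikit-learn. To stay self-contained it trains on synthetic data from make_classification, which stands in for the churn dataset; the dataset size, feature count, and random seeds are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the processed churn data (assumption for illustration).
x, y = make_classification(n_samples=500, n_features=8, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

clf = RandomForestClassifier(n_estimators=50, criterion="gini", random_state=0)
clf.fit(x_train, y_train)

# Score the held-out split so each scheduled run reports model quality.
y_pred = clf.predict(x_test)
score = f1_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print(f"F1 score on held-out data: {score:.3f}")
print(cm)

# In a notebook, ConfusionMatrixDisplay(confusion_matrix=cm).plot() renders
# the matrix as a figure (requires matplotlib), so it appears in the output notebook.
```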

Lastly, you want to capture the predicted results of today’s data into a database so that actions can be taken based on the results of this model.
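
The post does not say which database is used. As a stand-in, the sketch below writes predictions to an in-memory SQLite database from the Python standard library; the table name, column names, and customer IDs are hypothetical. Keying rows on run date and customer ID means that rerunning the job for the same day replaces that day's rows instead of duplicating them.

```python
import sqlite3
from datetime import date


def save_predictions(conn, run_date: str, predictions) -> int:
    """Store (customer_id, churn_flag) pairs for one run; returns the number of rows written."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS churn_predictions (
            run_date TEXT NOT NULL,
            customer_id TEXT NOT NULL,
            churn_predicted INTEGER NOT NULL,
            PRIMARY KEY (run_date, customer_id)
        )
        """
    )
    rows = [(run_date, customer_id, int(flag)) for customer_id, flag in predictions]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO churn_predictions VALUES (?, ?, ?)", rows
        )
    return len(rows)


# Example with made-up customer IDs; swap the connection for your own database.
conn = sqlite3.connect(":memory:")
today = date.today().isoformat()
saved = save_predictions(conn, today, [("cust-001", 1), ("cust-002", 0)])
print(f"Saved {saved} predictions for {today}")
```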

Once you understand the notebook, feel free to run it as an ephemeral job using the Run now option described earlier, or test out the scheduling functionality.

Clean up

If you followed along with our example, be sure to pause or delete your notebook job’s schedule to avoid incurring ongoing charges.

Conclusion

Bringing notebooks to production with SageMaker notebook jobs vastly simplifies the undifferentiated heavy lifting required by data workers. Whether you’re scheduling end-to-end ML workflows or a piece of the puzzle, we encourage you to put some notebooks in production using SageMaker Studio or SageMaker Studio Lab! To learn more, see Notebook-based Workflows.


About the authors

Sean Morgan is a Senior ML Solutions Architect at AWS. He has experience in the semiconductor and academic research fields, and uses his experience to help customers reach their goals on AWS. In his free time, Sean is an active open-source contributor and maintainer, and is the special interest group lead for TensorFlow Addons.

Sumedha Swamy is a Principal Product Manager at Amazon Web Services. He leads the SageMaker Studio team to build it into the IDE of choice for interactive data science and data engineering workflows. He has spent the past 15 years building customer-obsessed consumer and enterprise products using machine learning. In his free time, he likes photographing the amazing geology of the American Southwest.

Edward Sun is a Senior SDE working for SageMaker Studio at Amazon Web Services. He is focused on building interactive ML solutions and simplifying the customer experience to integrate SageMaker Studio with popular technologies in the data engineering and ML ecosystem. In his spare time, Edward is a big fan of camping, hiking, and fishing, and enjoys the time spent with his family.
