Promote feature discovery and reuse across your organization using Amazon SageMaker Feature Store and its feature-level metadata capability

Amazon SageMaker Feature Store helps data scientists and machine learning (ML) engineers securely store, discover, and share curated data used in training and prediction workflows. Feature Store is a centralized store for features and associated metadata, allowing features to be easily discovered and reused by data scientist teams working on different projects or ML models.

With Feature Store, you have always been able to add metadata at the feature group level. Data scientists can now also add custom metadata at the feature level and use it to search for and discover existing features for their models. For example, the information can include a description of the feature, the date it was last modified, its original data source, certain metrics, or the level of sensitivity.

The following diagram illustrates the architecture relationships between feature groups, features, and associated metadata. Note how data scientists can now specify descriptions and metadata at both the feature group level and the individual feature level.

In this post, we explain how data scientists and ML engineers can use feature-level metadata with the new search and discovery capabilities of Feature Store to promote better feature reuse across their organization. This capability can significantly help data scientists in the feature selection process and, as a result, help you identify features that lead to increased model accuracy.

Use case

For the purposes of this post, we use two feature groups, customer and loan.

The customer feature group has the following features:

  • age – Customer’s age (numeric)
  • job – Type of job (one-hot encoded, such as admin or services)
  • marital – Marital status (one-hot encoded, such as married or single)
  • education – Level of education (one-hot encoded, such as basic 4y or high school)

The loan feature group has the following features:

  • default – Has credit in default? (one-hot encoded: no or yes)
  • housing – Has housing loan? (one-hot encoded: no or yes)
  • loan – Has personal loan? (one-hot encoded: no or yes)
  • total_amount – Total amount of loans (numeric)

The following figure shows example feature groups and feature metadata.


The purpose of adding a description and assigning metadata to each feature is to increase the speed of discovery by enabling new search parameters along which a data scientist or ML engineer can explore features. These can reflect details about a feature such as its calculation, whether it’s an average over 6 months or 1 year, origin, creator or owner, what the feature means, and more.

In the following sections, we provide two approaches to search and discover features and configure feature-level metadata: the first using Amazon SageMaker Studio directly, and the second programmatically.

Feature discovery in Studio

You can easily search and query features using Studio. With the new enhanced search and discovery capabilities, you can immediately retrieve results using a simple type-ahead of a few characters.

The following screenshot demonstrates the following capabilities:

  • You can access the Feature Catalog tab and observe features across feature groups. The features are presented in a table that includes the feature name, type, description, parameters, date of creation, and associated feature group’s name.
  • You can directly use the type-ahead functionality to immediately return search results.
  • You have the flexibility to use different types of filter options: All, Feature name, Description, or Parameters. Note that All returns all features where either Feature name, Description, or Parameters match the search criteria.
  • You can narrow down the search further by specifying a date range using the Created from and Created to fields and specifying parameters using the Search parameter key and Search parameter value fields.


After you have selected a feature, you can choose the feature’s name to bring up its details. When you choose Edit Metadata, you can add a description and up to 25 key-value parameters, as shown in the following screenshot. Within this view, you can ultimately create, view, update, and delete the feature’s metadata. The following screenshot illustrates how to edit feature metadata for total_amount.


As previously stated, adding key-value pairs to a feature gives you more dimensions along which to search for features. For our example, the feature’s origin has been added to every feature’s metadata. When you choose the search icon and filter along the key-value pair origin: job, you can see all the features that were one-hot-encoded from this base attribute.
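To make the idea of filtering along a key-value pair concrete, the following is a minimal client-side sketch in Python. It assumes feature metadata records are already available in memory as dictionaries with a feature name and a list of Key/Value parameters; the sample records (including marital_married) and the filter_by_parameter helper are hypothetical and are not part of Studio or any SageMaker API.

```python
# Hypothetical in-memory records: a feature name plus a list of
# Key/Value parameters, as used for feature-level metadata.
features = [
    {"FeatureName": "job_admin", "Parameters": [{"Key": "origin", "Value": "job"}]},
    {"FeatureName": "job_services", "Parameters": [{"Key": "origin", "Value": "job"}]},
    {"FeatureName": "marital_married", "Parameters": [{"Key": "origin", "Value": "marital"}]},
    {"FeatureName": "age", "Parameters": []},
]


def filter_by_parameter(records, key, value):
    """Return the names of records that carry the given key-value parameter."""
    matches = []
    for record in records:
        # Flatten the Key/Value list into a dict for easy lookup.
        params = {p["Key"]: p["Value"] for p in record.get("Parameters", [])}
        if params.get(key) == value:
            matches.append(record["FeatureName"])
    return matches


# Equivalent of filtering along the pair origin: job
job_features = filter_by_parameter(features, "origin", "job")
print(job_features)
```

Only the two features derived from job are returned; features without an origin parameter are skipped.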

Feature discovery using code

You can also access and update feature information through the AWS Command Line Interface (AWS CLI) and SDK (Boto3) rather than directly through the AWS Management Console. This allows you to integrate the feature-level search functionality of Feature Store with your own custom data science platforms. In this section, we interact with the Boto3 API endpoints to update and search feature metadata.

To begin improving feature search and discovery, you can add metadata using the update_feature_metadata API. In addition to the description and created_date fields, you can add up to 25 parameters (key-value pairs) to a given feature.

The following code is an example of five possible key-value parameters that have been added to the job_admin feature. This feature was created, along with job_services and job_none, by one-hot-encoding job.

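import boto3

# Create a SageMaker client (uses your configured AWS credentials and Region)
sagemaker_client = boto3.client("sagemaker")

sagemaker_client.update_feature_metadata(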
    FeatureGroupName="customer",
    FeatureName="job_admin",
    ParameterAdditions=[
        {"Key": "author", "Value": "arnaud"}, # Feature's author
        {"Key": "team", "Value": "mlops"}, # Team owning the feature
        {"Key": "origin", "Value": "job"}, # Raw input parameter
        {"Key": "sensitivity", "Value": "5"}, # 1-5 scale for data sensitivity
        {"Key": "env", "Value": "testing"} # Environment the feature is used in
    ]
)
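When the metadata already lives in a plain dictionary, a small helper can build the Key/Value list for you. The following is a minimal sketch; to_parameter_additions and MAX_PARAMETERS are hypothetical names, and the check only guards the size of a single call against the 25-parameter limit mentioned earlier, not the total already stored on the feature.

```python
MAX_PARAMETERS = 25  # limit stated in this post


def to_parameter_additions(params):
    """Convert a plain dict into a Key/Value list for ParameterAdditions."""
    if len(params) > MAX_PARAMETERS:
        raise ValueError(
            f"at most {MAX_PARAMETERS} parameters per feature, got {len(params)}"
        )
    # Values are converted to strings, as in the example above ("5", not 5).
    return [{"Key": str(k), "Value": str(v)} for k, v in params.items()]


additions = to_parameter_additions(
    {"author": "arnaud", "team": "mlops", "sensitivity": 5}
)
print(additions)

# With a Boto3 client (not created in this sketch), the list could be passed as:
# sagemaker_client.update_feature_metadata(
#     FeatureGroupName="customer",
#     FeatureName="job_admin",
#     ParameterAdditions=additions,
# )
```

Keeping the conversion separate from the API call makes it easy to reuse the same dictionary for several features.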

After author, team, origin, sensitivity, and env have been added to the job_admin feature, data scientists or ML engineers can retrieve them by calling the describe_feature_metadata API. You can navigate to the Parameters object in the response for the metadata we previously added to our feature. The describe_feature_metadata API endpoint allows you to get greater insight into a given feature by getting its associated metadata.

response = sagemaker_client.describe_feature_metadata(
    FeatureGroupName="customer",
    FeatureName="job_admin",
)

# Navigate to 'Parameters' in response to get metadata
metadata = response['Parameters']
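Because Parameters comes back as a list of Key/Value pairs, it can be convenient to flatten it into a dictionary before looking up individual keys. The following is a minimal sketch; parameters_to_dict is a hypothetical helper, and sample_response is a trimmed, made-up stand-in for a real response so that the snippet runs without AWS access.

```python
def parameters_to_dict(response):
    """Flatten the 'Parameters' list of a feature metadata response into a dict."""
    return {p["Key"]: p["Value"] for p in response.get("Parameters", [])}


# Hypothetical, trimmed response used only to exercise the helper.
sample_response = {
    "FeatureName": "job_admin",
    "Parameters": [
        {"Key": "author", "Value": "arnaud"},
        {"Key": "team", "Value": "mlops"},
        {"Key": "sensitivity", "Value": "5"},
    ],
}

metadata_by_key = parameters_to_dict(sample_response)
print(metadata_by_key["team"])
```

Note that parameter values are strings, so a value such as the sensitivity score needs an explicit conversion before numeric comparison.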

You can search for features by using the SageMaker search API using metadata as search parameters. The following code is an example function that takes a search_string parameter as an input and returns all features where the feature’s name, description, or parameters match the condition:

import pandas as pd

def search_features_using_string(search_string):
    response = sagemaker_client.search(
        Resource= "FeatureMetadata",
        SearchExpression={
            'Filters': [
               {
                   'Name': 'FeatureName',
                   'Operator': 'Contains',
                   'Value': search_string
               },
               {
                   'Name': 'Description',
                   'Operator': 'Contains',
                   'Value': search_string
               },
               {
                   'Name': 'AllParameters',
                   'Operator': 'Contains',
                   'Value': search_string
               }
           ],
           "Operator": "Or"
        },
    )

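    # Display results in a pandas DataFrame
    df = pd.json_normalize(response['Results'], max_level=1)
    # Strip the "FeatureMetadata." prefix from the column names
    df.columns = df.columns.map(lambda col: col.split(".")[-1])
    # Drop the ARN column; errors="ignore" avoids a KeyError on empty results
    df = df.drop('FeatureGroupArn', axis=1, errors='ignore')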

    return df

The following code snippet uses our search_features_using_string function to retrieve all features for which either the feature name, description, or parameters contain the string mlops:

search_results = search_features_using_string('mlops')
search_results

The following screenshot contains the list of matching feature names as well as their corresponding metadata, including timestamps for each feature’s creation and last modification. You can use this information to improve discovery and visibility into your organization’s features.
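A single search call returns one page of results. When many features match, the response can include a NextToken that is passed to the next call. The following is a minimal sketch of that loop; search_all_features is a hypothetical helper, and FakeClient is a made-up stand-in for a Boto3 SageMaker client so that the snippet runs offline.

```python
def search_all_features(client, search_expression, page_size=50):
    """Collect every FeatureMetadata result by following NextToken."""
    results = []
    kwargs = {
        "Resource": "FeatureMetadata",
        "SearchExpression": search_expression,
        "MaxResults": page_size,
    }
    while True:
        response = client.search(**kwargs)
        results.extend(item["FeatureMetadata"] for item in response.get("Results", []))
        token = response.get("NextToken")
        if not token:
            return results
        kwargs["NextToken"] = token


class FakeClient:
    """Stand-in for a Boto3 SageMaker client so the sketch runs offline."""

    def __init__(self, pages):
        self._pages = pages

    def search(self, **kwargs):
        # The fake token is simply the index of the next page.
        index = int(kwargs.get("NextToken", 0))
        page = {"Results": [{"FeatureMetadata": m} for m in self._pages[index]]}
        if index + 1 < len(self._pages):
            page["NextToken"] = str(index + 1)
        return page


fake = FakeClient([[{"FeatureName": "job_admin"}], [{"FeatureName": "job_services"}]])
expression = {
    "Filters": [{"Name": "AllParameters", "Operator": "Contains", "Value": "mlops"}]
}
names = [m["FeatureName"] for m in search_all_features(fake, expression)]
print(names)
```

With a real client, you would pass sagemaker_client in place of the fake one; the loop itself stays the same.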


Conclusion

SageMaker Feature Store provides a purpose-built feature management solution to help organizations scale ML development across business units and data science teams. Improving feature reuse and feature consistency are primary benefits of a feature store. In this post, we explained how you can use feature-level metadata to improve search and discovery of features. This included creating metadata around a variety of use cases and using it as additional search parameters.

Give it a try, and let us know what you think in the comments. If you want to learn more about collaborating and sharing features within Feature Store, refer to Enable feature reuse across accounts and teams using Amazon SageMaker Feature Store.


About the authors

Arnaud Lauer is a Senior Partner Solutions Architect in the Public Sector team at AWS. He enables partners and customers to understand how best to use AWS technologies to translate business needs into solutions. He brings more than 16 years of experience in delivering and architecting digital transformation projects across a range of industries, including the public sector, energy, and consumer goods. Artificial intelligence and machine learning are some of his passions. Arnaud holds 12 AWS certifications, including the ML Specialty Certification.

Nicolas Bernier is an Associate Solutions Architect, part of the Canadian Public Sector team at AWS. He is currently pursuing a master’s degree with a research area in Deep Learning and holds five AWS certifications, including the ML Specialty Certification. Nicolas is passionate about helping customers deepen their knowledge of AWS by working with them to translate their business challenges into technical solutions.

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML solutions. Mark’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Mark holds six AWS certifications, including the ML Specialty Certification. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services.

Khushboo Srivastava is a Senior Product Manager for Amazon SageMaker. She enjoys building products that simplify machine learning workflows for customers. In her spare time, she enjoys playing violin, practicing yoga, and traveling.
