Amazon SageMaker Feature Store helps data scientists and machine learning (ML) engineers securely store, discover, and share curated data used in training and prediction workflows. Feature Store is a centralized store for features and associated metadata, allowing features to be easily discovered and reused by data scientist teams working on different projects or ML models.
With Feature Store, you have always been able to add metadata at the feature group level. Data scientists who want the ability to search and discover existing features for their models now have the ability to search for information at the feature level by adding custom metadata. For example, the information can include a description of the feature, the date it was last modified, its original data source, certain metrics, or the level of sensitivity.
The following diagram illustrates the architecture relationships between feature groups, features, and associated metadata. Note how data scientists can now specify descriptions and metadata at both the feature group level and the individual feature level.
In this post, we explain how data scientists and ML engineers can use feature-level metadata with the new search and discovery capabilities of Feature Store to promote better feature reuse across their organization. This capability can significantly help data scientists in the feature selection process and, as a result, help you identify features that lead to increased model accuracy.
Use case
For the purposes of this post, we use two feature groups, customer and loan.
The customer feature group has the following features:
- age – Customer’s age (numeric)
- job – Type of job (one-hot encoded, such as admin or services)
- marital – Marital status (one-hot encoded, such as married or single)
- education – Level of education (one-hot encoded, such as basic 4y or high school)
The loan feature group has the following features:
- default – Has credit in default? (one-hot encoded: no or yes)
- housing – Has housing loan? (one-hot encoded: no or yes)
- loan – Has personal loan? (one-hot encoded: no or yes)
- total_amount – Total amount of loans (numeric)
The following figure shows example feature groups and feature metadata.
The purpose of adding a description and assigning metadata to each feature is to increase the speed of discovery by enabling new search parameters along which a data scientist or ML engineer can explore features. These can reflect details about a feature such as its calculation, whether it’s an average over 6 months or 1 year, origin, creator or owner, what the feature means, and more.
In the following sections, we provide two approaches to search and discover features and configure feature-level metadata: the first using Amazon SageMaker Studio directly, and the second programmatically.
Feature discovery in Studio
You can easily search and query features using Studio. With the new enhanced search and discovery capabilities, you can immediately retrieve results using a simple type-ahead of a few characters.
The following screenshot demonstrates the following capabilities:
- You can access the Feature Catalog tab and observe features across feature groups. The features are presented in a table that includes the feature name, type, description, parameters, date of creation, and associated feature group’s name.
- You can directly use the type-ahead functionality to immediately return search results.
- You have the flexibility to use different types of filter options: All, Feature name, Description, or Parameters. Note that All returns all features where either Feature name, Description, or Parameters match the search criteria.
- You can narrow down the search further by specifying a date range using the Created from and Created to fields and specifying parameters using the Search parameter key and Search parameter value fields.
After you have selected a feature, you can choose the feature’s name to bring up its details. When you choose Edit Metadata, you can add a description and up to 25 key-value parameters, as shown in the following screenshot. Within this view, you can create, view, update, and delete the feature’s metadata. The following screenshot illustrates how to edit feature metadata for total_amount.
As previously stated, adding key-value pairs to a feature gives you more dimensions along which to search for features. For our example, the feature’s origin has been added to every feature’s metadata. When you choose the search icon and filter along the key-value pair origin: job, you can see all the features that were one-hot-encoded from this base attribute.
Feature discovery using code
You can also access and update feature information through the AWS Command Line Interface (AWS CLI) and SDK (Boto3) rather than directly through the AWS Management Console. This allows you to integrate the feature-level search functionality of Feature Store with your own custom data science platforms. In this section, we interact with the Boto3 API endpoints to update and search feature metadata.
To begin improving feature search and discovery, you can add metadata using the update_feature_metadata API. In addition to the description and created_date fields, you can add up to 25 parameters (key-value pairs) to a given feature.
The following code is an example of five possible key-value parameters that have been added to the job_admin feature. This feature was created, along with job_services and job_none, by one-hot-encoding job.
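A minimal sketch of such a call with Boto3 is shown here. The five parameter keys and the origin value come from this post; the description wording and the other parameter values are placeholders, and the helper names `to_parameter_additions` and `add_job_admin_metadata` are hypothetical. The client is passed in so that the request can be inspected without AWS access.

```python
def to_parameter_additions(params):
    """Convert a plain dict into the Key/Value list used by ParameterAdditions."""
    if len(params) > 25:
        # The post states that a feature can carry up to 25 parameters.
        raise ValueError("a feature can carry at most 25 parameters")
    return [{"Key": str(key), "Value": str(value)} for key, value in params.items()]


def add_job_admin_metadata(client):
    """Attach a description and five parameters to the job_admin feature."""
    request = {
        "FeatureGroupName": "customer",
        "FeatureName": "job_admin",
        "Description": "Job type: admin",  # placeholder wording
        "ParameterAdditions": to_parameter_additions({
            "author": "example-author",  # placeholder value
            "team": "example-team",      # placeholder value
            "origin": "job",             # base attribute that was one-hot encoded
            "sensitivity": "low",        # placeholder value
            "env": "test",               # placeholder value
        }),
    }
    client.update_feature_metadata(**request)
    return request


# With AWS credentials configured, the call could look like this:
#   import boto3
#   add_job_admin_metadata(boto3.client("sagemaker"))
```

Building the request as a dictionary first keeps the parameter limit check and the API call separate, which makes the sketch easy to adapt to other features.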
After author, team, origin, sensitivity, and env have been added to the job_admin feature, data scientists or ML engineers can retrieve them by calling the describe_feature_metadata API. You can navigate to the Parameters object in the response for the metadata we previously added to our feature. The describe_feature_metadata API endpoint allows you to get greater insight into a given feature by getting its associated metadata.
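As a rough illustration, the lookup could be wrapped as follows. The helper name `get_feature_parameters` is hypothetical; it reads the Parameters list from the response and flattens it into a dictionary.

```python
def get_feature_parameters(client, feature_group_name, feature_name):
    """Return a feature's key-value parameters as a plain dict."""
    response = client.describe_feature_metadata(
        FeatureGroupName=feature_group_name,
        FeatureName=feature_name,
    )
    # Parameters is a list of {"Key": ..., "Value": ...} entries.
    return {p["Key"]: p["Value"] for p in response.get("Parameters", [])}


# With AWS credentials configured, the call could look like this:
#   import boto3
#   params = get_feature_parameters(boto3.client("sagemaker"), "customer", "job_admin")
#   print(params.get("origin"))
```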
You can search for features by using the SageMaker search API using metadata as search parameters. The following code is an example function that takes a search_string parameter as an input and returns all features where the feature’s name, description, or parameters match the condition:
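One way such a function might look is sketched here. The optional `client` argument is an addition for testability, and the filter names FeatureName, Description, and AllParameters are assumptions about the searchable fields that should be checked against the Search API reference. The sketch reads only the first page of results.

```python
def search_features(search_string, client=None):
    """Return metadata for features whose name, description, or parameters match."""
    if client is None:
        import boto3  # imported lazily so the function can be tested with a stub client
        client = boto3.client("sagemaker")

    # One "Contains" filter per searchable field (field names are assumptions).
    filters = [
        {"Name": name, "Operator": "Contains", "Value": search_string}
        for name in ("FeatureName", "Description", "AllParameters")
    ]
    response = client.search(
        Resource="FeatureMetadata",
        # "Or" returns a feature when any one of the filters matches.
        SearchExpression={"Filters": filters, "Operator": "Or"},
    )
    # Each result wraps the feature's metadata under the "FeatureMetadata" key.
    return [result["FeatureMetadata"] for result in response.get("Results", [])]
```

Combining the filters with Or mirrors the All option in Studio, where a match in any of the three fields is enough.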
The following code snippet uses our search_features function to retrieve all features for which either the feature name, description, or parameters contain the word job:
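A small sketch of that call follows. It assumes search_features returns a list of feature metadata dictionaries; the helper name `summarize_matches` is hypothetical, and the search function is passed in as an argument so the snippet stays self-contained.

```python
def summarize_matches(search_fn, search_string):
    """Run a feature search and return one row of key fields per matching feature."""
    rows = []
    for meta in search_fn(search_string):
        rows.append((
            meta.get("FeatureGroupName"),
            meta.get("FeatureName"),
            meta.get("Description", ""),
            meta.get("CreationTime"),
            meta.get("LastModifiedTime"),
        ))
    return rows


# With the search_features function described in this post:
#   for row in summarize_matches(search_features, "job"):
#       print(row)
```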
The following screenshot contains the list of matching feature names as well as their corresponding metadata, including timestamps for each feature’s creation and last modification. You can use this information to improve discovery and visibility into your organization’s features.
Conclusion
SageMaker Feature Store provides a purpose-built feature management solution to help organizations scale ML development across business units and data science teams. Improving feature reuse and feature consistency are primary benefits of a feature store. In this post, we explained how you can use feature-level metadata to improve search and discovery of features. This included creating metadata around a variety of use cases and using it as additional search parameters.
Give it a try, and let us know what you think in the comments. If you want to learn more about collaborating and sharing features within Feature Store, refer to Enable feature reuse across accounts and teams using Amazon SageMaker Feature Store.
About the authors
Arnaud Lauer is a Senior Partner Solutions Architect in the Public Sector team at AWS. He enables partners and customers to understand how best to use AWS technologies to translate business needs into solutions. He brings more than 16 years of experience in delivering and architecting digital transformation projects across a range of industries, including the public sector, energy, and consumer goods. Artificial intelligence and machine learning are some of his passions. Arnaud holds 12 AWS certifications, including the ML Specialty Certification.
Nicolas Bernier is an Associate Solutions Architect, part of the Canadian Public Sector team at AWS. He is currently pursuing a master’s degree with a research area in Deep Learning and holds five AWS certifications, including the ML Specialty Certification. Nicolas is passionate about helping customers deepen their knowledge of AWS by working with them to translate their business challenges into technical solutions.
Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML solutions. Mark’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Mark holds six AWS certifications, including the ML Specialty Certification. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services.
Khushboo Srivastava is a Senior Product Manager for Amazon SageMaker. She enjoys building products that simplify machine learning workflows for customers. In her spare time, she enjoys playing violin, practicing yoga, and traveling.