Machine learning modules in ML Studio (classic)

05/06/2019

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

See information on moving machine learning projects from ML Studio (classic) to Azure Machine Learning.
Learn more about Azure Machine Learning.

ML Studio (classic) documentation is being retired and may not be updated in the future.

The typical workflow for machine learning includes many phases:

Identifying a problem to solve and a metric for measuring results.
Finding, cleaning, and preparing appropriate data.
Identifying the best features and engineering new features.
Building, evaluating, and tuning models.
Using models to generate predictions, recommendations, and other results.

The modules in this section provide tools for the final phases of machine learning, in which you apply an algorithm to data to train a model. In these final phases, you also generate scores, and then evaluate the accuracy and usefulness of the model.

Note

Applies to: Machine Learning Studio (classic) only

Similar drag-and-drop modules are available in Azure Machine Learning designer.

List of machine learning tasks by category

Initialize Model

Choose from a variety of customizable machine learning algorithms, including clustering, regression, classification, and anomaly detection models.

Train

Provide your data to the configured model to learn from patterns and create statistics that can be used for predictions.

Score

Create predictions using the trained models.

Evaluate

Measure the accuracy of a trained model, or compare multiple models.

For a detailed description of this experimental workflow, see the credit risk solution walkthrough.
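
As a rough illustration of the four task categories above, the following Python sketch walks through initialize, train, score, and evaluate with a one-feature least-squares line. It is not how the Studio (classic) modules are implemented. The function names, the dictionary used as a "model", and the sample numbers are all assumptions made for this example.

```python
from statistics import mean

def initialize_model():
    """Initialize: return an untrained model (hypothetical structure)."""
    return {"slope": None, "intercept": None}

def train(model, xs, ys):
    """Train: fit a least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": y_bar - slope * x_bar}

def score(model, xs):
    """Score: generate a prediction for each input value."""
    return [model["slope"] * x + model["intercept"] for x in xs]

def evaluate(predictions, actuals):
    """Evaluate: mean absolute error between predictions and known values."""
    return mean([abs(p - a) for p, a in zip(predictions, actuals)])

# Illustrative data only: the target is exactly 2 * x + 1.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
test_x, test_y = [5.0, 6.0], [11.0, 13.0]

model = train(initialize_model(), train_x, train_y)
predictions = score(model, test_x)
error = evaluate(predictions, test_y)
```

Keeping the four steps as separate functions mirrors the way an experiment chains separate modules, so a different algorithm or metric can be swapped in without touching the other steps.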

Prerequisites

Before you can get to the fun part of building a model, typically a lot of preparation is required. This section provides links to tools in Machine Learning Studio (classic) that can help you clean up your data, improve the quality of input, and prevent run-time errors.

Data exploration and data quality

Ensure that your data is the right kind of data, the right quantity, and the right quality for the algorithm you’ve chosen. Understand how much data you have, and how it is distributed. Are there outliers? How were those generated, and what do they mean? Are there any duplicate records?
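
The questions above (how much data, how it is distributed, whether there are outliers or duplicate records) can be approximated with a quick profile before any model is built. The sketch below is a minimal example in plain Python, not a Studio (classic) feature. The `profile` function, the 1.5 × IQR outlier rule, and the sample records are assumptions chosen for illustration.

```python
from statistics import quantiles

def profile(records, column):
    """Summarize a dataset: row count, duplicate records, and IQR outliers.

    records: list of tuples, one tuple per row.
    column:  index of the numeric column to check for outliers.
    """
    values = [row[column] for row in records]
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the numeric column
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "rows": len(records),
        "duplicate_records": len(records) - len(set(records)),
        "outliers": [v for v in values if v < low or v > high],
    }

# Hypothetical (age, city) records; one row is repeated and one age is suspect.
records = [
    (23, "Oslo"), (25, "Lima"), (25, "Lima"), (27, "Oslo"), (29, "Pune"),
    (31, "Lima"), (33, "Oslo"), (35, "Pune"), (120, "Oslo"),
]
summary = profile(records, column=0)
```

A flagged value such as the age of 120 is only a prompt for investigation. Whether it is an entry error or a genuine observation still has to be decided from how the data was generated.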

Handle missing values

Missing values can affect your results in many ways. For example, almost all statistical methods discard cases with missing values. By default, Machine Learning Studio (classic) follows these rules when it encounters rows with missing values:

If data used to train a model has missing values, any rows with missing values are skipped.
If data used as input when scoring against a model has missing values, the missing values are used as inputs, but nulls are propagated. This usually means that a null is inserted in the results instead of a valid prediction.
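
The two default rules above can be mimicked in a few lines of Python. This is only a sketch of the described behavior, not the service's code. Missing values are represented as `None`, and `predict` is a hypothetical stand-in for a trained model.

```python
def rows_for_training(rows):
    """Training rule: skip any row that contains a missing value (None)."""
    return [row for row in rows if None not in row]

def score_rows(rows, predict):
    """Scoring rule: a row with a missing value yields None, not a prediction."""
    return [None if None in row else predict(row) for row in rows]

def predict(row):
    """Hypothetical stand-in for a trained model: sums the inputs."""
    return sum(row)

training_rows = [(1.0, 2.0), (None, 3.0), (4.0, 5.0)]
kept = rows_for_training(training_rows)

scoring_rows = [(1.0, 1.0), (2.0, None)]
scored = score_rows(scoring_rows, predict)
```

In this example the second training row is dropped, and the second scoring row produces `None` in place of a prediction.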

Be sure to check your data before training your model. To impute the missing values or correct your data, use this module:

Clean Missing Data
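
To make the idea of imputation concrete, the sketch below replaces missing entries in a numeric column with the mean of the observed values. Mean substitution is just one common strategy, and this is not the Clean Missing Data module's implementation. The `impute_mean` helper and the sample column are assumptions for illustration.

```python
from statistics import mean

def impute_mean(column):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

# Illustrative column with two missing readings.
readings = [10.0, None, 14.0, None, 18.0]
filled = impute_mean(readings)
```

Mean substitution keeps every row available for training, but it also shrinks the variance of the column, so it is worth comparing results against simply removing the affected rows.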

Select features and reduce dimensionality

Machine Learning Studio (classic) can help you sift through your data to find the most useful attributes.

Use tools such as Fisher Linear Discriminant Analysis or Filter Based Feature Selection to determine which columns of data have the most predictive power. These tools can also identify columns that should be removed because of data leakage.
Create or engineer features from existing data. Normalize data or group data into bins to make new groupings of data, or standardize the range of numeric values prior to analysis.
Reduce dimensionality by grouping categorical values, by using principal component analysis, or by sampling.
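
The techniques in this list can be sketched in plain Python. The example below ranks columns by the absolute Pearson correlation with the target (used here as one simple filter-style score), rescales a column to the range 0 to 1, and groups values into equal-width bins. It does not reproduce the Studio (classic) modules. All function names and data are assumptions made for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(columns, target):
    """Order column names by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical columns: "hours" tracks the target closely, "noise" does not.
columns = {"noise": [2, 9, 1, 7, 4], "hours": [1, 2, 3, 4, 5]}
target = [10, 20, 30, 40, 50]

ranking = rank_features(columns, target)
scaled = min_max(target)
bins = equal_width_bins(columns["hours"], n_bins=2)
```

A column that correlates almost perfectly with the target, as "hours" does here, is also worth a second look, because a score that high can be a sign of data leakage rather than genuine predictive power.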

Examples

For examples of machine learning in action, see the Azure AI Gallery.

For tips and a walkthrough of some typical data preparation tasks, see Walkthroughs executing the Team Data Science Process.