Machine learning modules in ML Studio (classic)

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

ML Studio (classic) documentation is being retired and may not be updated in the future.

The typical workflow for machine learning includes many phases:

  • Identifying a problem to solve and a metric for measuring results.

  • Finding, cleaning, and preparing appropriate data.

  • Identifying the best features and engineering new features.

  • Building, evaluating, and tuning models.

  • Using models to generate predictions, recommendations, and other results.

The modules in this section support the final phases of machine learning: applying an algorithm to data to train a model, generating scores, and evaluating the accuracy and usefulness of the model.
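As a minimal illustration of the train, score, and evaluate sequence, the following Python sketch fits a one-feature linear model by ordinary least squares, scores held-out rows, and reports the mean absolute error. It is plain Python, not Studio (classic) code. The function names `train`, `score`, and `evaluate` and the toy data are hypothetical.

```python
from statistics import mean

def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def score(model, xs):
    """Generate a prediction for each input value."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]

def evaluate(predicted, actual):
    """Mean absolute error: lower is better."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))

# Hypothetical toy data: the training rows follow y = 2x + 1 exactly.
train_x, train_y = [1, 2, 3, 4], [3, 5, 7, 9]
test_x, test_y = [5, 6], [11, 13.5]

model = train(train_x, train_y)
predictions = score(model, test_x)
print(model, predictions, evaluate(predictions, test_y))
```

Keeping the three steps as separate functions mirrors the idea that a trained model can be scored against new data and evaluated independently of how it was trained.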

Note

Applies to: Machine Learning Studio (classic) only

Similar drag-and-drop modules are available in Azure Machine Learning designer.

List of machine learning tasks by category

For a detailed description of this experimental workflow, see the credit risk solution walkthrough.

Prerequisites

Before you can get to the fun part of building a model, you typically need to do a lot of preparation. This section provides links to tools in Machine Learning Studio (classic) that can help you clean up your data, improve the quality of input, and prevent run-time errors.

Data exploration and data quality

Ensure that your data is the right kind of data, the right quantity, and the right quality for the algorithm you’ve chosen. Understand how much data you have, and how it is distributed. Are there outliers? How were those generated, and what do they mean? Are there any duplicate records?
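Some of these questions can be checked in code. The hypothetical Python sketch below (the `profile` function and the sample rows are assumptions, not part of Studio (classic)) counts rows, counts exact duplicate records, and flags outliers in one numeric column with the 1.5 × IQR rule, which is only one common heuristic. Code can flag an outlier, but deciding how it was generated and what it means still requires knowledge of the data source.

```python
from collections import Counter
from statistics import quantiles

def profile(rows, column):
    """Summarize size, exact duplicates, and IQR outliers for one numeric column."""
    # A duplicate is a row whose fields are all identical to an earlier row.
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)

    # Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as potential outliers.
    values = [r[column] for r in rows]
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"rows": len(rows), "duplicates": duplicates, "outliers": outliers}

# Hypothetical sample data with one duplicated record and one extreme amount.
rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 12},
    {"id": 3, "amount": 11},
    {"id": 4, "amount": 13},
    {"id": 5, "amount": 12},
    {"id": 6, "amount": 11},
    {"id": 7, "amount": 500},
    {"id": 5, "amount": 12},  # exact duplicate of id 5
]
print(profile(rows, "amount"))
```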

Handle missing values

Missing values can affect your results in many ways. For example, many statistical methods discard cases with missing values. By default, Machine Learning Studio (classic) follows these rules when it encounters rows with missing values:

  • If data used to train a model has missing values, any rows with missing values are skipped.

  • If data used as input when scoring against a model has missing values, the missing values are used as inputs, but nulls are propagated. This usually means that a null is inserted in the results instead of a valid prediction.
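To make the two default rules concrete, this Python sketch mimics them on toy rows: training drops any row that contains a missing value, and scoring returns `None` when an input is missing. It also shows mean imputation as one simple way to fill gaps before training. The function names, the linear scoring formula, and the data are hypothetical; the sketch only simulates the behavior described above and is not how Studio (classic) implements it.

```python
from statistics import mean

def training_rows(rows):
    """Training rule: skip any row that contains a missing value (None)."""
    return [r for r in rows if None not in r]

def score_row(weights, bias, row):
    """Scoring rule: a missing input propagates a null instead of a prediction."""
    if None in row:
        return None
    return bias + sum(w * x for w, x in zip(weights, row))

def impute_with_mean(rows):
    """One simple fix: replace each missing value with its column mean."""
    columns = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]

# Hypothetical rows with two features each; None marks a missing value.
rows = [[1.0, 2.0], [None, 4.0], [3.0, None], [5.0, 6.0]]

print(len(training_rows(rows)))               # rows that survive training
print(score_row([0.5, 0.25], 1.0, [2.0, None]))
print(impute_with_mean(rows))
```

Note how two of the four rows are lost to training without imputation, which is why checking for missing values before training matters.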

Be sure to check your data before training your model. To impute the missing values or correct your data, use this module:

Select features and reduce dimensionality

Machine Learning Studio (classic) can help you sift through your data to find the most useful attributes.
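One simple way to rank attributes is a filter approach: score each feature by its absolute Pearson correlation with the label and keep the top k. The Python sketch below assumes numeric columns and uses hypothetical names (`pearson`, `select_top_features`) and toy data; it is not necessarily the method any Studio (classic) module uses. Pearson correlation captures only linear relationships, so a feature with a strong nonlinear effect can still score low.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def select_top_features(features, label, k):
    """Rank columns by |Pearson r| with the label and keep the k strongest."""
    scores = {name: abs(pearson(col, label)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical data: "income" rises with the label, "inverse" falls with it,
# and "noise" is only weakly related.
label = [1, 2, 3, 4, 5]
features = {
    "income": [10, 20, 30, 40, 50],
    "inverse": [10, 8, 6, 4, 1],
    "noise": [3, 1, 4, 1, 5],
}
print(select_top_features(features, label, 2))
```

Using the absolute value keeps strongly negative relationships, such as `inverse`, which are just as informative as positive ones.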

Examples

For examples of machine learning in action, see the Azure AI Gallery.

For tips and a walkthrough of some typical data preparation tasks, see Walkthroughs executing the Team Data Science Process.

See also