articles/machine-learning/algorithm-module-reference/evaluate-model.md — 9 additions & 30 deletions
@@ -9,7 +9,7 @@ ms.topic: reference
 author: likebupt
 ms.author: keli19
-ms.date: 02/24/2020
+ms.date: 04/24/2020
 ---
 # Evaluate Model module
@@ -28,36 +28,15 @@ Use this module to measure the accuracy of a trained model. You provide a datase
 > If you are new to model evaluation, we recommend the video series by Dr. Stephen Elston, as part of the [machine learning course](https://blogs.technet.microsoft.com/machinelearning/2015/09/08/new-edx-course-data-science-machine-learning-essentials/) from EdX.

-There are three ways to use the **Evaluate Model** module:
+## How to use Evaluate Model
+
+1. Connect the **Scored dataset** output of the [Score Model](./score-model.md) to the left input port of **Evaluate Model**.

-+ Generate scores over your training data, and evaluate the model based on these scores
-+ Generate scores on the model, but compare those scores to scores on a reserved testing set
-+ Compare scores for two different but related models, using the same set of data
+2. [Optional] Connect the **Scored dataset** output of the [Score Model](./score-model.md) for the second model to the **right-hand** input of **Evaluate Model**. You can easily compare results from two different models on the same data. The two input algorithms should be of the same algorithm type. Or, you might compare scores from two different runs over the same data with different parameters.

-## Use the training data
+   > [!NOTE]
+   > Algorithm type refers to 'Two-class Classification', 'Multi-class Classification', 'Regression', 'Clustering' under 'Machine Learning Algorithms'.

-To evaluate a model, you must connect a dataset that contains a set of input columns and scores. If no other data is available, you can use your original dataset.
-
-1. Connect the **Scored dataset** output of the [Score Model](./score-model.md) to the input of **Evaluate Model**.
-2. Click **Evaluate Model** module, and run the pipeline to generate the evaluation scores.
-
-## Use testing data
-
-A common scenario in machine learning is to separate your original data set into training and testing datasets, using the [Split](./split-data.md) module, or the [Partition and Sample](./partition-and-sample.md) module.
-
-1. Connect the **Scored dataset** output of the [Score Model](score-model.md) to the input of **Evaluate Model**.
-2. Connect the output of the Split Data module that contains the testing data to the right-hand input of **Evaluate Model**.
-3. Click **Evaluate Model** module, and select **Run selected** to generate the evaluation scores.
-
-## Compare scores from two models
-
-You can also connect a second set of scores to **Evaluate Model**. The scores might be a shared evaluation set that has known results, or a set of results from a different model for the same data.
-
-This feature is useful because you can easily compare results from two different models on the same data. Or, you might compare scores from two different runs over the same data with different parameters.
-
-1. Connect the **Scored dataset** output of the [Score Model](score-model.md) to the input of **Evaluate Model**.
-2. Connect the output of the Score Model module for the second model to the right-hand input of **Evaluate Model**.
-3. Submit the pipeline.
+3. Submit the pipeline to generate the evaluation scores.
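For readers who want to see what the revised steps compute, here is a minimal sketch of the same comparison outside the designer, assuming scikit-learn (the module computes its metrics internally, so this is not its implementation; the dataset, the two models, and the metric choices are illustrative): two classifiers of the same algorithm type are scored on one held-out set and compared on standard two-class metrics.

```python
# Hypothetical stand-in for the designer pipeline above: two models of the
# same algorithm type (two-class classification) are scored on the same test
# set (the left-hand and right-hand inputs of Evaluate Model) and compared.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    labels = model.predict(X_test)              # scored labels
    probs = model.predict_proba(X_test)[:, 1]   # scored probabilities
    print(
        f"{name}: accuracy={accuracy_score(y_test, labels):.3f}, "
        f"F1={f1_score(y_test, labels):.3f}, "
        f"AUC={roc_auc_score(y_test, probs):.3f}"
    )
```

Under this sketch, comparing the two printed metric rows plays the same role as reviewing the side-by-side results that **Evaluate Model** produces for its two scored-dataset inputs.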
 ## Results
@@ -134,9 +113,9 @@ The following metrics are reported for evaluating clustering models.
 If the number of data points assigned to clusters is less than the total number of data points available, it means that the data points could not be assigned to a cluster.

-- The scores in the column, **Maximal Distance to Cluster Center**, represent the sum of the distances between each point and the centroid of that point’s cluster.
+- The scores in the column, **Maximal Distance to Cluster Center**, represent the sum of the distances between each point and the centroid of that point's cluster.

-  If this number is high, it can mean that the cluster is widely dispersed. You should review this statistic together with the **Average Distance to Cluster Center** to determine the cluster’s spread.
+  If this number is high, it can mean that the cluster is widely dispersed. You should review this statistic together with the **Average Distance to Cluster Center** to determine the cluster's spread.

 - The **Combined Evaluation** score at the bottom of each section of results lists the averaged scores for the clusters created in that particular model.
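To make the distance statistics concrete, here is a small sketch, assuming scikit-learn's KMeans as a stand-in for the designer's clustering modules (the module computes these columns internally, and its exact aggregation, such as the sum described above for the **Maximal Distance** column, may differ; the cluster count and dataset are illustrative):

```python
# Illustrative distance-to-centroid statistics for a fitted clustering model.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Distance from each point to the centroid of its assigned cluster.
assigned_centroids = kmeans.cluster_centers_[kmeans.labels_]
distances = np.linalg.norm(X - assigned_centroids, axis=1)

for k in range(kmeans.n_clusters):
    d = distances[kmeans.labels_ == k]
    # A max that is large relative to the average suggests a dispersed cluster.
    print(
        f"cluster {k}: points={d.size}, "
        f"average distance to center={d.mean():.3f}, "
        f"max distance to center={d.max():.3f}"
    )

# One way to form a combined score: average the per-point distances overall.
print(f"combined evaluation (overall average distance): {distances.mean():.3f}")
```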