Commit 7ac281d

Updates on Readme.md for DatabaseLoader (dotnet#631)
* DatabaseLoader sample - Baseline
* DatabaseLoader working against localdb attached from DB path file.
* Removed unneeded-old project file
* Added tbd README.md file
* Removed internal Nuget feeds from the nuget.config
* Added a shorter name for the localdb database instead of based on the file path
* Added Retries logic policy to the Connection String when using Azure SQL Database
* Simplified so it doesn't use the ProviderFactory code but simply SqlClientFactory.Instance
* Updated to public Previews
* Removed unneeded CsvParser reference
* Comments update for conn-strings
* Added README.MD for DatabaseLoader and related
1 parent bc306e5 commit 7ac281d

3 files changed: +61 -5 lines changed

README.md

+3 -1
@@ -114,13 +114,15 @@ The official ML.NET samples are divided in multiple categories depending on the
 <tr>
 <td align="middle"><img src="images/smile.png" alt="Database chart"><br><img src="images/app-type-e2e-black.png" alt="End-to-end app icon"><br><b>Scalable Model on Blazor web app<br><a href="samples/csharp/end-to-end-apps/ScalableSentimentAnalysisBlazorWebApp">C#</a><b></td>
 <td align="middle"><img src="images/large-data-set.png" alt="large file chart"><br><img src="images/app-type-getting-started-term-cursor.png" alt="Getting started icon"><br><b>Large Datasets<br><a href="samples/csharp/getting-started/LargeDatasets">C#</a><b></td>
-<td align="middle"><img src="images/database.png" alt="Database chart"><br><img src="images/app-type-getting-started-term-cursor.png" alt="Getting started icon"><br><b>Training model with Database<br><a href="samples/csharp/getting-started/DatabaseIntegration">C#</a><b></td>
+<td align="middle"><img src="images/database.png" alt="Database chart"><br><img src="images/app-type-getting-started-term-cursor.png" alt="Getting started icon"><br><b>Loading data with DatabaseLoader<br><a href="samples/csharp/getting-started/DatabaseLoader">C#</a><b></td>
 </tr>
 <tr>
+<td align="middle"><img src="images/database.png" alt="Database chart"><br><img src="images/app-type-getting-started-term-cursor.png" alt="Getting started icon"><br><b>Loading data with LoadFromEnumerable<br><a href="samples/csharp/getting-started/DatabaseIntegration">C#</a><b></td>
 <td align="middle"><img src="images/model-explain-smaller.png" alt="Model explainability chart"><br><img src="images/app-type-e2e-black.png" alt="End-to-end app icon"><br><b>Model Explainability<br><a href="samples/csharp/end-to-end-apps/Model-Explainability">C#</a></b></td>
 </tr>
 </table>
 
+
 # Automate ML.NET models generation (Preview state)
 
 The previous samples show you how to use the ML.NET API 1.0 (GA since May 2019).

samples/csharp/getting-started/DatabaseIntegration/README.md

+7 -3
@@ -1,10 +1,14 @@
-# Using a relational database as a data source for training and validating a model
-This sample demonstrates how to use a database as a data source for an ML.NET pipeline by using an IEnumerable. Since the database is treated as any other datasource, it is possible to query the database and use the resulting data for training and prediction scenarios.
+# Using LoadFromEnumerable and Entity Framework with a relational database as a data source for training and validating a model
+This sample demonstrates how to use a database as a data source for an ML.NET pipeline by using an IEnumerable. Since a database is treated like any other data source, it is possible to query the database and use the resulting data for training and prediction scenarios.
+
+**Update (Sept. 2nd 2019): If you want to load data from a relational database, there's a simpler approach in ML.NET by using the DatabaseLoader. Check the [DatabaseLoader sample](/samples/csharp/getting-started/DatabaseLoader)**.
+
+Note that you could also implement a similar approach using **LoadFromEnumerable** with a **NoSQL** database or any other data source instead of a relational database. However, this example uses a relational database accessed through Entity Framework.
 
 ## Problem
 Enterprise users have a need to use their existing data set that is in their company's database to train and predict with ML.NET.
 
-Even when in most cases data needs to be clean-up and prepared before training a machine learning model, many enterprises are more familiar with relational databases and SQL statements for transforming and preparing data and prefer to have centralized and secured data into database servers instead of working with exported plain text files.
+Even though in most cases data needs to be cleaned up and prepared before training a machine learning model, many enterprises are very familiar with databases for transforming and preparing data and prefer to keep their data centralized and secured in database servers instead of working with exported plain text files.
 
 ## Out of scope
 

@@ -1,2 +1,52 @@
 
-TBD
+# Sample using DatabaseLoader for training an ML model directly against data in a SQL Server database (or any relational database)
+
+![](https://devblogs.microsoft.com/dotnet/wp-content/uploads/sites/10/2019/08/database-loader-illustration-300x181.png)
+
+| ML.NET version | API type | Status | App Type | Data type | Scenario | ML Task | Algorithms |
+|----------------|-------------------|-------------------------------|-------------|-----------|---------------------|---------------------------|-----------------------------|
+| v0.16-Preview | Dynamic API | up-to-date | Console app | SQL Server database or any relational database | IDataView from DB | Any | Any |
+
+This sample shows how you can use the native database loader to train an ML model directly against relational databases. This loader supports any relational database provider supported by System.Data in .NET Core or .NET Framework, meaning that you can use any RDBMS such as SQL Server, Azure SQL Database, Oracle, SQLite, PostgreSQL, MySQL, Progress, IBM DB2, etc.
+
+## Problem
+
+In enterprises and many organizations in general, data is organized and stored in relational databases to be used by enterprise applications. Many of those organizations also prepare their ML model training/evaluation data in relational databases, which is also where new data is collected and prepared. Therefore, many of those users would also like to train/evaluate ML models directly against the data stored in relational databases.
+
+## Background
+
+In previous [ML.NET](https://dot.net/ml) releases, since [ML.NET](https://dot.net/ml) 1.0, you could also train against a relational database by providing data through an IEnumerable collection using the [LoadFromEnumerable()](https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.dataoperationscatalog.loadfromenumerable?view=ml-dotnet) API, where the data could come from a relational database or any other source. However, with that approach, you as a developer are responsible for the code that reads from the relational database (such as using Entity Framework or any other approach), which needs to be implemented properly so that you are streaming data while training the ML model, as in this [previous sample using LoadFromEnumerable()](https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/getting-started/DatabaseIntegration).
+
+## Solution
+
+This new Database Loader provides a much simpler code implementation, since the way it reads from the database and makes data available through the IDataView is provided out-of-the-box by the [ML.NET](https://dot.net/ml) framework. You just need to specify your database connection string, the SQL statement for the dataset columns, and the data class to use when loading the data. It is that simple!
+
+Here’s example code showing how easily you can now configure your code to load data directly from a relational database into an IDataView, which will be used later when training your model.
+
+```cs --source-file ./DatabaseLoaderConsoleApp/Program.cs --project ./SentimentAnalysis/SentimentAnalysisConsoleApp/SentimentAnalysisConsoleApp.csproj --editable false --region step1to3
+
+var mlContext = new MLContext();
+
+// The following is a connection string using a localdb SQL database,
+// but you can also use connection strings against on-premises SQL Server, Azure SQL Database
+// or any other relational database (Oracle, SQLite, PostgreSQL, MySQL, Progress, IBM DB2, etc.)
+
+// localdb SQL database connection string using a filepath to attach the database file into localdb
+string dbFilePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "SqlLocalDb", "Criteo-100k-rows.mdf");
+string connectionString = $"Data Source = (LocalDB)\\MSSQLLocalDB;AttachDbFilename={dbFilePath};Database=Criteo-100k-rows;Integrated Security = True";
+
+string commandText = "SELECT * from URLClicks";
+
+DatabaseLoader loader = mlContext.Data.CreateDatabaseLoader<UrlClick>();
+
+DatabaseSource dbSource = new DatabaseSource(SqlClientFactory.Instance,
+                                             connectionString,
+                                             commandText);
+
+IDataView dataView = loader.Load(dbSource);
+
+// From this point you can use the IDataView for training and validating an ML.NET model as in any other sample
+```
+
+Check the rest of the sample, which trains and evaluates an ML.NET model, in the **Program.cs** file.
+
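
Editor's note: the commit message mentions a retry policy in the connection string for Azure SQL Database, but the README snippet only shows the localdb case. The following is a minimal sketch of what such a connection string could look like, not the code from the sample. The server, database, and credential values are hypothetical placeholders, and the retry count and interval are illustrative assumptions. It relies only on `SqlConnectionStringBuilder` from `System.Data.SqlClient`.

```cs
using System;
using System.Data.SqlClient;

public static class AzureSqlConnectionSketch
{
    // Builds a connection string with connection-retry settings.
    // All literal values below are hypothetical placeholders.
    public static string BuildConnectionString()
    {
        var builder = new SqlConnectionStringBuilder
        {
            DataSource = "tcp:your-server.database.windows.net,1433", // placeholder server
            InitialCatalog = "your-database",                         // placeholder database
            UserID = "your-user",                                     // placeholder credentials
            Password = "your-password",
            Encrypt = true,
            ConnectTimeout = 30,        // seconds to wait for a connection
            ConnectRetryCount = 3,      // reconnection attempts after a connection failure
            ConnectRetryInterval = 10   // seconds between reconnection attempts
        };
        return builder.ConnectionString;
    }

    public static void Main()
    {
        // Print only the retry settings to avoid echoing credentials.
        var parsed = new SqlConnectionStringBuilder(BuildConnectionString());
        Console.WriteLine($"Retries: {parsed.ConnectRetryCount}, interval: {parsed.ConnectRetryInterval}s");
    }
}
```

The resulting string could be passed as the `connectionString` argument of `DatabaseSource` in place of the localdb one, with `SqlClientFactory.Instance` unchanged.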
