AML: When using ParallelRun with a Tabular Dataset, is the delimiter always a `space` - clarify documentation/example #1486

@keijik @cody-dkdc @gregce

In [this example notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/parallel-run/tabular-dataset-inference-iris.ipynb), the output file is shown with a space as the delimiter. Using a space as a delimiter seems like a really dangerous choice, since any value that itself contains a space would be split across columns. Can you change the delimiter? If so, how? From this example, it looks like a space is the default delimiter for tabular datasets, which seems problematic.


![image](https://user-images.githubusercontent.com/1483922/119240979-54192a80-bb08-11eb-9b48-246650a3f5ed.png)

The scoring script for this [example is here](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/parallel-run/Code/iris_score.py). As you can see, a space is not specified anywhere in the scoring script, so where does this delimiter come from? If this is the default delimiter, I think it is worth explaining in the documentation.

[iris_score.py](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/parallel-run/Code/iris_score.py)

```py
import io
import pickle
import argparse
import numpy as np

from azureml.core.model import Model
from sklearn.linear_model import LogisticRegression

from azureml_user.parallel_run import EntryScript


def init():
    global iris_model

    logger = EntryScript().logger
    logger.info("init() is called.")

    parser = argparse.ArgumentParser(description="Iris model serving")
    parser.add_argument('--model_name', dest="model_name", required=True)
    args, unknown_args = parser.parse_known_args()

    model_path = Model.get_model_path(args.model_name)
    with open(model_path, 'rb') as model_file:
        iris_model = pickle.load(model_file)


def run(input_data):
    logger = EntryScript().logger
    logger.info("run() is called with: {}.".format(input_data))

    # make inference
    num_rows, num_cols = input_data.shape
    pred = iris_model.predict(input_data).reshape((num_rows, 1))

    # cleanup output
    result = input_data.drop(input_data.columns[4:], axis=1)
    result['variety'] = pred

    return result
```
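
To make the concern concrete, here is a minimal sketch of one possible workaround: serializing each row to a string with an explicit delimiter before returning it. It rests on an unverified assumption, namely that when `run()` returns a list of strings, ParallelRunStep appends each string to the output file as its own line without re-delimiting it. The helper name `rows_to_lines` and the sample rows are hypothetical and are not part of the Azure ML SDK or the notebook.

```python
import csv
import io


def rows_to_lines(rows, delimiter=","):
    """Serialize each row to one delimited string, quoting fields when needed.

    Hypothetical helper, not part of the Azure ML SDK.
    """
    lines = []
    for row in rows:
        buffer = io.StringIO()
        # csv.writer quotes any field that contains the delimiter or a quote
        # character, so the resulting line can be parsed back unambiguously.
        csv.writer(buffer, delimiter=delimiter, lineterminator="").writerow(row)
        lines.append(buffer.getvalue())
    return lines


# Hypothetical stand-in for the rows of the DataFrame built in run().
sample_rows = [
    (5.1, 3.5, 1.4, 0.2, "Setosa"),
    (6.2, 2.9, 4.3, 1.3, "Iris versicolor"),
]

comma_lines = rows_to_lines(sample_rows, delimiter=",")
space_lines = rows_to_lines(sample_rows, delimiter=" ")

for line in comma_lines:
    print(line)
```

In the script above, this might be used as `return rows_to_lines(result.itertuples(index=False, name=None))` at the end of `run()`. Note that with a space delimiter the value `Iris versicolor` only survives because `csv.writer` quotes it; a plain space-joined line would gain an extra column, which is the risk this issue is asking about.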

