diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index e226842057b..a3e0c410ec4 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -28,30 +28,22 @@ To get started, go to **OpenSearch Dashboards** > **OpenSearch Plugins** > **Ano A _detector_ is an individual anomaly detection task. You can define multiple detectors, and all detectors can run simultaneously, with each analyzing data from different sources. You can define a detector by following these steps: 1. On the **Anomaly detection** page, select the **Create detector** button. -2. On the **Define detector** page, enter the required information in the **Detector details** pane. -3. In the **Select data** pane, specify the data source by choosing a source from the **Index** dropdown menu. You can choose an index, index patterns, or an alias. -4. (Optional) Filter the data source by selecting **Add data filter** and then entering the conditions for **Field**, **Operator**, and **Value**. Alternatively, you can choose **Use query DSL** and add your JSON filter query. Only [Boolean queries]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/) are supported for query domain-specific language (DSL). +2. On the **Define detector** page, add the detector details. Enter a name and a brief description. The name must be unique and descriptive enough to help you identify the detector's purpose. -### Example: Filtering data using query DSL +3. In the **Select data** pane, specify the data source by choosing one or more sources from the **Index** dropdown menu. You can select indexes, index patterns, or aliases. -The following example query retrieves documents in which the `urlPath.keyword` field matches any of the specified values. To set up the detector, use the following steps. + - Detectors can use remote indexes, which you can access using the `cluster-name:index-name` pattern. 
For more information, see [Cross-cluster search]({{site.url}}{{site.baseurl}}/search-plugins/cross-cluster-search/). Starting in OpenSearch Dashboards 2.17, you can also select clusters and indexes directly. If the Security plugin is enabled, see [Selecting remote indexes with fine-grained access control]({{site.url}}{{site.baseurl}}/observing-your-data/ad/security/#selecting-remote-indexes-with-fine-grained-access-control) in the [Anomaly detection security]({{site.url}}{{site.baseurl}}/observing-your-data/ad/security/) documentation. -#### Setting the initial detector settings + - To create a cross-cluster detector in OpenSearch Dashboards, you must have the following [permissions]({{site.url}}{{site.baseurl}}/security/access-control/permissions/): `indices:data/read/field_caps`, `indices:admin/resolve/index`, and `cluster:monitor/remote/info`. -1. Choose **Create detector**. -1. Add the detector details. Enter a name and brief description. Make sure the name is unique and descriptive enough to help you identify the purpose of the detector. -1. Specify the data source. - - For **Data source**, choose one or more indexes to use as the data source. Alternatively, you can use an alias or index pattern to choose multiple indexes, similarly to the following: - - /domain/{id}/short - - /sub_dir/{id}/short - - /abcd/123/{id}/xyz - - Detectors can use remote indexes. You can access them using the `cluster-name:index-name` pattern. See [Cross-cluster search]({{site.url}}{{site.baseurl}}/search-plugins/cross-cluster-search/) for more information. Alternatively, you can select clusters and indexes in OpenSearch Dashboards 2.17 or later. 
To learn about configuring remote indexes with the Security plugin enabled, see [Selecting remote indexes with fine-grained access control]({{site.url}}{{site.baseurl}}/observing-your-data/ad/security/#selecting-remote-indexes-with-fine-grained-access-control) in the [Anomaly detection security](observing-your-data/ad/security/) documentation. - - (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query. Only [Boolean queries]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/) are supported for query DSL. The following example `bool` query shows you how to use query DSL: +4. (Optional) Filter the data source by selecting **Add data filter** and then specifying the conditions for **Field**, **Operator**, and **Value**. Alternatively, select **Use query DSL** and enter your filter as a JSON-formatted [Boolean query]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/). Only Boolean queries are supported for query domain-specific language (DSL). -To create a cross-cluster detector in OpenSearch Dashboards, the following [permissions]({{site.url}}{{site.baseurl}}/security/access-control/permissions/) are required: `indices:data/read/field_caps`, `indices:admin/resolve/index`, and `cluster:monitor/remote/info`. 
-{: .note} - + + +### Example: Filtering data using query DSL + +The following example query retrieves documents in which the `urlPath.keyword` field matches any of the specified values: ```json { diff --git a/_observing-your-data/forecast/api.md b/_observing-your-data/forecast/api.md new file mode 100644 index 00000000000..e1c0285a0f2 --- /dev/null +++ b/_observing-your-data/forecast/api.md @@ -0,0 +1,1280 @@ +--- +layout: default +title: Forecasting API +parent: Forecasting +nav_order: 100 +--- + +# Forecasting API + +Use these operations to programmatically create and manage forecasters that generate forecasts over your time‑series data. + +--- + +## Table of contents +- TOC +{:toc} + +--- + +## Create forecaster + +**Introduced 3.1** +{: .label .label-purple } + +Creates a forecaster for generating time-series forecasts. A forecaster can be either single-stream (without a category field) or high-cardinality (with one or more category fields). + +When creating a forecaster, you define the source indexes, the forecast interval and horizon, the feature to forecast, and optional parameters such as category fields and a custom result index. + + +### Endpoint + +``` +POST _plugins/_forecast/forecasters +``` + +### Request body fields + +This API supports the following request body fields. + +| Field | Data type | Required | Description | +| :---------------------------- | :------------------ | :------- | :------------------------------------------------------------------------------------------------------------------------------------------- | +| `name` | String | Required | The forecaster name. | +| `description` | String | Optional | A free-form description of the forecaster. | +| `time_field` | String | Required | The timestamp field for the source documents. | +| `indices` | String or string\[] | Required | One or more source indexes or index aliases. | +| `feature_attributes` | Array of objects | Required | The feature to forecast. 
Only one feature is supported. Each object must include the `feature_name` and an `aggregation_query`. | +| `forecast_interval` | Object | Required | The interval over which forecasts are generated. | +| `horizon` | Integer | Optional | The number of future intervals to forecast. | +| `window_delay` | Object | Optional | A delay added to account for ingestion latency. | +| `category_field` | String | Optional | One or two fields used to group forecasts by entity. | +| `result_index` | String | Optional | A custom index alias for storing forecast results. Must begin with `opensearch-forecast-result-`. Defaults to `opensearch-forecast-results`. | +| `suggested_seasonality` | Integer | Optional | The seasonal pattern length in intervals. Expected range: 8–256. | +| `recency_emphasis` | Integer | Optional | Controls how much recent data affects the forecast. Defaults to `2560`. | +| `history` | Integer | Optional | The number of past intervals used for model training. | +| `result_index_min_size` | Integer | Optional | The minimum primary shard size (in MB) required to trigger index rollover. | +| `result_index_min_age` | Integer | Optional | The minimum index age (in days) required to trigger index rollover. | +| `result_index_ttl` | Integer | Optional | The minimum amount of time (in days) before rolled-over indexes are deleted. | +| `flatten_custom_result_index` | Boolean | Optional | If `true`, flattens nested fields in the custom result index for easier aggregation. | +| `shingle_size` | Integer | Optional | The number of past intervals used to influence the forecast. Defaults to `8`. Recommended range: 4–128. | + + +### Example request: Single-stream forecaster + +The following example creates a single-stream forecaster for the `network-requests` index. The forecaster predicts the maximum value of the `deny` field every 3 minutes, using the previous 300 intervals for training. 
The `window_delay` setting accounts for ingest latency by delaying the forecast window by 3 minutes: + + +```json +POST _plugins/_forecast/forecasters +{ + "name": "Second-Test-Forecaster-7", + "description": "ok rate", + "time_field": "@timestamp", + "indices": [ + "network-requests" + ], + "feature_attributes": [ + { + "feature_id": "deny_max", + "feature_name": "deny max", + "feature_enabled": true, + "importance": 1, + "aggregation_query": { + "deny_max": { + "max": { + "field": "deny" + } + } + } + } + ], + "window_delay": { + "period": { + "interval": 3, + "unit": "MINUTES" + } + }, + "forecast_interval": { + "period": { + "interval": 3, + "unit": "MINUTES" + } + }, + "schema_version": 2, + "horizon": 3, + "history": 300 +} +``` +{% include copy-curl.html %} + +#### Example response + +```json +{ + "_id": "4WnXAYoBU2pVBal92lXD", + "_version": 1, + "forecaster": { + "...": "Configuration (omitted)" + } +} +``` + +### Example request: High-cardinality forecaster + +The following example creates a high-cardinality forecaster that groups forecasts by the `host_nest.host2` field. Like the single-stream example, it forecasts the maximum value of the `deny` field at 3-minute intervals using historical data. 
This setup enables entity-specific forecasting across different hosts: + +```json +POST _plugins/_forecast/forecasters +{ + "name": "Second-Test-Forecaster-7", + "description": "ok rate", + "time_field": "@timestamp", + "indices": [ + "network-requests" + ], + "feature_attributes": [ + { + "feature_id": "deny_max", + "feature_name": "deny max", + "feature_enabled": true, + "importance": 1, + "aggregation_query": { + "deny_max": { + "max": { + "field": "deny" + } + } + } + } + ], + "window_delay": { + "period": { + "interval": 3, + "unit": "MINUTES" + } + }, + "forecast_interval": { + "period": { + "interval": 3, + "unit": "MINUTES" + } + }, + "schema_version": 2, + "horizon": 3, + "history": 300, + "category_field": ["host_nest.host2"] +} +``` +{% include copy-curl.html %} + +#### Example response + +```json +{ + "_id": "4WnXAYoBU2pVBal92lXD", + "_version": 1, + "forecaster": { + "...": "Configuration (omitted)" + } +} +``` + + +--- + + +## Validate forecaster + +**Introduced 3.1** +{: .label .label-purple } + +Use this API to verify that a forecaster configuration is valid. You can perform two types of validation: + +- **Configuration-only validation**: Checks that the configuration is syntactically correct and references existing fields. +- **Training-feasibility validation**: Performs a comprehensive validation to ensure that the forecaster can be trained with the specified configuration. + + +### Endpoints + +The following endpoints are available for validating forecasters. + +**Configuration-only validation**: + +```http +POST _plugins/_forecast/forecasters/_validate +``` + +**Training-feasibility validation**: + +```http +POST _plugins/_forecast/forecasters/_validate/model +``` + +### Request body + +The request body is identical to the request body used to create a forecaster. It must include at least the following required fields: `name`, `time_field`, `indices`, `feature_attributes`, and `forecast_interval`. 
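As a rough client-side sketch (not part of the Forecasting API), the following Python helper checks that a configuration dictionary contains the required fields listed above before it is sent to the `_validate` endpoint. The function name `missing_required_fields` and the sample configuration are illustrative assumptions; the server-side validation remains authoritative.

```python
# Required top-level fields, taken from the request body description above.
REQUIRED_FIELDS = ("name", "time_field", "indices", "feature_attributes", "forecast_interval")


def missing_required_fields(config: dict) -> list:
    """Return the required top-level fields absent from config (hypothetical helper)."""
    return [field for field in REQUIRED_FIELDS if field not in config]


# Illustrative configuration that omits forecast_interval.
sample_config = {
    "name": "invalid-forecaster",
    "time_field": "@timestamp",
    "indices": ["network-requests"],
    "feature_attributes": [
        {
            "feature_name": "deny max",
            "aggregation_query": {"deny_max": {"max": {"field": "deny"}}},
        }
    ],
}

# Lists the fields that still need to be supplied.
print(missing_required_fields(sample_config))
```

A check like this only catches missing keys; it does not confirm that the referenced fields exist in the source index, which is what the validation endpoints are for.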
+ +If the configuration is valid, the response returns an empty object (`{}`). If the configuration is invalid, the response includes detailed error messages. + + +### Example request: Missing `forecast_interval` + +The following request shows an invalid forecaster configuration that omits the `forecast_interval`: + +```json +POST _plugins/_forecast/forecasters/_validate +{ + "name": "invalid-forecaster", + "time_field": "@timestamp", + "indices": ["network-requests"], + "feature_attributes": [ + { + "feature_id": "deny_max", + "feature_name": "deny max", + "feature_enabled": true, + "aggregation_query": { + "deny_max": { + "max": { + "field": "deny" + } + } + } + } + ] +} +``` +{% include copy-curl.html %} + +### Example response + +```json +{ + "forecaster": { + "forecast_interval": { + "message": "Forecast interval should be set" + } + } +} +``` + + +--- + +## Suggest configuration + +**Introduced 3.1** +{: .label .label-purple } + +Returns appropriate values for one or more forecaster parameters (`forecast_interval`, `horizon`, `history`, `window_delay`) based on the cadence and density of your data. + + +### Endpoints + +``` +POST _plugins/_forecast/forecasters/_suggest/ +``` + +`types` must be one or more of `forecast_interval`, `horizon`, `history`, or `window_delay`. + + +### Example request: Suggest an interval + +The following request analyzes the source data and suggests an appropriate `forecast_interval` value for the forecaster based on the average event frequency: + +``` +POST _plugins/_forecast/forecasters/_suggest/forecast_interval +{ + "name": "interval‑suggest", + "time_field": "@timestamp", + "indices": ["network-requests"], + ... +} +``` +{% include copy-curl.html %} + +#### Example response + +```json +{ + "interval": { + "period": { "interval": 1, "unit": "Minutes" } + } +} +``` + +--- + +## Get forecaster + +**Introduced 3.1** +{: .label .label-purple } + +Retrieves a forecaster and (optionally) its most recent tasks. 
+ +### Endpoints + +``` +GET _plugins/_forecast/forecasters/[?task=(true|false)] +``` + +### Example request: Include tasks + +The following request returns metadata about the forecaster and, if specified, details about its associated tasks: + +```json +GET _plugins/_forecast/forecasters/d7-r1YkB_Z-sgDOKo3Z5?task=true +``` +{% include copy-curl.html %} + +The response includes the `forecaster`, `realtime_task`, and `run_once_task` sections. + +--- + +## Update forecaster + +**Introduced 3.1** +{: .label .label-purple } + +Updates the configuration of an existing forecaster. You must stop any active forecasting jobs before making updates. + +Any change that affects the model, such as modifying the `category_field`, `result_index`, or `feature_attributes`, invalidates previous results shown in the OpenSearch Dashboards UI. + +### Endpoints + +``` +PUT _plugins/_forecast/forecasters/ +``` + + +### Example request: Update the name, result index, and category fields + +The following displays the definition of forecaster `forecaster-i1nwqooBLXq6T-gGbXI-`: + +```json +{ + "_index": ".opensearch-forecasters", + "_id": "forecaster-i1nwqooBLXq6T-gGbXI-", + "_version": 1, + "_seq_no": 0, + "_primary_term": 1, + "_score": 1.0, + "_source": { + "category_field": [ + "service" + ], + "description": "ok rate", + "feature_attributes": [{ + "feature_id": "deny_max", + "feature_enabled": true, + "feature_name": "deny max", + "aggregation_query": { + "deny_max": { + "max": { + "field": "deny" + } + } + } + }], + "forecast_interval": { + "period": { + "unit": "Minutes", + "interval": 1 + } + }, + "schema_version": 2, + "time_field": "@timestamp", + "last_update_time": 1695084997949, + "horizon": 24, + "indices": [ + "network-requests" + ], + "window_delay": { + "period": { + "unit": "Seconds", + "interval": 20 + } + }, + "transform_decay": 1.0E-4, + "name": "Second-Test-Forecaster-3", + "filter_query": { + "match_all": { + "boost": 1.0 + } + }, + "shingle_size": 8, + "result_index": 
"opensearch-forecast-result-a" + } +} +``` + +The following request updates the `name`, `result_index`, and `category_field` properties of a forecaster: + +```json +PUT localhost:9200/_plugins/_forecast/forecasters/forecast-i1nwqooBLXq6T-gGbXI- +{ + "name": "Second-Test-Forecaster-1", + "description": "ok rate", + "time_field": "@timestamp", + "indices": [ + "network-requests" + ], + "feature_attributes": [ + { + "feature_id": "deny_max", + "feature_name": "deny max", + "feature_enabled": true, + "importance": 1, + "aggregation_query": { + "deny_max": { + "max": { + "field": "deny" + } + } + } + } + ], + "window_delay": { + "period": { + "interval": 20, + "unit": "SECONDS" + } + }, + "forecast_interval": { + "period": { + "interval": 1, + "unit": "MINUTES" + } + }, + "ui_metadata": { + "aabb": { + "ab": "bb" + } + }, + "schema_version": 2, + "horizon": 24, + "category_field": ["service", "host"] +} +``` +{% include copy-curl.html %} + +--- + + +## Delete forecaster + +**Introduced 3.1** +{: .label .label-purple } + +Deletes a forecaster configuration. You must stop any associated real-time or run-once forecasting jobs before deletion. If a job is still running, the API returns a `400` error. + +### Endpoint + +```http +DELETE _plugins/_forecast/forecasters/ +``` + +### Example request: Delete a forecaster + +The following request deletes a forecaster configuration using its unique ID: + +```http +DELETE _plugins/_forecast/forecasters/forecast-i1nwqooBLXq6T-gGbXI- +``` +{% include copy-curl.html %} + +--- + +## Start a forecaster job + +**Introduced 3.1** +{: .label .label-purple } + +Begins real-time forecasting for a forecaster. 
+ +### Endpoints + +```http +POST _plugins/_forecast/forecasters//_start +``` + +### Example request: Start a forecaster job + +The following request initiates real-time forecasting for the specified forecaster: + +```bash +POST _plugins/_forecast/forecasters/4WnXAYoBU2pVBal92lXD/_start +``` +{% include copy-curl.html %} + +#### Example response + +```json +{ "_id": "4WnXAYoBU2pVBal92lXD" } +``` + +--- + +## Stop a forecaster job + +**Introduced 3.1** +{: .label .label-purple } + +Stops real-time forecasting for a forecaster. + +### Endpoints +```http +POST _plugins/_forecast/forecasters//_stop +``` + +### Example request: Stop a forecaster job + +The following request stops the real-time forecasting job for the specified forecaster: + +```bash +POST _plugins/_forecast/forecasters/4WnXAYoBU2pVBal92lXD/_stop +``` +{% include copy-curl.html %} + + +--- + +## Run once analysis + +**Introduced 3.1** +{: .label .label-purple } + +Runs backtesting (historical) forecasting. It cannot run while a real-time job is active. 
+ +### Endpoint +```http +POST _plugins/_forecast/forecasters//_run_once +``` + +### Example request: Run a backtesting forecast + +The following request starts a run-once forecast analysis for the specified forecaster: + +```bash +POST _plugins/_forecast/forecasters//_run_once +``` +{% include copy-curl.html %} + +#### Example response + +The response returns the task ID assigned to the run-once job: + +```json +{ "taskId": "vXZG85UBAlM4LplcKI0f" } +``` + +### Example request: Search forecast results by task ID + +Use the returned `taskId` to query the `opensearch-forecast-results*` index for historical forecast output: + +```json +GET opensearch-forecast-results*/_search?pretty +{ + "sort": { + "data_end_time": "desc" + }, + "size": 10, + "query": { + "bool": { + "filter": [ + { "term": { "task_id": "vXZG85UBAlM4LplcKI0f" } }, + { + "range": { + "data_end_time": { + "format": "epoch_millis", + "gte": 1742585746033 + } + } + } + ] + } + }, + "track_total_hits": true +} +``` +{% include copy-curl.html %} + +This query returns the 10 most recent forecast results matching the specified task ID. + + +--- + +## Search forecasters + +**Introduced 3.1** +{: .label .label-purple } + +Provides standard `_search` functionality on the `.opensearch-forecasters` system index, which stores forecaster configurations. You must use this API to query `.opensearch-forecasters` directly because the index is a system index and cannot be accessed through regular OpenSearch queries. 
+ +### Endpoint + +```http +GET _plugins/_forecast/forecasters/_search +``` + +### Example request: Wildcard search by index + +The following request searches for forecasters whose source index names begin with `network` using a leading-anchored wildcard: + +```json +GET _plugins/_forecast/forecasters/_search +{ + "query": { + "wildcard": { + "indices": { + "value": "network*" + } + } + } +} +``` +{% include copy-curl.html %} + +`network*` matches `network`, `network-metrics`, `network_2025-06`, and similar index names. + +#### Example response + +```json +{ + "took": 5, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 1.0, + "hits": [{ + "_index": ".opensearch-forecasters", + "_id": "forecast-i1nwqooBLXq6T-gGbXI-", + "_version": 1, + "_seq_no": 0, + "_primary_term": 1, + "_score": 1.0, + "_source": { + "category_field": ["server"], + "description": "ok rate", + "feature_attributes": [{ + "feature_id": "deny_max", + "feature_enabled": true, + "feature_name": "deny max", + "aggregation_query": { + "deny_max": { + "max": { + "field": "deny" + } + } + } + }], + "forecast_interval": { + "period": { + "unit": "Minutes", + "interval": 1 + } + }, + "schema_version": 2, + "time_field": "@timestamp", + "last_update_time": 1695084997949, + "horizon": 24, + "indices": ["network-requests"], + "window_delay": { + "period": { + "unit": "Seconds", + "interval": 20 + } + }, + "transform_decay": 1.0E-4, + "name": "Second-Test-Forecaster-3", + "filter_query": { + "match_all": { + "boost": 1.0 + } + }, + "shingle_size": 8 + } + }] + } +} +``` + +--- + +## Search tasks + +**Introduced 3.1** +{: .label .label-purple } + +Query tasks in the `.opensearch-forecast-state` index. 
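A task search body like the one in the example request in this section can also be assembled programmatically. The following Python sketch builds a similar query for the earlier run-once tasks of one forecaster. The function name `previous_run_once_tasks_query` and its parameters are assumptions for illustration; the field names and task types are copied from that example request.

```python
import json

# Task types copied from the example request in this section.
RUN_ONCE_TASK_TYPES = [
    "RUN_ONCE_FORECAST_SINGLE_STREAM",
    "RUN_ONCE_FORECAST_HC_FORECASTER",
]


def previous_run_once_tasks_query(forecaster_id: str, size: int = 1000) -> dict:
    """Build a search body for earlier (not latest) run-once tasks (hypothetical helper)."""
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"forecaster_id": forecaster_id}},
                    {"term": {"is_latest": False}},
                    {"terms": {"task_type": RUN_ONCE_TASK_TYPES}},
                ]
            }
        },
        # Newest executions first.
        "sort": [{"execution_start_time": {"order": "desc"}}],
    }


body = previous_run_once_tasks_query("m5apnooBHh7Wss2wewfW")
print(json.dumps(body, indent=2))
```

The sketch uses the short form of the `term` query and omits the explicit `boost` values, which default to `1.0`.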
+ +### Endpoint + +```http +GET _plugins/_forecast/forecasters/tasks/_search +``` + +### Example request: Search previous run-once tasks + +The following request retrieves previous run-once tasks (excluding the most recent) for a specific forecaster and sorts them by `execution_start_time` in descending order: + +```json +GET _plugins/_forecast/forecasters/tasks/_search +{ + "from": 0, + "size": 1000, + "query": { + "bool": { + "filter": [ + { "term": { "forecaster_id": { "value": "m5apnooBHh7Wss2wewfW", "boost": 1.0 }}}, + { "term": { "is_latest": { "value": false, "boost": 1.0 }}}, + { "terms": { + "task_type": [ + "RUN_ONCE_FORECAST_SINGLE_STREAM", + "RUN_ONCE_FORECAST_HC_FORECASTER" + ], + "boost": 1.0 + }} + ], + "adjust_pure_negative": true, + "boost": 1.0 + } + }, + "sort": [ + { "execution_start_time": { "order": "desc" }} + ] +} +``` +{% include copy-curl.html %} + +#### Example response + +```json +{ + "took": 3, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { "value": 1, "relation": "eq" }, + "max_score": null, + "hits": [ + { + "_index": ".opensearch-forecast-state", + "_id": "4JaunooBHh7Wss2wOwcw", + "_version": 3, + "_seq_no": 5, + "_primary_term": 1, + "_score": null, + "_source": { + "last_update_time": 1694879344264, + "execution_start_time": 1694879333168, + "forecaster_id": "m5apnooBHh7Wss2wewfW", + "state": "TEST_COMPLETE", + "task_type": "RUN_ONCE_FORECAST_SINGLE_STREAM", + "is_latest": false, + "forecaster": { + "description": "ok rate", + "ui_metadata": { "aabb": { "ab": "bb" }}, + "feature_attributes": [ + { + "feature_id": "deny_max", + "feature_enabled": true, + "feature_name": "deny max", + "aggregation_query": { + "deny_max": { + "max": { "field": "deny" } + } + } + } + ], + "forecast_interval": { + "period": { + "unit": "Minutes", + "interval": 1 + } + }, + "schema_version": 2, + "time_field": "@timestamp", + "last_update_time": 1694879022036, + "horizon": 
24, + "indices": [ "network-requests" ], + "window_delay": { + "period": { + "unit": "Seconds", + "interval": 20 + } + }, + "transform_decay": 1.0E-4, + "name": "Second-Test-Forecaster-5", + "filter_query": { "match_all": { "boost": 1.0 }}, + "shingle_size": 8 + } + }, + "sort": [ 1694879333168 ] + } + ] + } +} +``` + +--- + +## Top forecasters +**Introduced 3.1** +{: .label .label-purple } + +Returns the *top‑k* entities for a given timestamp range, based on built‑in or custom metrics. + +### Endpoint + +```http +POST _plugins/_forecast/forecasters//results/_topForecasts +``` + +### Query parameters + +The following query parameters are supported. + +| Name | Type | Required | Description | +| :--- | :--- | :--- | :--- | +| `split_by` | String | Required | The field to group by (such as `service`). | +| `forecast_from` | Epoch‑ms | Required | The `data_end_time` of the first forecast in the evaluation window. | +| `size` | Integer | Optional | The number of buckets to return. Default is `5`. | +| `filter_by` | Enum | Required | Specifies whether to use a built-in or custom query. Must be either `BUILD_IN_QUERY` or `CUSTOM_QUERY`. | +| `build_in_query` | Enum | Optional | One of the following built-in ranking criteria is required:
`MIN_CONFIDENCE_INTERVAL_WIDTH` -- Sorts by the narrowest forecast confidence intervals (most precise).
`MAX_CONFIDENCE_INTERVAL_WIDTH` -- Sorts by the widest forecast confidence intervals (least precise).
`MIN_VALUE_WITHIN_THE_HORIZON` -- Sorts by the lowest forecast value observed within the prediction window.
`MAX_VALUE_WITHIN_THE_HORIZON` -- Sorts by the highest forecast value observed within the prediction window.
`DISTANCE_TO_THRESHOLD_VALUE` -- Sorts by the difference between the forecast value and a user-defined threshold. | +| `threshold`, `relation_to_threshold` | Mixed | Conditional | Required only if `build_in_query` is `DISTANCE_TO_THRESHOLD_VALUE`. | +| `filter_query` | Query DSL | Optional | A custom query used when `filter_by=CUSTOM_QUERY`. | +| `subaggregations` | Array | Optional | A list of nested aggregations and sort options used to compute additional metrics within each bucket. | + +### Example request: Built-in query for narrow confidence intervals + +The following request returns the top forecasted entities ranked by the narrowest confidence intervals: + +```json +POST _plugins/_forecast/forecasters/AG_3t4kBkYqqimCe86bP/results/_topForecasts +{ + "split_by": "service", + "filter_by": "BUILD_IN_QUERY", + "build_in_query": "MIN_CONFIDENCE_INTERVAL_WIDTH", + "forecast_from": 1691008679297 +} +``` +{% include copy-curl.html %} + +#### Example response + +```json +{ + "buckets": [ + { + "key": { "service": "service_6" }, + "doc_count": 1, + "bucket_index": 0, + "MIN_CONFIDENCE_INTERVAL_WIDTH": 27.655361 + }, + ... + ] +} +``` + +### Example request: Built-in query with the narrowest confidence interval + +The following request returns a sorted list of entities whose forecast values have the narrowest confidence intervals. 
The results are ranked in ascending order based on the `MIN_CONFIDENCE_INTERVAL_WIDTH` metric: + +```json +POST _plugins/_forecast/forecasters/AG_3t4kBkYqqimCe86bP/results/_topForecasts +{ + "split_by": "service", + "filter_by": "BUILD_IN_QUERY", + "build_in_query": "MIN_CONFIDENCE_INTERVAL_WIDTH", + "forecast_from": 1691008679297 +} +``` +{% include copy-curl.html %} + +#### Example response + +```json +{ + "buckets": [ + { + "key": { + "service": "service_6" + }, + "doc_count": 1, + "bucket_index": 0, + "MIN_CONFIDENCE_INTERVAL_WIDTH": 27.655361 + }, + { + "key": { + "service": "service_4" + }, + "doc_count": 1, + "bucket_index": 1, + "MIN_CONFIDENCE_INTERVAL_WIDTH": 1324.7734 + }, + { + "key": { + "service": "service_0" + }, + "doc_count": 1, + "bucket_index": 2, + "MIN_CONFIDENCE_INTERVAL_WIDTH": 2211.0781 + }, + { + "key": { + "service": "service_2" + }, + "doc_count": 1, + "bucket_index": 3, + "MIN_CONFIDENCE_INTERVAL_WIDTH": 3372.0469 + }, + { + "key": { + "service": "service_3" + }, + "doc_count": 1, + "bucket_index": 4, + "MIN_CONFIDENCE_INTERVAL_WIDTH": 3980.2812 + } + ] +} +``` + +### Example request: Built-in query with distance under a threshold + +The following request returns the top entities whose forecast values fall farthest from a specified threshold, based on the `DISTANCE_TO_THRESHOLD_VALUE` metric: + +```http +POST _plugins/_forecast/forecasters/AG_3t4kBkYqqimCe86bP/results/_topForecasts +{ + "split_by": "service", // group forecasts by the "service" entity field + "filter_by": "BUILD_IN_QUERY", // use a built-in ranking metric + "build_in_query": "DISTANCE_TO_THRESHOLD_VALUE", + "forecast_from": 1691008679297, // data_end_time of the first forecast in scope + "threshold": -82561.8, // user-supplied threshold + "relation_to_threshold": "LESS_THAN" // keep only forecasts below the threshold +} +``` + +#### Example response + +The `DISTANCE_TO_THRESHOLD_VALUE` metric calculates `forecast_value – threshold`. 
Because `relation_to_threshold` is `LESS_THAN`, the API returns negative distances only and sorts them in ascending order (most negative first). Each bucket includes the following values: + +- `doc_count`: The number of forecast points that matched. +- `DISTANCE_TO_THRESHOLD_VALUE`: The largest distance within the forecast horizon from the threshold value. + +The following response returns the `DISTANCE_TO_THRESHOLD_VALUE`: + +```json +{ + "buckets": [ + { + "key": { "service": "service_5" }, + "doc_count": 18, + "bucket_index": 0, + "DISTANCE_TO_THRESHOLD_VALUE": -330387.12 + }, + ... + { + "key": { "service": "service_0" }, + "doc_count": 1, + "bucket_index": 4, + "DISTANCE_TO_THRESHOLD_VALUE": -83561.8 + } + ] +} +``` + +### Example request: Custom query and nested aggregations + +The following request uses a custom query to match services by name and ranks them by the highest forecast value: + +```json +POST _plugins/_forecast/forecasters/AG_3t4kBkYqqimCe86bP/results/_topForecasts +{ + "split_by": "service", + "forecast_from": 1691018993776, + "filter_by": "CUSTOM_QUERY", + "filter_query": { + "nested": { + "path": "entity", + "query": { + "bool": { + "must": [ + { "term": { "entity.name": "service" } }, + { "wildcard": { "entity.value": "User*" } } + ] + } + } + } + }, + "subaggregations": [ + { + "aggregation_query": { + "forecast_value_max": { + "max": { "field": "forecast_value" } + } + }, + "order": "DESC" + } + ] +} +``` +{% include copy-curl.html %} + +#### Example response + +```json +{ + "buckets": [ + { + "key": { "service": "UserAuthService" }, + "doc_count": 24, + "bucket_index": 0, + "forecast_value_max": 269190.38 + }, + ... + ] +} +``` +--- + +## Profile forecaster + +**Introduced 3.1** +{: .label .label-purple } + +Returns execution-time state such as initialization progress, per-entity model metadata, and errors. This API is useful for inspecting forecaster internals during runtime. 
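To illustrate how a profile request path is composed, here is a small Python sketch that joins the requested profile types into a path segment. The helper name `profile_path` is an assumption and not part of the plugin. The accepted names mirror the profile types listed in this section, and the path shape follows the example requests in this section.

```python
# Profile types as listed in this section.
SUPPORTED_PROFILE_TYPES = {
    "state", "error", "coordinating_node", "total_size_in_bytes",
    "init_progress", "models", "total_entities", "active_entities",
    "forecast_task",
}


def profile_path(forecaster_id: str, types=(), all_types: bool = False) -> str:
    """Compose a _profile request path (hypothetical helper, not part of the plugin)."""
    unknown = [t for t in types if t not in SUPPORTED_PROFILE_TYPES]
    if unknown:
        raise ValueError(f"Unsupported profile types: {unknown}")
    path = f"_plugins/_forecast/forecasters/{forecaster_id}/_profile"
    if types:
        # Multiple profile types are comma-separated in a single path segment.
        path += "/" + ",".join(types)
    if all_types:
        path += "?_all=true"
    return path


print(profile_path("mZ6P0okBTUNS6IWgvpwo", ["init_progress", "error", "total_entities", "state"]))
```

Rejecting unknown type names on the client side is a design choice of this sketch; how the server handles an unrecognized type is not described here.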
+ +### Endpoints + +```http +GET _plugins/_forecast/forecasters//_profile[/,][?_all=true] +``` + +You can retrieve specific profile types or request all available types using the `_all` query parameter. + +The following profile types are supported: + +- `state` +- `error` +- `coordinating_node` +- `total_size_in_bytes` +- `init_progress` +- `models` +- `total_entities` +- `active_entities` +- `forecast_task` + +If you include an `entity` array in the request body, the profile is scoped to that entity only. + +### Example request: Default profile with an entity filter + +The following request returns the default profile types (`state` and `error`) for the specified entity: + +```http +GET _plugins/_forecast/forecasters/tLch1okBCBjX5EchixQ8/_profile +{ + "entity": [ + { + "name": "service", + "value": "app_1" + }, + { + "name": "host", + "value": "server_2" + } + ] +} +``` +{% include copy-curl.html %} + +#### Example response + +```json +{ + "state": "RUNNING" +} +``` + +### Example request: Multiple profile types + +The following request retrieves `init_progress`, `error`, `total_entities`, and `state` profile types: + +```http +GET _plugins/_forecast/forecasters/mZ6P0okBTUNS6IWgvpwo/_profile/init_progress,error,total_entities,state +``` +{% include copy-curl.html %} + +### Example request: All profile types + +The following request returns all available profile types: + +```http +GET _plugins/_forecast/forecasters/d7-r1YkB_Z-sgDOKo3Z5/_profile?_all=true&pretty +``` +{% include copy-curl.html %} + + +--- + +## Forecaster stats +**Introduced 3.1** +{: .label .label-purple } + +Returns cluster-level or node-level statistics, including the number of forecasters, model counts, request counters, and the health of internal forecast indexes. 
+ +### Endpoints + +```http +GET _plugins/_forecast/stats +GET _plugins/_forecast/<node_id>/stats +GET _plugins/_forecast/stats/<stat_name> +``` + +### Example request: Retrieve all statistics + +The following request retrieves cluster-level statistics for all forecasters, including counts, model information, and index status: + +```http +GET _plugins/_forecast/stats +``` +{% include copy-curl.html %} + +#### Example response + +```json +{ + "hc_forecaster_count": 1, + "forecast_results_index_status": "yellow", + "forecast_models_checkpoint_index_status": "yellow", + "single_stream_forecaster_count": 1, + "forecastn_state_status": "yellow", + "forecaster_count": 2, + "job_index_status": "yellow", + "config_index_status": "yellow", + "nodes": { + "8B2S4ClnRFK3GTjO45bwrw": { + "models": [ + { + "model_type": "rcf_caster", + "last_used_time": 1692245336895, + "model_id": "Doj0AIoBEU5Xd2ccoe_9_entity_SO2kPi_PAMsvThWyE-zYHg", + "last_checkpoint_time": 1692233157256, + "entity": [ + { "name": "host_nest.host2", "value": "server_2" } + ] + } + ], + "forecast_hc_execute_request_count": 204, + "forecast_model_corruption_count": 0, + "forecast_execute_failure_count": 0, + "model_count": 4, + "forecast_execute_request_count": 409, + "forecast_hc_execute_failure_count": 0 + } + } +} +``` + +### Example request: Retrieve statistics for a specific node + +The following request retrieves forecaster statistics for a specific node, identified by node ID: + +```http +GET _plugins/_forecast/8B2S4ClnRFK3GTjO45bwrw/stats +``` +{% include copy-curl.html %} + +### Example request: Retrieve the total number of high-cardinality requests + +The following request retrieves the total number of high-cardinality forecaster requests across all nodes: + +```http +GET _plugins/_forecast/stats/forecast_hc_execute_request_count +``` +{% include copy-curl.html %} + +### Example request: Retrieve the high-cardinality request count for a specific node + +The following request retrieves the number of high-cardinality
forecaster requests executed by a specific node: + +```http +GET _plugins/_forecast/0ZpL8WEYShy-qx7hLJQREQ/stats/forecast_hc_execute_request_count/ +``` +{% include copy-curl.html %} + + +--- + +## Forecaster info +**Introduced 3.1** +{: .label .label-purple } + +Returns a single integer representing the total number of forecaster configurations in the cluster or checks whether a forecaster that satisfies a given search criterion exists. + + +### Endpoints +```http +GET _plugins/_forecast/forecasters/count +GET _plugins/_forecast/forecasters/match?name=<forecaster_name> +``` + +### Example request: Count forecasters + +The following request returns the number of forecaster configurations currently stored in the cluster: + +```http +GET _plugins/_forecast/forecasters/count +``` +{% include copy-curl.html %} + +#### Example response + +```json +{ + "count": 2, + "match": false +} +``` + +### Example request: Match forecaster name + +The following request looks for a forecaster named `Second-Test-Forecaster-3`: + +```http +GET _plugins/_forecast/forecasters/match?name=Second-Test-Forecaster-3 +``` +{% include copy-curl.html %} + +#### Example response: Match found + +```json +{ + "count": 0, + "match": true +} +``` + +#### Example response: No match found + +```json +{ + "count": 0, + "match": false +} +``` diff --git a/_observing-your-data/forecast/getting-started.md b/_observing-your-data/forecast/getting-started.md new file mode 100644 index 00000000000..d2038e8f8e1 --- /dev/null +++ b/_observing-your-data/forecast/getting-started.md @@ -0,0 +1,353 @@ +--- +layout: default +title: Getting started with forecasting +nav_order: 5 +parent: Forecasting +has_children: false +--- + +# Getting started with forecasting + +You can define and configure forecasters in OpenSearch Dashboards by selecting **Forecasting** from the navigation panel. + +## Step 1: Define a forecaster + +A **forecaster** represents a single forecasting task.
You can create multiple forecasters to run in parallel, each analyzing a different data source. Follow these steps to define a new forecaster: + +1. In the **Forecaster list** view, choose **Create forecaster**. + +2. Define the data source by entering the following information: + * **Name** – Provide a unique, descriptive name, such as `requests-10min`. + * **Description** – Summarize the forecaster's purpose, for example, `Forecast total request count every 10 minutes`. + * **Indexes** – Select one or more indexes, index patterns, or aliases. Remote indexes are supported through cross-cluster search (`cluster-name:index-pattern`). For more information, see [Cross-cluster search]({{site.url}}{{site.baseurl}}/search-plugins/cross-cluster-search/). If the Security plugin is enabled, see [Selecting remote indexes with fine-grained access control]({{site.url}}{{site.baseurl}}/observing-your-data/forecast/security/#selecting-remote-indexes-with-fine-grained-access-control). + +3. (Optional) Choose **Add data filter** to set a **Field**, **Operator**, and **Value** or choose **Use query DSL** to define a [Boolean query]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/). The following example uses a query domain-specific language (DSL) filter to match three URL paths: + + ```json + { + "bool": { + "should": [ + { "term": { "urlPath.keyword": "/domain/{id}/short" } }, + { "term": { "urlPath.keyword": "/sub_dir/{id}/short" } }, + { "term": { "urlPath.keyword": "/abcd/123/{id}/xyz" } } + ] + } + } + ``` + + +4. Under **Timestamp field**, select the field that stores the timestamps. + +5. In the **Indicator (metric)** section, add a metric for the forecaster. Each forecaster supports one metric for optimal accuracy. Choose one of the following options: + + - Select a predefined aggregation: `average()`, `count()`, `sum()`, `min()`, or `max()`. 
+ - To use a custom aggregation, choose **Custom expression** under **Forecast based on** and define your own [query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) expression. For example, the following query forecasts the number of unique accounts with a specific account type: + + ```json + { + "bbb_unique_accounts": { + "filter": { + "bool": { + "must": [ + { + "wildcard": { + "accountType": { + "wildcard": "*blah*", + "boost": 1 + } + } + } + ], + "adjust_pure_negative": true, + "boost": 1 + } + }, + "aggregations": { + "uniqueAccounts": { + "cardinality": { + "field": "account" + } + } + } + } + } + ``` + +6. (Optional) In the **Categorical fields** section, enable **Split time series using categorical fields** to generate forecasts at the entity level (for example, by IP address, product ID, or country code). + + The number of unique entities that can be cached in memory is limited. Use the following formula to estimate capacity: + + ``` + (data nodes × heap size × plugins.forecast.model_max_size_percent) + ────────────────────────────────────────────────────────────────── + entity-model size (MB) + ``` + + For example, a cluster with 3 data nodes, each with 8 GB JVM heap and the default 10% model memory, would contain the following number of entities: + + ``` + (8096 MB × 0.10 ÷ 1 MB) × 3 nodes ≈ 2429 entities + ``` + + To determine the entity-model size, use the [Profile Forecaster API]({{site.url}}{{site.baseurl}}/observing-your-data/forecast/api/#profile-forecaster). You can raise or lower the memory ceiling with the `plugins.forecast.model_max_size_percent` setting. + + +Forecasters cache models for the most frequently and recently observed entities, subject to available memory. Models for less common entities are loaded from indexes on a best-effort basis during each interval, with no guaranteed service-level agreement (SLA). Always validate memory usage against a representative workload. 
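The capacity formula above can be sketched as a small Python helper. This is an illustrative calculation only: the function name and parameter names are hypothetical and are not part of the Forecasting plugin, and the 1 MB entity-model size is the assumption used in the worked example (obtain the real value from the Profile Forecaster API).

```python
def estimate_entity_capacity(data_nodes, heap_mb, model_max_size_percent, entity_model_size_mb):
    """Estimate how many entity models fit in the memory reserved for forecasting.

    Hypothetical helper that mirrors the formula in the text:
    (data nodes x heap size x plugins.forecast.model_max_size_percent) / entity-model size
    """
    if data_nodes <= 0 or heap_mb <= 0:
        raise ValueError("data_nodes and heap_mb must be positive")
    if not 0 < model_max_size_percent <= 1:
        raise ValueError("model_max_size_percent must be a fraction in (0, 1]")
    if entity_model_size_mb <= 0:
        raise ValueError("entity_model_size_mb must be positive")

    # Total heap, across all data nodes, that the plugin may use for models.
    model_memory_mb = data_nodes * heap_mb * model_max_size_percent
    return model_memory_mb / entity_model_size_mb


if __name__ == "__main__":
    # Figures taken from the worked example in the text:
    # 3 data nodes, 8096 MB heap per node, 10% model memory, 1 MB per entity model.
    capacity = estimate_entity_capacity(3, 8096, 0.10, 1.0)
    print(f"Estimated cacheable entities: {capacity:.0f}")
```

Treat the result as an upper bound for planning. Substitute your measured heap size and entity-model size, and validate against a representative workload as recommended above.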
+ +For more information, see the blog post [Improving Anomaly Detection: One Million Entities in One Minute](https://opensearch.org/blog/one-million-enitities-in-one-minute/). Although focused on anomaly detection, the recommendations apply to forecasting, as both features share the same underlying Random Cut Forest (RCF) model. + +## Step 2: Add model parameters + +The **Suggest parameters** button in OpenSearch Dashboards initiates a review of recent history to recommend sensible defaults. You can override these defaults by adjusting the following parameters: + +* **Forecasting interval** – Specifies the aggregation bucket (for example, 10 minutes). Longer intervals smooth out noise and reduce compute costs, but they delay detection. Shorter intervals detect changes sooner but increase resource usage and can introduce noise. Choose the shortest interval that still produces a stable signal. +* **Window delay** – Tells the forecaster how much of a delay to expect between event occurrence and ingestion. This delay adjusts the forecasting interval backward to ensure complete data coverage. For example, if the forecasting interval is 10 minutes and ingestion is delayed by 1 minute, setting the window delay to 1 minute ensures that the forecaster evaluates data from 1:49 to 1:59 rather than 1:50 to 2:00. + * To avoid missing data, set the window delay to the upper limit of the expected ingestion delay. However, longer delays reduce the real-time responsiveness of forecasts. +* **Horizon** – Specifies how many future buckets to predict. Forecast accuracy declines with distance, so choose only the forecast window that is operationally meaningful. +* **History** – Sets the number of historical data points used to train the initial (cold-start) model. The maximum is 10,000. More history improves initial model accuracy up to that limit. + +The **Advanced** panel is collapsed by default, allowing most users to proceed with the suggested parameters. 
If you expand the panel, you can fine-tune three additional parameters: [shingle size](#choosing-a-shingle-size), [suggested seasonality](#choosing-a-shingle-size), and [recency emphasis](#choosing-a-shingle-size). These control how the forecaster balances recent fluctuations against long-term patterns. + +Unless your data or use case demands otherwise, the defaults—**shingle size 8**, **no explicit seasonality**, and **recency emphasis 2560**—are reliable starting points. + +### Choosing a shingle size + +Leave the **Shingle size** field empty to use the automatic heuristic: + +1. Start with the default value of 8. +2. If **Suggested seasonality** is defined and greater than 16, replace it with half the season length. +3. If **Horizon** is defined and one-third of the value is greater than the current candidate, update it accordingly. + +The final value is the maximum of these three: +`max(8, seasonality ÷ 2, horizon ÷ 3)` + +If you provide a custom value, it overrides this calculation. + +### Determining storage amounts + +By default, forecast results are stored in the `opensearch-forecast-results` index alias. You can: + +* Build dashboards and visualizations. +* Connect the results to the Alerting plugin. +* Query the results as with any other OpenSearch index. + +To manage storage, the plugin applies a rollover policy: + +* **Rollover trigger** – When a primary shard reaches approximately 65 GB, a new backing index is created and the alias is updated. +* **Retention** – Rolled-over indexes are retained for at least 30 days before deletion. + +You can customize this behavior using the following settings. + +| Setting | Description | Default | +|---------|-------------|---------| +| `plugins.forecast.forecast_result_history_max_docs_per_shard` | The maximum number of Lucene documents allowed per shard before triggering a rollover. One result is approximately 4 documents at around 47 bytes each, totaling about 65 GB. 
| `1_350_000_000` | +| `plugins.forecast.forecast_result_history_retention_period` | The duration for which to retain forecast results. Supports duration formats such as `7d`, `90d`. | `30d` | + +### Specifying a custom result index + +You can store forecast results in a custom index by selecting **Custom index** and providing an alias name, such as `abc`. The plugin creates an alias like `opensearch-forecast-result-abc` that points to the backing index (for example, `opensearch-forecast-result-abc-history-2024.06.12-000002`). + +To manage permissions, use hyphenated namespaces. For example, assign `opensearch-forecast-result-financial-us-*` to roles for the `financial` department's `us` group. +{: .note } If the Security plugin is enabled, ensure appropriate [permissions are configured]({{site.url}}{{site.baseurl}}/observing-your-data/forecast/security/#custom-result-index-permissions). + +### Flattening nested fields + +If your custom result index's documents include nested fields, enable the **Flattened custom result index** to simplify aggregation and visualization. + +This creates a separate index prefixed with the custom index and forecaster name (for example, `opensearch-forecast-result-abc-flattened-test`) and attaches an ingest pipeline using a [Painless script](https://github.com/opensearch-project/anomaly-detection/blob/main/src/main/resources/scripts/flatten-custom-result-index-painless.txt) to flatten nested data. + +If you later disable this option, the associated ingest pipeline is removed. + +Use [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) to manage rollover and deletion of flattened result indexes. + +### Custom result index lifecycle management + +The plugin triggers a rollover for custom result indexes when any of the following conditions are met. 
+ +| Parameter | Description | Type | Unit | Default | Required | +|----------|-------------|------|------|---------|----------| +| `result_index_min_size` | The minimum total primary shard size required to trigger a rollover. | Integer | MB | `51200` (50 GB) | No | +| `result_index_min_age` | The minimum index age required to trigger a rollover. | Integer | Days | `7` | No | +| `result_index_ttl` | The minimum amount of time before rolled-over indexes are deleted | Integer | Days | `60` | No | + + +## Step 3: Test your forecaster + +Backtesting is the fastest way to evaluate and refine key forecasting settings such as **Interval** and **Horizon**. During backtesting, the model is trained on historical data, generates forecasts, and plots them alongside actual values to help visualize prediction accuracy. If the results do not meet expectations, you can adjust the settings and run the test again. + +Backtesting uses the following methods: + +1. **Training window**: The model trains on historical data defined by the **History** setting. + +2. **Rolling forecast**: The model progresses through the time series, repeatedly performing the following actions: + * Ingesting the next actual data point + * Emitting forecasts at each step + + Because this is a retrospective simulation, forecasted values are plotted at their original timestamps, allowing you to see how well the model would have performed in real time. + + +### Starting a backtest + +To begin a test: + +1. Scroll to the bottom of the **Add model parameters** page. +2. Select **Create and test**. + +To skip testing and create the forecaster immediately, select **Create**. + +Backtests usually take 1 or 2 minutes, but run time depends on the following factors. + +| Factor | Why it matters | +| --------------------- | ------------------------------------------------------------------------ | +| **History length** | More historical data increases training time. 
| +| **Data density** | Densely packed data slows aggregation. | +| **Categorical field** | The model trains separately for each entity. | +| **Horizon** | A longer forecast horizon increases the number of generated predictions. | + + +If the chart is empty, as shown in the following image, check that your index contains at least one time series with more than 40 data points at the selected interval. + +test failed + + +### Reading the chart + +When the test succeeds, hover over any point on the chart to view exact values and confidence bounds: + +- **Actual data** – Solid line +- **Median prediction (P50)** – Dotted line +- **Confidence interval** – Shaded band between P10 and P90 + +The following image shows the chart view. + +Forecast chart with confidence bounds + +### Viewing forecasts from a specific date + +The forecast chart displays predictions starting from the final actual data point through the end of the configured horizon. + +For example, you might configure the following settings in the **Forecast from** field: + +- **Last actual timestamp**: Mar 5, 2025, 19:23 +- **Interval**: 1 minute +- **Horizon**: 24 + +With these settings, the forecast range would span `Mar 5, 2025, 19:23 – 19:47`, as shown in the following image. + +Forecast chart with trend + +You can also use the **Forecast from** dropdown list to view forecasts from earlier test runs, as shown in the following image. + +Forecast from dropdown + +When you select an earlier **Forecast from** time, the forecast line is drawn directly over the historical data available at that moment. This causes the two series to overlap, as shown in the following image. + +Overlapping forecast and actual data + +To return to the most recent forecast window, select **Show latest**. + +### Overlay mode: Side-by-side accuracy check + +By default the chart displays forecasts that start from a single origin point. 
Toggle Overlay mode to lay a forecast curve directly on top of the actual series and inspect accuracy across the entire timeline. + +Because the model emits one forecast per horizon step, for example, 24 forecasts when the horizon is 24, a single timestamp can have many forecasts that were generated from different origins. Overlay mode lets you decide which lead time (k) to plot: + +* Horizon index 0 = Immediate next step +* Horizon index 1 = 1 step ahead +* Horizon index 23 = 23 steps ahead + +The horizon control defaults to **index 3**, but you can choose any value to focus on a different lead time. + +The following image shows Overlay mode enabled with a horizon index of 3. The visualization plots the forecast curve (in purple) directly on top of the actual data points (shown with white-filled markers). This lets you evaluate the accuracy of the model's three-steps-ahead prediction across the full timeline. The forecast range is displayed as a shaded band around the predicted values, helping highlight uncertainty. + +overlay config + +### View multiple forecast series + +A high-cardinality forecaster can display many time series at once. Use the **Time series per page** dropdown menu in the results panel to switch between the following views: + +- **Single-series view** (default): Renders one entity per page for maximum readability. +- **Multi-series view**: Plots up to five entities side by side. Confidence bands are translucent by default—hover over a line to highlight its associated band. + +Actual and forecast lines are overlaid so you can assess accuracy point by point. However, in **Multi-series view**, the overlapping lines can make the chart more difficult to interpret. To reduce visual clutter, go to **Visualization options** and turn off **Show actual data at forecast**. + +The following image shows the chart with actual and forecast lines overlaid. 
+ +Chart with actual and forecast lines overlaid + +The following image shows the same chart with actual lines hidden at forecast time to simplify the view. + +Chart with forecast lines only + + +### Exploring the timeline + +Use the following timeline controls to navigate, magnify, and filter any span of your forecast history: + +* **Zoom** – Select **+ / –** to zoom in on forecasts or broaden context. +* **Pan** – Use the arrow buttons to move to earlier or later data points, if any. +* **Quick Select** – Choose common ranges, such as "Last 24 hours", or supply custom dates for the result range. + +### Sorting options in multi-series view + +When a forecaster tracks more than five entities, the chart can't show every line at once. +In **Multi-series view**, you therefore choose the five most informative series and decide what "informative" means by selecting a sort method. The following table lists the available sort methods. + +| Sort method | What it shows | When to use it | +|-------------|--------------|----------------| +| **Minimum confidence-interval width** *(default)* | The five series whose prediction bands are narrowest. A narrow band indicates that the model is highly certain about its forecast. | Surface the most "trustworthy" forecasts. | +| **Maximum confidence-interval width** | The five series with the widest bands—forecasts the model is least sure about. | Spot risky or noisy series that may need review or more training data. | +| **Minimum value within the horizon** | The lowest predicted point across the forecast window for each entity, sorted in ascending order. | Identify entities expected to drop the farthest—useful for capacity planning or alerting on potential dips. | +| **Maximum value within the horizon** | The highest predicted point across the horizon for each entity, sorted in descending order. | Highlight series with the greatest expected peaks, such as traffic spikes or sales surges. 
| +| **Distance to threshold value** | Filters forecasts by a numeric threshold (>, <, ≥, ≤) and then orders the remainder by how far they sit from that threshold. | Investigate entities that breach—or nearly breach—an SLA or business KPI, such as "show anything forecast to exceed 10,000 requests". | + +If the forecaster monitors five or fewer entities, **Multi-series view** displays all of them. When there are more than five, the view reranks them dynamically each time you change the sort method or adjust the threshold, ensuring that the most relevant series stay in focus. + +To focus on a specific subset of entities, switch **Filter by** to **Custom query** and enter a query DSL query. The following example shows entities where the `host` equals `server_1`: + +```json +{ + "nested": { + "path": "entity", + "query": { + "bool": { + "must": [ + { "term": { "entity.name": "host" } }, + { "wildcard": { "entity.value": "server_1" } } + ] + } + } + } +} +``` + +Next, select a sort method, such as **Maximum value within the horizon**, and select **Update visualization**. The chart updates to show only the forecast series for `host:server_1`, ranked according to your selected criteria. + +### Edit a forecaster + +If the initial backtest shows weak performance, you can adjust the forecaster's configuration and run the test again. + +To edit a forecaster: + +1. Open the forecaster's **Details** page and select **Edit** to enter edit mode. +2. Modify the settings as needed—for example, add a **Category field**, change the **Interval**, or increase the **History** window. +3. Select **Update**. The validation panel automatically evaluates the new configuration and flags any issues. + + The following image shows the validation process in progress. + + Validation panel loading + +4. Resolve any validation errors. When the panel becomes green, select **Start test** in the upper-right corner to run another backtest with the updated parameters. 
+ +### Real-time forecasting + +Once you are confident in the forecasting configuration, go to the **Details** page and click **Start forecasting** to begin real-time forecasting. The forecaster will generate new predictions at each interval moving forward. + +A **Live** badge appears when the chart is synchronized with the most recent data. + +Unlike backtesting, real-time forecasting continuously attempts to initialize using live data if there is not enough historical data. During this initialization period, the forecaster displays an initialization status until it has enough data to begin emitting forecasts. + +## Next steps + +Once you have tested and refined your forecaster, you can begin using it to generate live forecasts or manage it over time. To learn how to start, stop, delete, or update an existing forecaster, see [Managing forecasters]({{site.url}}{{site.baseurl}}/observing-your-data/forecast/managing-forecasters/). + diff --git a/_observing-your-data/forecast/index.md b/_observing-your-data/forecast/index.md new file mode 100644 index 00000000000..72315f1e158 --- /dev/null +++ b/_observing-your-data/forecast/index.md @@ -0,0 +1,29 @@ +--- +layout: default +title: Forecasting +nav_order: 81 +has_children: true +--- + +# Forecasting + +Forecasting in OpenSearch transforms any time-series field into a self-updating signal using the Random Cut Forest (RCF) model. RCF is an online learning model that updates incrementally with each new data point. Because RCF refreshes in real time, it adapts instantly to changes in technical conditions without requiring costly batch retraining. Each model uses only a small amount of storage—typically a few hundred kilobytes—so both compute and storage overhead remain low. + +Pair forecasting with the [Alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to receive a notification the moment a forecasted value is predicted to breach your threshold. 
+{: .note} + +## Typical use case + +Forecasting can be used for the following use cases. + +| Domain | What you forecast | Operational benefit | +|--------|-------------------|---------------| +| Predictive maintenance | Future temperature, vibration, or error counts per machine | Replace parts before failure to avoid unplanned downtime. | +| Network forecasting | Future throughput, latency, or connection counts per node | Allocate bandwidth early to meet service-level agreement (SLA) targets. | +| Capacity and cost optimization | Future CPU, RAM, or disk usage per microservice | Rightsize hardware and autoscale efficiently. | +| Financial and operational planning | Future order volume, revenue, or ad spend efficiency | Align staffing and budgets with demand signals. | + + + + + diff --git a/_observing-your-data/forecast/managing-forecasters.md b/_observing-your-data/forecast/managing-forecasters.md new file mode 100644 index 00000000000..928e0d68d66 --- /dev/null +++ b/_observing-your-data/forecast/managing-forecasters.md @@ -0,0 +1,226 @@ +--- +layout: default +title: Managing forecasters +nav_order: 8 +parent: Forecasting +has_children: false +--- + +# Managing forecasters + +After you [create a forecaster]({{site.url}}{{site.baseurl}}/observing-your-data/forecast/getting-started/), you can manage its lifecycle and configuration using the **Details** page. This includes starting or stopping the forecaster, updating its settings, or deleting it entirely. Use this page to monitor forecaster status, troubleshoot issues, and fine-tune behavior over time. + +## Forecasters table + +The **Forecasters** table provides an overview of every forecaster you have configured. + +| Column | Description | +|--------|-------------| +| **Name** | The name you assigned when creating the forecaster. | +| **Status** | The current lifecycle state—for example, `Running`, `Initializing`, or `Test complete`. 
Click the icon for more information, including the timestamp of the most recent status change and any failure messages. | +| **Index** | The source index or alias from which the forecaster reads. | +| **Last updated** | The timestamp of the most recent configuration change. | +| **Quick actions** | Context-aware buttons such as **Start**, **Stop**, or **Delete**, depending on the forecaster's current state. | + +## Execution states + +A forecaster (that is, the underlying forecasting job) can be in any of the following states. Transitions marked *automatic* happen without user action; others require you to manually select **Start** or **Stop**. + +| State | Description | Typical trigger | +|-------|-------------|------------------| +| **Inactive** | The forecaster was created but never started. | None. | +| **Inactive: stopped** | The forecaster was manually stopped after running. | User selects **Stop forecasting**. | +| **Awaiting data to initialize forecast** | The job is trying to start but lacks enough historical data. | Automatic. | +| **Awaiting data to restart forecast** | The job is resuming after a data gap and is waiting for new data. | Automatic after a data outage. | +| **Initializing test** | The model is being built for a one-time backtest. | Automatic on **Create and test** or **Start test**. | +| **Test complete** | The backtest has finished and the job is no longer running. | Automatic. | +| **Initializing forecast** | The model is being trained for continuous real-time forecasting. | Automatic after selecting **Start forecasting**. | +| **Running** | The job is streaming live data and generating forecasts. | Automatic when initialization completes successfully. | +| **Initializing test failed** | The test failed, often due to insufficient data. | Automatic. | +| **Initializing forecast failed** | Real-time mode failed to initialize. | Automatic. | +| **Forecast failed** | The job started but encountered a runtime error, such as a shard failure. 
| Automatic but requires the user's attention. | + +The following diagram illustrates the relationships and transitions between states. + +Forecast state diagram + +## Find and filter forecasters + +If you have many forecasters, use the pagination controls at the bottom of the table to navigate between pages. You can also use the search bar to filter by **name**, **status**, or **index**, which can be helpful when managing large sets of forecasters. + +## Alert on forecasted values + +Because forecast result indexes are not system indexes, you can create an [Alerting monitor]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) for the result indexes like you would for any other user index. + +### Example alert monitor + +For example, the following is a monitor for a high-cardinality forecaster. You can modify the schedule, query, and aggregation to match your use case: + +{% raw %} +```json +{ + "name": "test", + "type": "monitor", + "monitor_type": "query_level_monitor", + "enabled": true, + "schedule": { + "period": { + "unit": "MINUTES", + "interval": 1 + } + }, + "inputs": [ + { + "search": { + "indices": [ + "opensearch-forecast-results*" + ], + "query": { + "size": 1, + "query": { + "bool": { + "filter": [ + { + "range": { + "execution_end_time": { + "from": "{{period_end}}||-15m", + "to": "{{period_end}}", + "include_lower": true, + "include_upper": true, + "format": "epoch_millis", + "boost": 1 + } + } + } + ], + "adjust_pure_negative": true, + "boost": 1 + } + }, + "aggregations": { + "metric": { + "max": { + "field": "forecast_upper_bound" + } + } + } + } + } + } + ], + "triggers": [ + { + "query_level_trigger": { + "id": "29oAl5cB5QuI4WJQ3hnx", + "name": "breach", + "severity": "1", + "condition": { + "script": { + "source": "return ctx.results[0].aggregations.metric.value == null ? 
false : ctx.results[0].aggregations.metric.value > 10000", + "lang": "painless" + } + }, + "actions": [ + { + "id": "notification378084", + "name": "email", + "destination_id": "2uzIlpcBMf-0-aT5HOtn", + "message_template": { + "source": "Monitor **{{ctx.monitor.name}}** entered **ALERT** state — please investigate.\n\nTrigger : {{ctx.trigger.name}}\nSeverity : {{ctx.trigger.severity}}\nTime range : {{ctx.periodStart}} → {{ctx.periodEnd}} UTC\n\nEntity\n{{#ctx.results.0.hits.hits.0._source.entity}}\n • {{name}} = {{value}}\n{{/ctx.results.0.hits.hits.0._source.entity}}\n", + "lang": "mustache" + }, + "throttle_enabled": true, + "subject_template": { + "source": "Alerting Notification action", + "lang": "mustache" + }, + "throttle": { + "value": 15, + "unit": "MINUTES" + } + } + ] + } + } + ], + "ui_metadata": { + "schedule": { + "timezone": null, + "frequency": "interval", + "period": { + "unit": "MINUTES", + "interval": 1 + }, + "daily": 0, + "weekly": { + "tue": false, + "wed": false, + "thur": false, + "sat": false, + "fri": false, + "mon": false, + "sun": false + }, + "monthly": { + "type": "day", + "day": 1 + }, + "cronExpression": "0 */1 * * *" + }, + "monitor_type": "query_level_monitor", + "search": { + "searchType": "query", + "timeField": "execution_end_time", + "aggregations": [ + { + "aggregationType": "max", + "fieldName": "forecast_upper_bound" + } + ], + "groupBy": [], + "bucketValue": 15, + "bucketUnitOfTime": "m", + "filters": [] + } + } +} +``` +{% endraw %} +{% include copy-curl.html %} + +### Monitor design + +The following table explains each design choice used in the example alert monitor and why it matters. + +| Design choice | Rationale | +|---------------|-----------| +| `size: 1` in the search input | Retrieves a single document so you can reference `ctx.results.0.hits.hits.0` in the notification to identify which entity (such as `host` or `service`) triggered the alert. 
| +| `execution_end_time` range `now-15m` → `now` | Filters on the result creation timestamp, which reflects when the forecast was generated. This avoids delays caused by ingestion lag. Avoid filtering on `data_end_time` if your index includes late-arriving data (such as backfilled logs). | +| `max(forecast_upper_bound)` as the metric | Detects upper-bound spikes. Alternatives include:
`min(forecast_lower_bound)` for sudden drops.
`avg(forecast_value)` for trend shifts.
For additional fields, see the [forecast result schema](https://github.com/opensearch-project/anomaly-detection/blob/main/src/main/resources/mappings/forecast-results.json). | +| Index pattern `opensearch-forecast-results*` | Matches the default result index pattern. Update this pattern if you route results to a custom index, such as `opensearch-forecast-result-abc*`. | +| Optional term filter on `forecaster_id` | Use this filter to target a specific forecaster and avoid matching unrelated forecasts. | +| Monitor every 1 min, query window 15 min | Evaluates forecasts every minute to detect anomalies quickly. The 15-minute lookback increases resilience to timing delays. Combined with a 15-minute alert throttle, this avoids duplicate notifications for the same event. | +| Mustache block prints all entity dimensions | Displays both single-dimension (`host=server_3`) and multi-dimension (`host=server_3`, `service=auth`) entity values. You can also include a link to a pre-filtered dashboard for faster triage. | +| Threshold | Use the OpenSearch Dashboards visual editor to analyze recent forecast values and determine an appropriate threshold that reliably indicates anomalies. | + + +### Example alert + +The following example shows a sample alert email generated by a monitor that detects when a forecasted value breaches a defined threshold. In this case, the monitor is tracking a high-cardinality forecaster and has triggered an alert for a specific entity (`host = server_3`): + +``` +Monitor **test** entered **ALERT** state — please investigate. + +Trigger : breach +Severity : 1 +Time range : 2025-06-22T09:56:14.490Z → 2025-06-22T09:57:14.490Z UTC + +Entity + • host = server_3 +``` + +## Next steps + +After setting up and managing your forecasters, you may want to control who can access and modify them. 
To learn how to manage permissions, secure result indexes, and apply fine-grained access controls, see [the security page]({{site.url}}{{site.baseurl}}/observing-your-data/forecast/security/). + + diff --git a/_observing-your-data/forecast/security.md b/_observing-your-data/forecast/security.md new file mode 100644 index 00000000000..4bab11d82c4 --- /dev/null +++ b/_observing-your-data/forecast/security.md @@ -0,0 +1,468 @@ +--- +layout: default +title: Forecasting security +nav_order: 10 +parent: Forecasting +has_children: false +--- + +# Forecasting security + +Forecasting uses the same security framework as anomaly detection. This page explains how to configure permissions for users to create, run, and view forecasters; how to restrict access to system indexes; and how to isolate forecast results across teams. + +In all examples, replace credentials, index names, and role names with values appropriate for your environment. +{: .note} + +## Indexes created by forecasting + +The following table describes the indexes used by the Forecasting API and their visibility to regular users. + +| Index pattern | Purpose | Visible to regular users? | +|---------------|---------|---------------------------| +| `.opensearch-forecasters` | Stores forecaster configuration. | No | +| `.opensearch-forecast-checkpoints` | Stores model snapshots (checkpoints). | No | +| `.opensearch-forecast-state` | Stores task metadata for real-time and run-once forecasting. | No | +| `opensearch-forecast-result*` | Stores forecast results from both backtests and real-time forecasting. | Yes | + +Users do not need direct access to `.opensearch-forecast-checkpoints`; it is used internally by the plugin. + +To view `.opensearch-forecasters`, use the [Get forecaster]({{site.url}}{{site.baseurl}}/observing-your-data/forecast/api/#get-forecaster) or [Search forecasters]({{site.url}}{{site.baseurl}}/observing-your-data/forecast/api/#search-forecasters) APIs. 
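
As a minimal sketch, the following requests read forecaster configurations through the API rather than by querying the system index directly. The `<forecaster_id>` placeholder and the `match_all` query are illustrative assumptions; the routes are the ones listed in the cluster permissions table on this page:

```json
# Get a single forecaster's configuration (placeholder ID)
GET _plugins/_forecast/forecasters/<forecaster_id>

# Search all forecaster configurations (illustrative match_all query)
POST _plugins/_forecast/forecasters/_search
{
  "query": {
    "match_all": {}
  }
}
```
{% include copy-curl.html %}

The calling user needs the cluster permissions listed for these routes.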
+ +To view `.opensearch-forecast-state`, use the [Get forecaster]({{site.url}}{{site.baseurl}}/observing-your-data/forecast/api/#get-forecaster) API with the `?task=true` query parameter or call the [Search tasks]({{site.url}}{{site.baseurl}}/observing-your-data/forecast/api/#search-tasks) API directly. + + +## Cluster permissions + +Each Forecasting API route maps to a specific cluster-level permission, as shown in the following table. You must grant these permissions to roles that manage or interact with forecasters. + +| Route | Required permission | +|:------------|:---------------------| +| `POST /_plugins/_forecast/forecasters` | `cluster:admin/plugin/forecast/forecaster/write` | +| `PUT /_plugins/_forecast/forecasters/{id}` | `cluster:admin/plugin/forecast/forecaster/write` | +| `POST /_plugins/_forecast/forecasters/_validate` | `cluster:admin/plugin/forecast/forecaster/validate` | +| `POST /_plugins/_forecast/forecasters/_suggest/{types}` | `cluster:admin/plugin/forecast/forecaster/suggest` | +| `GET /_plugins/_forecast/forecasters/{id}`
`GET /_plugins/_forecast/forecasters/{id}?task=true` | `cluster:admin/plugin/forecast/forecaster/get` | +| `DELETE /_plugins/_forecast/forecasters/{id}` | `cluster:admin/plugin/forecast/forecaster/delete` | +| `POST /_plugins/_forecast/forecasters/{id}/_start`
`POST /_plugins/_forecast/forecasters/{id}/_stop` | `cluster:admin/plugin/forecast/forecaster/jobmanagement` | +| `POST /_plugins/_forecast/forecasters/{id}/_run_once` | `cluster:admin/plugin/forecast/forecaster/runOnce` | +| `POST /_plugins/_forecast/forecasters/_search`
`GET /_plugins/_forecast/forecasters/_search` | `cluster:admin/plugin/forecast/forecaster/search` | +| `GET /_plugins/_forecast/forecasters/tasks/_search` | `cluster:admin/plugin/forecast/tasks/search` | +| `POST /_plugins/_forecast/forecasters/{id}/results/_topForecasts` | `cluster:admin/plugin/forecast/result/topForecasts` | +| `GET /_plugins/_forecast/forecasters/{id}/_profile` | `cluster:admin/plugin/forecast/forecasters/profile` | +| `GET /_plugins/_forecast/stats` | `cluster:admin/plugin/forecast/forecaster/stats` | +| `GET /_plugins/_forecast/forecasters/count`
`GET /_plugins/_forecast/forecasters/match` | `cluster:admin/plugin/forecast/forecaster/info` | + +## Required roles + +A forecasting user needs three types of privileges, based on the following responsibilities: + +- Managing the forecasting job +- Reading the source data +- Accessing the forecast results + +These responsibilities correspond to three distinct security layers, as shown in the following table. + +| Layer | What it controls | Typical role | +|-------|------------------|--------------| +| **Forecaster control** | Permissions to create, edit, start, stop, delete, or view a forecaster's configuration. | `forecast_full_access`
(manage lifecycle)
or
`forecast_read_access`
(view only) | +| **Data-source read** | Grants the forecaster permission to query the raw metrics index it uses for training and prediction. | Custom role, such as `data_source_read` | +| **Result read** | Grants users and Alerting monitors access to documents in `opensearch-forecast-result*`. | Custom role, such as `forecast_result_read` | + + +The built-in roles `forecast_full_access` and `forecast_read_access` apply only to Forecasting APIs. They do **not** include permissions for source or result indexes—those must be granted separately. +{: .note} + + +### Forecaster control roles + +The Forecasting API includes two built-in roles that you can use as is or use as templates for creating custom roles: + +- `forecast_read_access` – For analysts who need read-only access to forecasters. This role allows users to view forecaster details and results but not create, modify, start, stop, or delete forecasters. + + +- `forecast_full_access` – For users responsible for managing the full lifecycle of forecasters, including creating, editing, starting, stopping, and deleting them. This role does **not** grant access to the source index. To create a forecaster, users must also have index-level permissions that include the `search` action on any index or alias the forecaster reads from. 
+ +The following example shows how these roles are defined: + +```yaml +forecast_read_access: + reserved: true + cluster_permissions: + - 'cluster:admin/plugin/forecast/forecaster/info' + - 'cluster:admin/plugin/forecast/forecaster/stats' + - 'cluster:admin/plugin/forecast/forecaster/suggest' + - 'cluster:admin/plugin/forecast/forecaster/validate' + - 'cluster:admin/plugin/forecast/forecasters/get' + - 'cluster:admin/plugin/forecast/forecasters/info' + - 'cluster:admin/plugin/forecast/forecasters/search' + - 'cluster:admin/plugin/forecast/result/topForecasts' + - 'cluster:admin/plugin/forecast/tasks/search' + index_permissions: + - index_patterns: + - 'opensearch-forecast-result*' + allowed_actions: + - 'indices:admin/mappings/fields/get*' + - 'indices:admin/resolve/index' + - 'indices:data/read*' + +forecast_full_access: + reserved: true + cluster_permissions: + - 'cluster:admin/plugin/forecast/*' + - 'cluster:admin/settings/update' + - 'cluster_monitor' + index_permissions: + - index_patterns: + - '*' + allowed_actions: + - 'indices:admin/aliases/get' + - 'indices:admin/mapping/get' + - 'indices:admin/mapping/put' + - 'indices:admin/mappings/fields/get*' + - 'indices:admin/mappings/get' + - 'indices:admin/resolve/index' + - 'indices:data/read*' + - 'indices:data/read/field_caps*' + - 'indices:data/read/search' + - 'indices:data/write*' + - 'indices_monitor' +``` +{% include copy.html %} + +These roles do not include default `index_permissions` for specific source or result indexes. This is intentional, allowing you to add your own patterns based on your data access requirements. + +### Data source `read` role + +Each forecaster uses the creating user's credentials to query the source index. To enable this, you must grant that user read permissions for your own data index. 
+ +The following example request creates a minimal role that allows read access to the `network-metrics` index: + +```json +PUT _plugins/_security/api/roles/data_source_read +{ + "index_permissions": [{ + "index_patterns": ["network-metrics"], + "allowed_actions": ["read"] + }] +} +``` +{% include copy-curl.html %} + +You can modify the `index_patterns` to match your actual data source. + +### `Result‑read` role + +The `forecast_result_read` role allows users to view forecast results and configure Alerting monitors that query those results. + +The following example request defines a role that grants read access to all indexes matching the `opensearch-forecast-result*` pattern: + +```json +PUT _plugins/_security/api/roles/forecast_result_read +{ + "index_permissions": [{ + "index_patterns": ["opensearch-forecast-result*"], + "allowed_actions": ["read"] + }] +} +``` +{% include copy-curl.html %} + +If you need to isolate result data between teams, you can enhance this role using document-level security (DLS) with a backend role filter, as shown in the following section. + +### Example security role configuration + +The following example request creates a `devOpsEngineer` user and assigns all three required roles for forecasting: + +```json +PUT _plugins/_security/api/internalusers/devOpsEngineer +{ + "password": "DevOps2024!", + "opendistro_security_roles": [ + "forecast_full_access", + "data_source_read", + "forecast_result_read" + ] +} +``` +{% include copy-curl.html %} + +This configuration enables the following: + +- `devOpsEngineer` can manage forecasters (`forecast_full_access`). +- Forecasters can query the source index successfully (`data_source_read`). +- The user and any configured monitors can read forecast results (`forecast_result_read`). + +To grant read-only access to forecaster configurations, replace `forecast_full_access` with `forecast_read_access`. 
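
As a sketch of the read-only variant described above, the following request creates a user who can view forecasters and results but cannot manage them. The user name `forecastViewer` and the password placeholder are hypothetical, and the custom roles `data_source_read` and `forecast_result_read` are assumed to exist as defined earlier on this page:

```json
# Hypothetical read-only user; replace the name and the password placeholder
PUT _plugins/_security/api/internalusers/forecastViewer
{
  "password": "<strong-password>",
  "opendistro_security_roles": [
    "forecast_read_access",
    "data_source_read",
    "forecast_result_read"
  ]
}
```
{% include copy-curl.html %}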
+ +--- + +## (Advanced) Limit access by backend role + +You can use backend roles to enforce **team-specific isolation**. This pattern allows different teams to operate forecasters independently while separating configurations and results. + +The model includes three layers: + +1. **Configuration isolation** – Forecasting APIs are restricted to users with a matching backend role. +2. **Result isolation** – DLS limits access to forecast results in `opensearch-forecast-result*`. +3. **Source data access** – A minimal read-only role enables each forecaster to scan its own index. + +The following sections explain how to configure each layer. + +### Assign backend roles to users + +In most environments, backend roles are assigned through LDAP or SAML. However, if you are using the internal user database, you can set them manually, as shown in the following example: + +```json +# Analyst +PUT _plugins/_security/api/internalusers/alice +{ + "password": "alice", + "backend_roles": ["analyst"] +} + +# HR staff +PUT _plugins/_security/api/internalusers/bob +{ + "password": "bob", + "backend_roles": ["human-resources"] +} +``` + +These backend roles can then be used to control access to forecasters and forecast results on a per-team basis. + +### Enable backend-role filtering for configuration access + +To isolate forecaster configurations by team, enable backend-role filtering at the cluster level: + + +```bash +PUT _cluster/settings +{ + "persistent": { + "plugins.forecast.filter_by_backend_roles": true + } +} +``` +{% include copy-curl.html %} + +When this setting is enabled, OpenSearch records the creator's backend roles in each forecaster document. Only users with a matching backend role can view, edit, or delete that forecaster. + +### Create a `result‑access` role per team + +Forecast results are stored in shared indexes, so use DLS to restrict access by backend role. 
+ +The following example request creates a role that allows users with the `analyst` backend role to read and write only their team's forecast results: + + +```json +PUT _plugins/_security/api/roles/forecast_analyst_result_access +{ + "index_permissions": [{ + "index_patterns": ["opensearch-forecast-result*"], + "dls": """ + { + "bool": { + "filter": [{ + "nested": { + "path": "user", + "query": { + "term": { + "user.backend_roles.keyword": "analyst" + } + }, + "score_mode": "none" + } + }] + } + }""", + "allowed_actions": ["read","write"] + }] +} +``` +{% include copy-curl.html %} + +To isolate results for another team, such as `human-resources`, create a separate role (for example, `forecast_human_resources_result_access`) and update the term value to match the appropriate backend role. + +### Define `data-source` read access + +The `data_source_read` role is defined in the same way as in earlier examples. It grants minimal read access to the metrics index that each forecaster uses for training and prediction. + +You can reuse this role across teams or create separate versions if you need per-index restrictions. + +### Map a user to three roles + +The following example assigns the `analyst` backend role to the user `alice` and maps her to all three required roles (`forecast_full_access`, `forecast_analyst_result_access`, and `data_source_read`): + +```json +PUT _plugins/_security/api/internalusers/alice +{ + "password": "alice", + "backend_roles": ["analyst"], + "opendistro_security_roles": [ + "forecast_full_access", + "forecast_analyst_result_access", + "data_source_read" + ] +} +``` +{% include copy-curl.html %} + +With this configuration, Alice can: + +- Create, start, stop, and delete only forecasters tagged with the `analyst` backend role. +- View only forecast results tagged with the `analyst` backend role. +- Read the `network-metrics` index as the source for her forecasters. 
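
The team-specific result role generalizes to other teams by changing only the backend role in the DLS `term` filter. As a minimal sketch, assuming a team whose users carry the `human-resources` backend role shown earlier, the corresponding role might look like the following. The role name follows the naming suggested above; adjust it and the index pattern to your environment:

```json
# Sketch only: mirrors the analyst result role with the backend role swapped
PUT _plugins/_security/api/roles/forecast_human_resources_result_access
{
  "index_permissions": [{
    "index_patterns": ["opensearch-forecast-result*"],
    "dls": """
    {
      "bool": {
        "filter": [{
          "nested": {
            "path": "user",
            "query": {
              "term": {
                "user.backend_roles.keyword": "human-resources"
              }
            },
            "score_mode": "none"
          }
        }]
      }
    }""",
    "allowed_actions": ["read","write"]
  }]
}
```
{% include copy-curl.html %}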
+ +To configure a second user, such as `bob` from the HR team, use a parallel setup with the `human-resources` backend role and `forecast_human_resources_result_access`. + +### Users without backend roles + +If a user has the `forecast_read_access` role but no backend roles, they cannot view any forecasters. Backend-role filtering enforces strict matching and prevents access to configurations that do not align with the user's roles. + +--- + +## Selecting remote indexes with fine-grained access control + +To use a remote index as a data source for a forecaster, follow the steps outlined in the [Authentication flow]({{site.url}}{{site.baseurl}}/search-plugins/cross-cluster-search/#authentication-flow) section of the [Cross-cluster search]({{site.url}}{{site.baseurl}}/search-plugins/cross-cluster-search/) documentation. + +For the request to succeed, the user must: + +- Use a security role that exists in both the local and remote clusters. +- Have that role mapped to the same username in both clusters. + +### Example: Create a new user in the local cluster + +Using the following command, create a new user in the local cluster who can create the forecaster: + +```bash +curl -XPUT -k -u 'admin:' \ + 'https://localhost:9200/_plugins/_security/api/internalusers/forecastuser' \ + -H 'Content-Type: application/json' \ + -d '{"password":"password"}' +``` +{% include copy-curl.html %} + +Using the following command, map the new user to the `forecast_full_access` role: + +```bash +curl -XPUT -k -u 'admin:' \ + 'https://localhost:9200/_plugins/_security/api/rolesmapping/forecast_full_access' \ + -H 'Content-Type: application/json' \ + -d '{"users":["forecastuser"]}' +``` +{% include copy-curl.html %} + +In the remote cluster, create the same user and map that user to the `forecast_full_access` role, as shown in the following commands: + +```bash +# Create the user +curl -XPUT -k -u 'admin:' \ + 'https://localhost:9250/_plugins/_security/api/internalusers/forecastuser' \ + -H 'Content-Type: 
application/json' \ + -d '{"password":"password"}' + +# Map the role +curl -XPUT -k -u 'admin:' \ + 'https://localhost:9250/_plugins/_security/api/rolesmapping/forecast_full_access' \ + -H 'Content-Type: application/json' \ + -d '{"users":["forecastuser"]}' +``` +{% include copy-curl.html %} + +### Grant source index read access in both clusters + +To create a forecaster, the user also needs index-level permissions for the `search` or `read` [action groups]({{site.url}}{{site.baseurl}}/security/access-control/default-action-groups/) on every source index, alias, or pattern that the forecaster reads. The permission check occurs in both clusters when reading a remote index. Define and map the same role in both locations. + + +In the local cluster, define a `read` role that grants access to the source index and map it to the forecasting user, as shown in the following command: + +```bash +# Create a role that can search the data +curl -XPUT -k -u 'admin:' \ + 'https://localhost:9200/_plugins/_security/api/roles/data_source_read' \ + -H 'Content-Type: application/json' \ + -d '{ + "index_permissions":[{ + "index_patterns":["network-requests"], + "allowed_actions":["search"] + }] + }' + +# Map the role to forecastuser +curl -XPUT -k -u 'admin:' \ + 'https://localhost:9200/_plugins/_security/api/rolesmapping/data_source_read' \ + -H 'Content-Type: application/json' \ + -d '{"users":["forecastuser"]}' +``` +{% include copy-curl.html %} + +In the remote cluster, define the same role and map it to the same user to ensure that permissions are mirrored across clusters, as shown in the following command: + +``` +# Create the identical role +curl -XPUT -k -u 'admin:' \ + 'https://localhost:9250/_plugins/_security/api/roles/data_source_read' \ + -H 'Content-Type: application/json' \ + -d '{ + "index_permissions":[{ + "index_patterns":["network-requests"], + "allowed_actions":["search"] + }] + }' + +# Map the role to the same user +curl -XPUT -k -u 'admin:' \ + 
'https://localhost:9250/_plugins/_security/api/rolesmapping/data_source_read' \ + -H 'Content-Type: application/json' \ + -d '{"users":["forecastuser"]}' +``` +{% include copy-curl.html %} + + +### Register the remote cluster with the local cluster + +Register the remote cluster with the local cluster using a seed node under the `cluster.remote..seeds` setting. In OpenSearch, this is called adding a `follower` cluster. + +Assuming that the remote cluster is listening on transport port `9350`, run the following command in the local cluster: + +``` +curl -X PUT "https://localhost:9200/_cluster/settings" \ + -H "Content-Type: application/json" \ + -u "admin:" \ + -d '{ + "persistent": { + "cluster.remote": { + "follower": { + "seeds": [ "127.0.0.1:9350" ] + } + } + } + }' +``` +{% include copy-curl.html %} + + +- Replace `127.0.0.1` with the remote node's transport layer IP if it's located on a different host. +- The alias `follower` can be any name you choose and will be used when referencing remote indexes or configuring cross-cluster replication. +{: .note} + +--- + +## Custom result index permissions + +You can specify a custom index for forecast results instead of using the default result index. If the custom index does not already exist, it will be created automatically when you create a forecaster and start a real-time analysis or test run. + +If the custom index already exists, the Forecasting API checks that the index mapping matches the expected forecast result structure. To ensure compatibility, the index must conform to the schema defined in the [`forecast-results.json`](https://github.com/opensearch-project/anomaly-detection/blob/main/src/main/resources/mappings/forecast-results.json) file. 
+ +When a user creates a forecaster—either in OpenSearch Dashboards or by calling the Forecasting API—the system verifies that the user has the following index-level permissions for the custom index: + +- `indices:admin/create` – Required to create and roll over the custom result index. +- `indices:admin/aliases` – Required to create and manage the index alias. +- `indices:data/write/index` – Required to write forecast results to the index (single-stream forecasters). +- `indices:data/read/search` – Required to search the custom index when displaying forecast results. +- `indices:data/write/delete` – Required to delete older forecast results and manage disk usage. +- `indices:data/write/bulk*` – Required because the plugin writes results using the Bulk API. + +## Next step + +For more information about TLS, authentication backends, tenant isolation, and audit logging, see the [Security plugin documentation]({{site.url}}{{site.baseurl}}/security/). diff --git a/images/forecast/bound.png b/images/forecast/bound.png new file mode 100644 index 00000000000..5adc4408a58 Binary files /dev/null and b/images/forecast/bound.png differ diff --git a/images/forecast/forecast_from_1.png b/images/forecast/forecast_from_1.png new file mode 100644 index 00000000000..d0cf705ec91 Binary files /dev/null and b/images/forecast/forecast_from_1.png differ diff --git a/images/forecast/forecast_from_2.png b/images/forecast/forecast_from_2.png new file mode 100644 index 00000000000..0fb336e2f86 Binary files /dev/null and b/images/forecast/forecast_from_2.png differ diff --git a/images/forecast/no_rcf_calibration.png b/images/forecast/no_rcf_calibration.png new file mode 100644 index 00000000000..da2d604ae27 Binary files /dev/null and b/images/forecast/no_rcf_calibration.png differ diff --git a/images/forecast/no_result.png b/images/forecast/no_result.png new file mode 100644 index 00000000000..a3d2604cb9b Binary files /dev/null and b/images/forecast/no_result.png differ diff --git 
a/images/forecast/overlay_3.png b/images/forecast/overlay_3.png new file mode 100644 index 00000000000..929b93e0f50 Binary files /dev/null and b/images/forecast/overlay_3.png differ diff --git a/images/forecast/state.png b/images/forecast/state.png new file mode 100644 index 00000000000..2722abbf924 Binary files /dev/null and b/images/forecast/state.png differ diff --git a/images/forecast/toggle_overlay_after.png b/images/forecast/toggle_overlay_after.png new file mode 100644 index 00000000000..129d0eb9e6d Binary files /dev/null and b/images/forecast/toggle_overlay_after.png differ diff --git a/images/forecast/toggle_overlay_before.png b/images/forecast/toggle_overlay_before.png new file mode 100644 index 00000000000..5f1cad0dbd8 Binary files /dev/null and b/images/forecast/toggle_overlay_before.png differ diff --git a/images/forecast/trend.png b/images/forecast/trend.png new file mode 100644 index 00000000000..7c03e52302c Binary files /dev/null and b/images/forecast/trend.png differ diff --git a/images/forecast/validation_loading.png b/images/forecast/validation_loading.png new file mode 100644 index 00000000000..2d98bc85b77 Binary files /dev/null and b/images/forecast/validation_loading.png differ