# Troubleshoot Azure Stream Analytics outputs
This article describes common issues with Azure Stream Analytics output connections and how to troubleshoot them. Many troubleshooting steps require resource and other diagnostic logs to be enabled for your Stream Analytics job. If you don't have resource logs enabled, see [Troubleshoot Azure Stream Analytics by using resource logs](stream-analytics-job-diagnostic-logs.md).
## The job doesn't produce output
1. Verify connectivity to outputs by using the **Test Connection** button for each output.
1. Look at [Monitoring metrics](stream-analytics-monitoring.md) on the **Monitor** tab. Because the values are aggregated, the metrics are delayed by a few minutes.
    * If the **Input Events** value is greater than zero, the job can read the input data. If the **Input Events** value isn't greater than zero, there's an issue with the job's input. See [Troubleshoot input connections](stream-analytics-troubleshoot-input.md) for more information.
    * If the **Data Conversion Errors** value is greater than zero and climbing, see [Azure Stream Analytics data errors](data-errors.md) for detailed information about data conversion errors.
    * If the **Runtime Errors** value is greater than zero, your job receives data but generates errors while processing the query. To find the errors, go to the [audit logs](../azure-resource-manager/management/view-activity-logs.md), and then filter on the **Failed** status.
    * If the **Input Events** value is greater than zero and the **Output Events** value equals zero, one of the following statements is true:
        * The query processing resulted in zero output events.
        * Events or fields might be malformed, resulting in zero output after the query processing.
        * The job was unable to push data to the output sink for connectivity or authentication reasons.

In all these error cases, operations log messages explain additional details, including what's happening, except when the query logic filters out all events. If the processing of multiple events generates errors, the errors are aggregated every 10 minutes.
## The first output is delayed
When a Stream Analytics job starts, the input events are read, but there can be a delay in the output in certain circumstances.

Large time values in temporal query elements can contribute to the output delay. To produce the correct output over large time windows, the streaming job reads data from the latest time possible to fill the time window. The data can be up to seven days old. No output is produced until the outstanding input events are read. This problem can surface when the system upgrades the streaming jobs. When an upgrade takes place, the job restarts. Such upgrades generally occur once every couple of months.

Use discretion when designing your Stream Analytics query. If you use a large time window for temporal elements in the job's query syntax, it can lead to a delay in the first output when the job starts or restarts. More than several hours, up to seven days, is considered a large time window.

One mitigation for this kind of first output delay is to use query parallelization techniques, such as partitioning the data. Or, you can add more Streaming Units to improve the throughput until the job catches up. For more information, see [Considerations when creating Stream Analytics jobs](stream-analytics-concepts-checkpoint-replay.md).
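
As an illustration, here's a minimal sketch of a query parallelized by partition. The input and output aliases (`Input`, `Output`) and the column names (`deviceId`, `temperature`) are assumptions for this example, not names from this article:

```sql
-- A sketch of an embarrassingly parallel query: each input partition
-- is processed independently, which helps the job catch up faster.
SELECT
    deviceId,
    PartitionId,
    AVG(temperature) AS avgTemperature
INTO Output
FROM Input PARTITION BY PartitionId
GROUP BY deviceId, PartitionId, TumblingWindow(minute, 1)
```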
These factors affect the timeliness of the first output (the query sketch after this list illustrates each pattern):
* The use of windowed aggregates, such as a GROUP BY clause of tumbling, hopping, and sliding windows:
    * For tumbling or hopping window aggregates, the results are generated at the end of the window timeframe.
    * For a sliding window, the results are generated when an event enters or exits the sliding window.
    * If you're planning to use a large window size, such as more than one hour, it's best to choose a hopping or sliding window. These window types let you see the output more frequently.
* The use of temporal joins, such as JOIN with DATEDIFF:
    * Matches are generated as soon as both sides of the matched events arrive.
    * Data that lacks a match, as in a LEFT OUTER JOIN, is generated at the end of the DATEDIFF window for each event on the left side.
* The use of temporal analytic functions, such as ISFIRST, LAST, and LAG with LIMIT DURATION:
    * For analytic functions, the output is generated for every event. There is no delay.
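
Here's a sketch of the three patterns as separate query fragments. The aliases (`Input`, `Input2`, `Output`) and the columns (`deviceId`, `eventTime`, `temperature`) are assumed names for illustration:

```sql
-- 1. Windowed aggregate: output appears only when each five-minute
--    tumbling window closes.
SELECT deviceId, COUNT(*) AS eventCount
INTO Output
FROM Input TIMESTAMP BY eventTime
GROUP BY deviceId, TumblingWindow(minute, 5)

-- 2. Temporal join: a match is emitted as soon as both events arrive;
--    a LEFT OUTER JOIN emits unmatched rows only when the DATEDIFF
--    window closes.
SELECT I1.deviceId
FROM Input I1 TIMESTAMP BY eventTime
JOIN Input2 I2 TIMESTAMP BY eventTime
    ON I1.deviceId = I2.deviceId
    AND DATEDIFF(minute, I1, I2) BETWEEN 0 AND 15

-- 3. Analytic function: LAG produces output for every event, with no
--    windowing delay.
SELECT deviceId,
    temperature,
    LAG(temperature) OVER (LIMIT DURATION(hour, 1)) AS previousTemperature
FROM Input TIMESTAMP BY eventTime
```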
## The output falls behind
During the normal operation of a job, the output might have longer and longer periods of latency. If the output falls behind like that, you can pinpoint the root causes by examining the following factors:
* Whether the downstream sink is throttled
* Whether the upstream source is throttled
* Whether the processing logic in the query is compute-intensive

To see the output details, select the streaming job in the Azure portal, and then select **Job diagram**. For each input, there's a backlog event metric per partition. If the metric keeps increasing, it's an indicator that the system resources are constrained. The increase is potentially due to output sink throttling or high CPU usage. For more information, see [Data-driven debugging by using the job diagram](stream-analytics-job-diagram-with-metrics.md).
## Key violation warning with Azure SQL Database output
When you configure an Azure SQL database as the output for a Stream Analytics job, the job bulk inserts records into the destination table. In general, Azure Stream Analytics guarantees [at-least-once delivery](https://docs.microsoft.com/stream-analytics-query/event-delivery-guarantees-azure-stream-analytics) to the output sink. You can still [achieve exactly-once delivery](https://blogs.msdn.microsoft.com/streamanalytics/2017/01/13/how-to-achieve-exactly-once-delivery-for-sql-output/) to a SQL output when the SQL table has a unique constraint defined.

When unique key constraints are set up on the SQL table and duplicate records are inserted, Azure Stream Analytics removes the duplicates. It splits the data into batches and recursively inserts the batches until a single duplicate record is found. Because the split and insert process ignores the duplicates one at a time, it's inefficient and time-consuming for a streaming job that has many duplicate rows. If you see multiple key violation warning messages in your Activity log for the previous hour, it's likely that your SQL output is slowing down the entire job.

To resolve this issue, [configure the index](https://docs.microsoft.com/sql/t-sql/statements/create-index-transact-sql) that's causing the key violation by enabling the IGNORE_DUP_KEY option. This option allows SQL to ignore duplicate values during bulk inserts. Azure SQL Database simply produces a warning message instead of an error. As a result, Azure Stream Analytics no longer produces primary key violation errors.
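
For example, the following T-SQL sketch drops and recreates a unique index with IGNORE_DUP_KEY enabled. The table, index, and column names (`dbo.Telemetry`, `IX_Telemetry_DeviceId_EventTime`, `DeviceId`, `EventTime`) are hypothetical placeholders:

```sql
-- Recreate the unique index with IGNORE_DUP_KEY = ON so bulk inserts
-- skip duplicate keys with a warning instead of raising an error.
-- Table, index, and column names are placeholders; use your own.
DROP INDEX IF EXISTS IX_Telemetry_DeviceId_EventTime ON dbo.Telemetry;

CREATE UNIQUE INDEX IX_Telemetry_DeviceId_EventTime
    ON dbo.Telemetry (DeviceId, EventTime)
    WITH (IGNORE_DUP_KEY = ON);
```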
Note the following observations when configuring IGNORE_DUP_KEY for several types of indexes:
* You can't set IGNORE_DUP_KEY on a primary key or a unique constraint by using ALTER INDEX. You need to drop the index and recreate it.
* You can set IGNORE_DUP_KEY by using ALTER INDEX for a unique index, which is different from a PRIMARY KEY/UNIQUE constraint and is created by using CREATE INDEX or an INDEX definition (see the sketch after this list).
* The IGNORE_DUP_KEY option doesn't apply to column store indexes because you can't enforce uniqueness on them.
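
To illustrate the second point, this sketch toggles IGNORE_DUP_KEY on an existing unique index, one created with CREATE INDEX rather than a constraint. The index and table names are the same hypothetical placeholders as in the earlier example:

```sql
-- Rebuild a plain unique index in place with IGNORE_DUP_KEY = ON.
-- This works only for indexes created with CREATE INDEX, not for
-- indexes that back a PRIMARY KEY or UNIQUE constraint.
ALTER INDEX IX_Telemetry_DeviceId_EventTime
    ON dbo.Telemetry
    REBUILD WITH (IGNORE_DUP_KEY = ON);
```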
## Column names are lowercase in Azure Stream Analytics (1.0)
When you use the original compatibility level (1.0), Azure Stream Analytics changes column names to lowercase. This behavior was fixed in later compatibility levels. To preserve the case, move to compatibility level 1.1 or later. For more information, see [Compatibility level for Azure Stream Analytics jobs](https://docs.microsoft.com/azure/stream-analytics/stream-analytics-compatibility-level).
## Get help
For further assistance, try our [Azure Stream Analytics forum](https://social.msdn.microsoft.com/Forums/azure/home?forum=AzureStreamAnalytics).
## Next steps

* [Introduction to Azure Stream Analytics](stream-analytics-introduction.md)
* [Get started using Azure Stream Analytics](stream-analytics-real-time-fraud-detection.md)