This section is a follow‐up to Chapter 6. It is placed here so that you can recall that content while reading about the logging, monitoring, optimizing, and troubleshooting techniques in this chapter and Chapter 9.
Handle Failed Batch Loads
There are many actions you can take within the Azure Batch job itself from a coding perspective. In every case, wrap the source code that performs the data transformation in try/catch blocks (a minimal sketch appears at the end of this section). Other options include adding a Fail activity, which returns meaningful error messages to the support team; retrying the batch job execution; and/or adding failure dependencies to the Custom activity in an additional attempt to recover from the failed job. It is important to consider the implications of failed batch jobs that are responsible for loading data. Refer to the following sections in Chapter 6, which describe how to handle failed batch loads:
- “Handle Duplicate Data”
- “Handle Missing Data”
- “Handle Late‐Arriving Data”
- “Regression to a Previous State”
- “Validate Batch Loads”
The way in which you handle a batch load failure depends greatly on what the load is doing and the downstream dependencies of the data.
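The following snippet is a minimal sketch of the try/catch pattern described earlier, assuming the Batch job runs a Python script; the transformation logic and input data are hypothetical placeholders. The important part is that a caught exception is logged and the script exits with a nonzero code, which is how the task failure surfaces to Azure Batch and, in turn, to the Custom activity in the pipeline.

import logging
import sys

logging.basicConfig(level=logging.INFO)

def transform(raw_rows):
    # Hypothetical stand-in for the actual data transformation logic
    return [row.upper() for row in raw_rows]

def main():
    try:
        raw_rows = ["row1", "row2"]  # hypothetical input for illustration
        transformed = transform(raw_rows)
        logging.info("Transformed %d rows", len(transformed))
    except Exception:
        # Log the full stack trace so the support team receives a useful
        # error message, then exit nonzero so Azure Batch marks the task
        # as failed and any retries or failure dependencies configured on
        # the Custom activity can take over.
        logging.exception("Batch data transformation failed")
        sys.exit(1)

if __name__ == "__main__":
    main()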
Design and Develop a Stream Processing Solution
The following sections contain some additional optimization and troubleshooting techniques that build on top of the logging, monitoring, and debugging techniques discussed in this chapter and in Chapter 9.
Optimize Pipelines for Analytical or Transactional Purposes
This section builds on the content in Chapter 7, “Design and Implement a Data Stream Processing Solution,” about scaling. In this chapter and in Chapter 9, you learned more about monitoring, error handling, and troubleshooting capabilities. From an optimization perspective, the most effective way to achieve optimal performance with an Azure Stream Analytics job is through parallelization. Table 7.6 introduced the types of input and output partitioning, which are the key enablers of parallelization. The following sections in Chapter 7 explain how input and output partitioning work:
- “Process with One Partition”
- “Process Across Partitions”
To get maximum throughput, the number of partitions of the incoming stream must equal the number of partitions of the outgoing stream (refer to Figure 7.31). Remember that when your job is running in compatibility level 1.2 or greater, the platform automatically provides a partition key; otherwise, you should use the PARTITION BY clause in your query, similar to the following:
SELECT *
INTO Output
FROM Input PARTITION BY pkBrainwavesPOW
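Building on that example, the following hypothetical query shows how an aggregation can remain fully parallel by grouping on the built-in PartitionId column; the input and output names and the window size are assumptions for illustration only.

SELECT PartitionId, COUNT(*) AS ReadingCount
INTO Output
FROM Input PARTITION BY PartitionId
GROUP BY PartitionId, TumblingWindow(minute, 1)

Because every group includes PartitionId, each aggregate is computed entirely within its own partition, which preserves the one-to-one mapping between input and output partitions.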
From a troubleshooting perspective, the first place to start is with performance and availability metrics. Chapter 7 introduced the available diagnostic settings, which, when configured, store Azure Stream Analytics metrics, execution, and authoring data in an Azure Monitor Log Analytics workspace (refer to Figure 7.50). The data stored in the workspace can be queried using KQL to locate components of the job that are performing slowly or encountering errors. Once those points are identified, they can be analyzed further to find the root cause, and you can then engage the responsible team to agree on an approach to resolve it.
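As an illustration, a KQL query along the following lines could surface recent job errors. The table and column names reflect the common Azure Diagnostics schema and should be treated as assumptions to verify against your own workspace.

AzureDiagnostics
// Assumed table and column names; Stream Analytics diagnostics routed in
// Azure Diagnostics mode typically land here -- verify in your workspace.
| where ResourceProvider == "MICROSOFT.STREAMANALYTICS"
| where Category == "Execution" and Level == "Error"
| project TimeGenerated, Resource, OperationName, properties_s
| order by TimeGenerated desc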
In addition to diagnostic settings, you learned about the activity logs, which can show failed operations (refer to Figure 7.51). Configuring an alert rule that fires when operations return a status of Failed is good practice for maintaining the reliability of your solution. The option to create this alert rule is available directly on the Activity log blade, as shown in Figure 7.51. Table 9.9 in Chapter 9 introduced all the available Azure Stream Analytics metrics and their meanings, along with a discussion of using multiple metrics together to better understand performance‐ or error‐related issues. When used together, the Backlogged Input Events, Watermark Delay, and CPU % Utilization metrics are very helpful for determining whether there is a compute capacity issue. Figure 9.18 and Figure 9.19 are examples of the metrics capability for an Azure Stream Analytics job. You can further optimize your Azure Stream Analytics job by handling interruptions and scaling resources.
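As a sketch, the following KQL pulls those three metrics from the AzureMetrics table, which is populated when diagnostic settings route metrics to the workspace. The metric names shown are assumed internal metric IDs rather than the portal display names, so confirm them against your job's metric definitions.

AzureMetrics
| where ResourceProvider == "MICROSOFT.STREAMANALYTICS"
// Assumed internal IDs for Backlogged Input Events, Watermark Delay, and
// CPU % Utilization -- confirm against your job's metric definitions.
| where MetricName in ("InputEventsSourcesBacklogged", "OutputWatermarkDelaySeconds", "ProcessCPUUsagePercentage")
| summarize avg(Average) by MetricName, bin(TimeGenerated, 5m)
| order by TimeGenerated desc

A sustained rise in backlogged input events and watermark delay while CPU utilization remains high is a strong indication that the job needs more capacity or better parallelization.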