Category: Cloud skills and resources

Scale Resources – Troubleshoot Data Storage Processing

Figure 6.6 shows the node size you selected when you provisioned your Azure Batch pool. Notice that the Mode toggle switch is set to Fixed, with a target dedicated nodes value of 2. This means the amount of compute capacity allocated to this pool is fixed and will not scale. If the utilization of the allocated […]

Rewrite User‐Defined Functions – Troubleshoot Data Storage Processing

User‐defined functions (UDFs) are described in detail in Chapter 2, so look back at it if you need a refresher. In general terms, a UDF is a code snippet that performs some action on your data. These code snippets are most commonly triggered using the method name from within either SQL […]

Troubleshoot a Failed Pipeline Run – Troubleshoot Data Storage Processing-2

The last topic to mention in the context of slowness has to do with overutilized compute resources. This is one of the most common scenarios you will encounter. The metrics you configure to monitor the health of your data analytics pipeline should target this specifically. When those metrics show that compute resources are under pressure, […]

Troubleshoot a Failed Pipeline Run – Troubleshoot Data Storage Processing-1

It is almost certain that at some point while running your data analytics procedures, something unexpected will happen. When it does, you must gather information, such as the symptoms observed and the relevant log files, that will help you get to the root cause of the behavior. Knowing what you read in the last section about the […]

Troubleshoot a Failed Spark Job – Troubleshoot Data Storage Processing

It is inherently difficult to teach or document how to troubleshoot technical problems because of the wide variety of symptoms one can encounter. That means an example used to teach troubleshooting will most likely not be one that the person being trained will experience. Instead, there are two points to make […]

Optimize Pipeline for Descriptive versus Analytical Workloads – Troubleshoot Data Storage Processing

The “Analytics Types” section in Chapter 2 described the categories of data analytics (descriptive, diagnostic, predictive, preemptive, and prescriptive), each of which is an analytical workload. This is rounded out by what you learned in the previous section: that OLTP operations are transactional and OLAP operations are analytical. With the review of those five data analytics types, […]