This is the artifact repository where your compiled binaries, scripts, and executables can be stored for later consumption. This replaces the need for package managers, which teams generally manage on their own, although they sometimes opt for a remote-hosted offering. Out-of-the-box compatibility with PyPI, Maven, NPM, and so on makes it easy to store your artifacts […]
Category: AWS and DevOps
Summary – Troubleshoot Data Storage Processing
This chapter discussed numerous aspects of optimizing data storage, data processing, and pipelines. From a data storage perspective, you learned how data skewing, data spills, and shuffling have a negative impact on the storage and usability of your data. Using a command like PDW_SHOWSPACEUSED to show how data is stored across distributions is a way […]
Handle Interruptions – Troubleshoot Data Storage Processing
An interruption to the processing of the data stream flowing through your Azure Stream Analytics job can take many forms. One of the most catastrophic examples is a storm or other event that results in the closure of all datacenters in a given Azure region. Although these events […]
Monitor Batches and Pipelines – Troubleshoot Data Storage Processing
This section is a follow‐up to Chapter 6. It is placed here so that you can recall the content while reading about logging, monitoring, optimizing, and troubleshooting techniques in this chapter and Chapter 9. Handle Failed Batch Loads There are many actions you can take within the Azure Batch job itself from a coding perspective. In […]
Scale Resources – Troubleshoot Data Storage Processing
Figure 6.6 shows the node size you selected when you provisioned your Azure Batch pool. Notice that the Mode toggle switch is set to Fixed, with a target dedicated nodes value of 2. This means the amount of compute capacity allocated to this pool is fixed and will not scale. If the utilization of the allocated […]
Troubleshoot a Failed Pipeline Run – Troubleshoot Data Storage Processing-4
Exceptions have a major impact on performance, even if handled, so you should log them, set up alerts when they happen, and work toward avoiding them altogether. Chapter 6, “Create and Manage Batch Processing and Pipelines,” introduced the different execution paths (aka conditions) that can be taken between pipeline activities. As shown in Figure […]
Troubleshoot a Failed Pipeline Run – Troubleshoot Data Storage Processing-3
The pop‐out window enables you to select the integration runtime to use for the pipeline execution. Data flows often perform very large ingestion and transformational activities, and this additional amount of compute power is required to process them. The default amount of time to keep the IR active is 1 hour, but if you need […]
Troubleshoot a Failed Pipeline Run – Troubleshoot Data Storage Processing-2
The last topic to mention in the context of slowness has to do with overutilized compute resources. This is one of the most common scenarios you will encounter. The metrics you configure to monitor the health of your data analytics pipeline should target this specifically. When those metrics show that compute resources are under pressure, […]
Troubleshoot a Failed Pipeline Run – Troubleshoot Data Storage Processing-1
It is almost certain that at some point while running your data analytics procedures, something unexpected will happen. When it does, you must gather information, such as the symptoms experienced and the log files, that will help you get to the reason for the behavior. Knowing what you read in the last section about the […]
Troubleshoot a Failed Spark Job – Troubleshoot Data Storage Processing
It is inherently difficult to train or document how to troubleshoot technical problems because of the wide variety of symptoms one is exposed to. That means when an example is used to teach troubleshooting, it will most likely not be one that the person being trained will experience. Instead, there are two points to make […]