Azure for Architects

Scalability

Ensuring that applications and systems remain available for users to consume is important for the architect of any business-critical application. However, another application feature is equally important and among the top priorities for architects: the scalability of the application.

Imagine a situation in which an application is deployed and obtains great performance and availability with a few users, but both availability and performance decrease as the number of users begins to increase. An application may perform well under a normal load, yet suffer a drop in performance as the user count grows. This can happen if there is a sudden surge in the number of users and the environment is not built for such a large number of users.

To accommodate such spikes in the number of users, you might provision extra hardware and bandwidth to handle them. The challenge with this is that the additional capacity is not used for the majority of the year, and so does not provide any return on investment; it is provisioned for use only during the holiday season or sales. I hope that by now you are becoming familiar with the problems that architects are trying to solve. All these problems are related to capacity sizing and the scalability of an application. The focus of this chapter is to understand scalability as an architectural concern and to check out the services that are provided by Azure for implementing scalability.

Capacity planning and sizing are among the top priorities for architects and their applications and services. Architects must strike a balance between provisioning too many resources and provisioning too few. Too few resources means being unable to serve all users, driving them to a competitor; too many hurts the budget and return on investment, because most of the resources remain unused most of the time. Moreover, the problem is amplified by the varying level of demand at different times. It is almost impossible to predict the number of users of an application over a day, let alone a year. However, it is possible to find an approximate number using past information and continuous monitoring.

Scalability refers to the ability of a deployment's processes and technology to handle a growing number of users while providing them with the same level of performance as when there are fewer users. Scalability might mean serving more requests without a decrease in performance, or it might mean handling larger and more time-consuming work without any loss of performance.

Capacity planning and sizing exercises should be undertaken by architects at the very beginning of a project and during the planning phase to provide scalability to applications.

Some applications have stable demand patterns, while it is difficult to predict others. Scalability requirements are known for stable-demand applications, while discerning them can be a more involved process for variable-demand applications. Autoscaling, a concept that we will review in the next section, should be used for applications whose demands cannot be predicted.

People often tend to confuse scalability with performance. In the next section, you will see a quick comparison of these two terms.

Scalability versus performance

It is quite easy to confuse scalability with performance when it comes to architectural concerns, because scalability is all about ensuring that, irrespective of the number of users consuming the application, all users receive the same predetermined level of performance.

Performance relates to ensuring that an application caters to predefined response times and throughput. Scalability refers to having provisions for more resources when needed in order to accommodate more users without sacrificing performance.

It is better to understand this using an analogy: the speed of a train directly relates to the performance of a railway network. However, getting more trains to run in parallel at the same or at higher speeds represents the scalability of the railway network.

Now that you know what the difference between scalability and performance is, let's discuss how Azure provides scalability.

Azure scalability

In this section, we will look at the features and capabilities provided by Azure to make applications scalable. Before we get into the architecture and configuration details, it is important to understand Azure's core scalability concept, in other words, scaling.

Scaling refers to either increasing or decreasing the amount of resources that are used to serve requests from users. Scaling can be automatic or manual. Manual scaling requires an administrator to initiate the scaling process, while automatic scaling refers to an automatic increase or decrease in resources based on metrics from the environment and ecosystem, such as memory and CPU availability. Resources can be scaled up or down, or out and in, as will be explained later in this section.

In addition to rolling updates, the fundamental constructs provided by Azure to achieve scalability are as follows:

Scaling up and down

Scaling out and in

Autoscaling

Scaling up

Scaling a VM or service up entails the addition of further resources to existing servers, such as CPU, memory, and disks. It aims to increase the capacity of existing physical hardware and resources.

Scaling down

Scaling a VM or service down entails the removal of existing resources from existing servers, such as CPU, memory, and disks. It aims to decrease the capacity of existing physical and virtual hardware and resources.

Scaling out

Scaling out entails adding further hardware, such as additional servers and capacity. This typically involves adding new servers, assigning them IP addresses, deploying applications on them, and making them part of the existing load balancers such that traffic can be routed to them. Scaling out can be automatic or manual as well. However, for better results, automation should be used:

Figure 2.8: Scaling out

Scaling in

Scaling in refers to the process of removing existing hardware, in terms of servers and capacity. This typically involves removing existing servers, deallocating their IP addresses, and removing them from the existing load balancer configuration such that traffic cannot be routed to them. Like scaling out, scaling in can be automatic or manual.
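The scale-out and scale-in steps described above can be sketched as a simple server-pool model. This is purely an illustration of the bookkeeping involved (allocating addresses, joining and leaving the load balancer pool); the class name and IP scheme are hypothetical, not an Azure API:

```python
# A minimal, hypothetical model of scaling out and in: servers are added to
# or removed from a load balancer's pool so that traffic can (or can no
# longer) be routed to them. Illustrative only; not an Azure API.

class LoadBalancerPool:
    def __init__(self):
        self.servers = []    # servers currently receiving traffic
        self._next_ip = 10   # illustrative IP allocation counter

    def scale_out(self, count):
        """Add new servers: allocate an IP and join the pool."""
        for _ in range(count):
            ip = f"10.0.0.{self._next_ip}"
            self._next_ip += 1
            self.servers.append(ip)   # traffic can now be routed here

    def scale_in(self, count):
        """Remove servers and release them from the pool."""
        for _ in range(min(count, len(self.servers))):
            self.servers.pop()        # traffic is no longer routed here

pool = LoadBalancerPool()
pool.scale_out(3)          # add three servers
pool.scale_in(1)           # remove one
print(len(pool.servers))   # 2
```

In a real deployment, each `scale_out` step would also deploy the application onto the new server before adding it to the pool, which is why automation is recommended over manual scaling.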

Autoscaling

Autoscaling refers to the process of scaling up/down or out/in dynamically based on application demand, using automation. Autoscaling is useful because it ensures that a deployment always consists of an ideal number of server instances. Autoscaling helps in building applications that are fault tolerant: it not only supports scalability, but also makes applications highly available. Finally, it provides the best cost management, because it maintains the optimal configuration of server instances based on demand. It helps avoid over-provisioning servers that end up underutilized, and removes servers that are no longer required after scaling out.
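The decision an autoscale engine makes on each evaluation cycle can be sketched as follows. The thresholds, step size, and instance bounds here are illustrative assumptions, not Azure defaults:

```python
# A sketch of a single autoscale evaluation: compare a metric (here CPU %)
# against thresholds and adjust the instance count within configured bounds.
# All threshold and limit values are illustrative assumptions.

def autoscale_decision(cpu_percent, instances, *,
                       scale_out_above=70, scale_in_below=25,
                       min_instances=2, max_instances=10, step=1):
    if cpu_percent > scale_out_above and instances < max_instances:
        return min(instances + step, max_instances)   # scale out
    if cpu_percent < scale_in_below and instances > min_instances:
        return max(instances - step, min_instances)   # scale in
    return instances                                  # steady state

print(autoscale_decision(85, 4))   # 5 (scale out)
print(autoscale_decision(10, 4))   # 3 (scale in)
print(autoscale_decision(50, 4))   # 4 (no change)
```

The `min_instances` and `max_instances` bounds are what keep autoscaling cost-effective: the deployment never shrinks below the capacity needed for availability, and never grows beyond what the budget allows.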

So far, we've discussed scalability in Azure. Azure offers scalability options for most of its services. Let's explore scalability for PaaS in Azure in the next section.

PaaS scalability

Azure provides App Service for hosting managed applications. App Service is a PaaS offering from Azure. It provides services for the web and mobile platforms. Behind these platforms is infrastructure that is managed by Azure on behalf of its users. Users do not see or manage any infrastructure; however, they have the ability to extend the platform and deploy their applications on top of it. In doing so, architects and developers can concentrate on their business problems instead of worrying about base platform and infrastructure provisioning, configuration, and troubleshooting. Developers have the flexibility to choose any language, OS, and framework to develop their applications. App Service provides multiple plans and, based on the plan chosen, various degrees of scalability are available. App Service provides the following five plans:

Free: This uses shared infrastructure. It means that multiple applications will be deployed on the same infrastructure from the same or multiple tenants. It provides 1 GB of storage free of charge. However, there is no scaling facility in this plan.

Shared: This also uses shared infrastructure and provides 1 GB of storage free of charge. Additionally, custom domains are also provided as an extra feature. However, there is no scaling facility in this plan.

Basic: This has three different stock keeping units (SKUs): B1, B2, and B3. They each have increasing units of resources available to them in terms of CPU and memory. In short, they provide improved configuration of the VMs backing these services. Additionally, they provide storage, custom domains, and SSL support. The basic plan provides basic features for manual scaling. There is no autoscaling available in this plan. A maximum of three instances can be used to scale out an application.

Standard: This also has three different SKUs: S1, S2, and S3. They each have increasing units of resources available to them in terms of CPU and memory. In short, they provide improved configuration of the VMs backing these services. Additionally, they provide storage, custom domains, and SSL support that is similar to that of the basic plan. This plan also provides a Traffic Manager instance, staging slots, and one daily backup as an additional feature on top of the basic plan. The standard plan provides features for automatic scaling. A maximum of 10 instances can be used to scale out the application.

Premium: This also has three different SKUs: P1, P2, and P3. They each have increasing units of resources available to them in terms of CPU and memory. In short, they provide improved configuration of the VMs backing these services. Additionally, they provide storage, custom domains, and SSL support that is similar to that of the basic plan. This plan also provides a Traffic Manager instance, staging slots, and 50 daily backups as additional features on top of the basic plan. The premium plan provides features for autoscaling. A maximum of 20 instances can be used to scale out the application.
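The scaling limits listed above can be captured as a simple lookup table. The values come from the plan descriptions in the text; treating the Free and Shared plans as fixed single-instance deployments is an assumption made for illustration:

```python
# App Service plan scaling limits as described above, as a simple lookup.
# Free/Shared are modeled as a single fixed instance (an assumption, since
# the text states they offer no scaling facility).

PLAN_LIMITS = {
    "Free":     {"max_instances": 1,  "autoscale": False},
    "Shared":   {"max_instances": 1,  "autoscale": False},
    "Basic":    {"max_instances": 3,  "autoscale": False},  # manual only
    "Standard": {"max_instances": 10, "autoscale": True},
    "Premium":  {"max_instances": 20, "autoscale": True},
}

def can_scale_to(plan, desired_instances):
    """Check whether a plan supports the desired instance count."""
    return desired_instances <= PLAN_LIMITS[plan]["max_instances"]

print(can_scale_to("Basic", 3))      # True
print(can_scale_to("Standard", 15))  # False
```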

We have explored the scalability tiers available for PaaS services. Now, let's see how scaling can be done in the case of an App Service plan.

PaaS – scaling up and down

Scaling services hosted by App Service up and down is quite simple. The App Service Scale up menu item opens a new pane with all plans and their SKUs listed. Choosing a plan and SKU will scale a service up or down, as shown in Figure 2.9:

Figure 2.9: Different plans with their SKUs

PaaS – scaling out and in

Scaling services hosted in App Service out and in is also quite simple. The App Service Scale out menu item opens a new pane with scaling configuration options.

By default, autoscaling is disabled for both premium and standard plans. It can be enabled using the Scale Out menu item and by clicking on the Enable autoscale button, as shown in Figure 2.10:

Figure 2.10: Enabling the autoscale option

Manual scaling does not require configuration, but autoscaling is configured with the aid of the following properties:

Mode of scaling: Scaling can be based on a performance metric, such as CPU or memory usage, or users can simply specify a fixed number of instances.

When to scale: Multiple rules can be added to determine when to scale out and in. Each rule can specify criteria such as CPU or memory consumption, whether to increase or decrease the number of instances, and how many instances to add or remove at a time. At least one scale-out rule and one scale-in rule should be configured. Threshold definitions set the upper and lower limits that trigger autoscaling by increasing or decreasing the number of instances.

How to scale: This specifies how many instances to create or remove in each scale-out or scale-in operation:

Figure 2.11: Setting the instance limits

This is quite a good feature to enable in any deployment. However, you should enable both scaling out and scaling in together to ensure that your environment is back to normal capacity after scaling out.
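The three properties above (mode, when, and how) can be represented as rule data, with each rule naming a metric, a threshold, a direction, and an instance change. The metric name and values below are illustrative assumptions; note how the gap between the upper and lower thresholds keeps the instance count from oscillating between scale-out and scale-in:

```python
# A data-oriented sketch of the autoscale rules described above. Each rule
# names a metric, a threshold, a direction (out/in), and an instance change.
# Rule values are illustrative assumptions, not Azure defaults.

rules = [
    {"metric": "CpuPercentage", "operator": ">", "threshold": 70,
     "direction": "out", "change": 2},   # scale out by 2 above 70% CPU
    {"metric": "CpuPercentage", "operator": "<", "threshold": 30,
     "direction": "in", "change": 1},    # scale in by 1 below 30% CPU
]

OPS = {">": lambda a, b: a > b, "<": lambda a, b: a < b}

def apply_rules(metrics, instances, min_instances=1, max_instances=10):
    """Evaluate each rule against current metrics, then clamp to limits."""
    for rule in rules:
        value = metrics.get(rule["metric"], 0)
        if OPS[rule["operator"]](value, rule["threshold"]):
            if rule["direction"] == "out":
                instances += rule["change"]
            else:
                instances -= rule["change"]
    return max(min_instances, min(instances, max_instances))

print(apply_rules({"CpuPercentage": 80}, 3))  # 5
print(apply_rules({"CpuPercentage": 20}, 3))  # 2
```

Configuring only a scale-out rule would let the instance count ratchet upward and never recover, which is why pairing every scale-out rule with a scale-in rule matters.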

Since we have covered scalability in PaaS, let's move on and discuss scalability in IaaS next.

IaaS scalability

There are users who want complete control over their base infrastructure, platform, and application. They prefer to consume IaaS solutions rather than PaaS solutions. When such customers create VMs, they are also responsible for capacity sizing and scaling. There is no out-of-the-box configuration for manually scaling or autoscaling individual VMs; these customers have to write their own automation scripts, triggers, and rules to achieve autoscaling. With VMs also comes the responsibility of maintaining them: the patching, updating, and upgrading of VMs is the responsibility of their owners. Architects should think about both planned and unplanned maintenance. How these VMs should be patched, in what order, in what groupings, and other factors must all be considered to ensure that neither the scalability nor the availability of an application is compromised. To help alleviate such problems, Azure provides virtual machine scale sets (VMSS) as a solution, which we will discuss next.