
Windows Server 2012 : Scalable and elastic web platform (part 1) - NUMA-aware scalability

3/15/2014 1:59:52 AM

Web hosting platforms like IIS are the foundation for cloud computing, and they need both scalability and elasticity to be effective. A platform has scalability if it allows additional resources such as processing power, memory, or storage to be provisioned to meet increasing demand. For example, if users of applications running on your web server farm are complaining about delays and slow performance, you may need to add more servers to your farm to scale outward. Or you might upgrade your existing servers by adding more memory to scale them upward. Elasticity, on the other hand, means allowing such additional resources to be provisioned automatically on demand.

Whether you are an enterprise hosting line-of-business (LoB) applications or a cloud hosting provider managing a multi-tenant public cloud, IIS 8 in Windows Server 2012 can enhance both the scalability and elasticity of your hosting environment. IIS 8 provides increased scale through improved Secure Sockets Layer (SSL) scalability, better manageability via centralized SSL certificate support, Non-Uniform Memory Access (NUMA)–aware scalability to provide greater performance on cutting-edge hardware, and other new features and enhancements.

1. NUMA-aware scalability

High-end server hardware is rapidly evolving. Powerful servers that are too expensive today for many smaller businesses to acquire will soon be commonplace.

NUMA, which until recently was available only on high-end server hardware, will probably be a standard feature of commodity servers within the next two years. NUMA was designed to overcome the scalability limits of the traditional symmetric multi-processing (SMP) architecture, where all memory access happens on the same shared memory bus. SMP works well when you have a small number of CPUs, but not when you have dozens of them competing for access to the shared bus. NUMA alleviates such bottlenecks by limiting how many CPUs can be on any one memory bus and connecting the buses with a high-speed interconnect.

Understanding NUMA-aware scalability

A significant percentage of recent server hardware has NUMA architecture. These machines use multiple bus systems, one for each socket. Each socket has multiple CPUs and its own memory. A socket, together with its attached memory and I/O system, constitutes a NUMA node. Accessing data that is located in a different NUMA node is more expensive than accessing memory on the local node. When we tested IIS 7.5 on NUMA hardware, we noticed that increasing the number of CPU cores did not increase performance beyond a certain number of cores. In fact, performance actually degraded in certain scenarios. This happened because process scheduling was not NUMA-aware, so the cost of memory synchronization on NUMA hardware outweighed the benefits of additional cores. The goal behind the NUMA-Aware Scalability feature is to ensure that IIS 8 can take advantage of modern NUMA hardware and provide optimal performance on servers with a high number of CPU cores.

To get the best performance on NUMA hardware for a web workload, a Hypertext Transfer Protocol (HTTP) request packet should traverse the fastest I/O path to the CPU. This also means that the packet should be served by a CPU socket that is on the same I/O hub as the network interface card (NIC) receiving the packet. This configuration is very specific to hardware architecture, and there is no programmatic way to know which NIC and sockets are on the same I/O hub.

One of the design goals of this feature is to provide near-optimal settings out of the box without much user configuration. Understanding the finer details of NUMA hardware (for example, the hardware schematic, NIC, and CPU layout) and configuring it correctly can be pretty difficult and time consuming for average users. So IIS 8 tries its best to configure all these settings automatically.

Automatic configuration is convenient, but it can’t beat optimally tuned hardware performance. To enable the best performance, advanced users can affinitize an IIS worker process to the most optimal NUMA core(s). This can be done by manually configuring the smpProcessorAffinityMask attribute in the IIS configuration. This provides something called “hard affinity.” When this configuration is used, the application pools are hard-affinitized, meaning that there is no spillover to other NUMA nodes. More explicitly, the threads cannot be executed by other cores on the system, regardless of whether those cores have spare CPU cycles.
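A processor affinity mask is a bit field in which each set bit selects one logical core. The following Python sketch shows how such a mask could be computed for a chosen set of cores. It assumes that bit i corresponds to logical core i, and the function names and the core layout (node 1 owning cores 8 through 15) are hypothetical; check your hardware layout and the IIS configuration reference before applying any value to smpProcessorAffinityMask.

```python
def affinity_mask(cores):
    """Return an integer bitmask with bit i set for each logical core index i.

    Assumption: bit i of the mask maps to logical core i.
    """
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core
    return mask


def cores_in_mask(mask):
    """Inverse helper: list the core indices whose bits are set in the mask."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]


# Hypothetical layout: NUMA node 1 owns logical cores 8-15.
node1_mask = affinity_mask(range(8, 16))
print(hex(node1_mask))  # 0xff00
```

Converting the mask back with cores_in_mask is a quick way to double-check that a hand-written hexadecimal value selects the cores you intended.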

For average users, Windows and IIS make the best attempt at offering automatic configurations that should yield the best performance. For automatic configuration, IIS uses something called “soft affinity.” In soft affinity, when a process is affinitized to a core, the affinitized core is identified as the “preferred core.” When a thread is about to be scheduled to be executed, the preferred core is considered first. However, depending on the load and the availability of other cores on the system, the thread may be scheduled on other cores on the system. In lab tests, it was observed that soft affinity is more forgiving in the case of misconfiguration compared to hard affinity.
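The difference between hard and soft affinity can be illustrated with a toy scheduler. This is a simplified model written for illustration only, not the actual Windows scheduler: the function pick_core and its parameters are hypothetical, and real scheduling weighs many more factors than whether a core is busy.

```python
def pick_core(preferred, busy, all_cores, hard):
    """Choose a core for a thread that is ready to run.

    preferred: set of cores the process is affinitized to
    busy:      set of cores that are currently occupied
    all_cores: iterable of every core index on the system
    hard:      True for hard affinity, False for soft affinity
    Returns a core index, or None if the thread has to wait.
    """
    # Both modes consider the preferred cores first.
    for core in sorted(preferred):
        if core not in busy:
            return core
    if hard:
        # Hard affinity: no spillover, even if other cores are idle.
        return None
    # Soft affinity: fall back to any idle core on the system.
    for core in all_cores:
        if core not in busy:
            return core
    return None


cores = range(8)
print(pick_core({0, 1}, {0, 1}, cores, hard=True))   # None
print(pick_core({0, 1}, {0, 1}, cores, hard=False))  # 2
```

In this model a misconfigured hard affinity leaves the thread waiting while idle cores go unused, which is consistent with the lab observation that soft affinity is more forgiving.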

When a system has multiple NUMA nodes, Windows uses a simple round-robin algorithm to assign processes to NUMA nodes so that load is distributed equally across nodes. This is not ideal for IIS workloads because they are usually memory-constrained. IIS is aware of the memory consumption on each NUMA node, so IIS 8.0 enables another scheduling algorithm for worker processes started by the Windows Process Activation Service (WAS), which schedules each process on the node with the most available memory. This helps minimize access to memory on remote NUMA nodes. This capability is called Most Available Memory, and it is the default process scheduling algorithm on NUMA hardware for automatically picking the optimal NUMA node for a process.
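The two placement policies can be contrasted in a few lines of Python. This is a minimal sketch of the idea rather than the WAS implementation: the function names, the per-node free-memory figures, and the fixed per-process footprint are all made up for illustration.

```python
from itertools import cycle


def round_robin_assigner(node_ids):
    """Return a function that hands out node IDs in strict rotation."""
    rotation = cycle(node_ids)
    return lambda: next(rotation)


def most_available_memory(free_mb):
    """Return the node ID with the most free memory (first one wins a tie)."""
    return max(free_mb, key=free_mb.get)


def place_processes(free_mb, footprint_mb, count):
    """Place `count` processes one by one, each on the node that currently
    has the most free memory, charging a fixed footprint per process."""
    free = dict(free_mb)  # work on a copy so the caller's data is untouched
    placements = []
    for _ in range(count):
        node = most_available_memory(free)
        placements.append(node)
        free[node] -= footprint_mb
    return placements


# Hypothetical free memory per NUMA node, in MB.
free_mb = {0: 2048, 1: 8192, 2: 4096}

next_node = round_robin_assigner(list(free_mb))
print([next_node() for _ in range(3)])    # [0, 1, 2]
print(place_processes(free_mb, 3000, 3))  # [1, 1, 2]
```

With these numbers, round-robin puts the first process on node 0 even though that node has the least free memory, while the memory-based policy fills node 1 first and avoids node 0 altogether.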

Process scheduling and performance also depend on how the IIS workload has been partitioned. As explained next, IIS supports two ways of partitioning the workload.

Run multiple worker processes in one application pool (that is, a web garden)

If you are using this mode, by default, the application pool is configured to run one worker process. For maximum performance, you should consider running the same number of worker processes as there are NUMA nodes, so that there is 1:1 affinity between the worker processes and NUMA nodes. This can be done by setting the Maximum Worker Processes application pool setting to 0. In this setting, IIS determines how many NUMA nodes are available on the hardware and starts the same number of worker processes.

Run multiple application pools in a single workload/site

In this configuration, the workload/site is divided into multiple application pools. For example, the site may contain several applications that are configured to run in separate application pools. This configuration effectively results in running multiple IIS worker processes for the workload/site, and IIS intelligently distributes and affinitizes the processes for maximum performance.

Harsh Mittal, Senior Program Manager

Eok Kim, Software Design Engineer

Aniello Scotto Di Marco, Software Design Engineer in Test

Microsoft Internet Information Services Team

How NUMA-aware scalability works

NUMA-aware scalability works by intelligently affinitizing worker processes to NUMA nodes. For example, let’s say that you have a large enterprise web application that you want to deploy on an IIS 8 web garden. A web garden is an application pool that uses more than one worker process. The number of worker processes used by an application pool can be configured in the Advanced Settings dialog box of an application pool, and as Figure 1 shows, the out-of-the-box configuration for IIS is to assign one worker process to each application pool.


Figure 1. Configuring a web garden on IIS 8.

By increasing the Maximum Worker Processes setting over its default value of 1, you change the website associated with your application into a web garden. On NUMA-aware hardware, the result is that IIS will try to assign each worker process in the web garden to a different NUMA node. This manual affinity approach allows IIS 8 to support NUMA-capable systems with more than 64 logical cores. You can also use this approach on NUMA-capable systems with fewer than 64 logical cores if you want to try to custom-tune your workload.

On NUMA-capable systems with fewer than 64 logical cores, however, you can simply set Maximum Worker Processes to 0, in which case IIS will start as many worker processes as there are NUMA nodes on the system to achieve optimal performance. You might use this approach, for example, if you are a multi-tenant cloud hosting provider.
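The effect of the Maximum Worker Processes setting described above can be summarized in a short Python sketch. It is a simplified model, not IIS code: the function names are hypothetical, and the modulo-based spreading of workers across nodes is an assumption standing in for whatever placement IIS actually performs.

```python
def worker_process_count(max_worker_processes, numa_node_count):
    """Resolve how many worker processes an application pool starts.

    A setting of 0 means "one worker process per NUMA node";
    any positive value is used as-is (the default is 1).
    """
    if max_worker_processes < 0:
        raise ValueError("Maximum Worker Processes cannot be negative")
    if max_worker_processes == 0:
        return numa_node_count
    return max_worker_processes


def assign_workers_to_nodes(worker_count, numa_node_count):
    """Spread workers across nodes so that each lands on a different node
    where possible (assumption: simple modulo placement)."""
    return [worker % numa_node_count for worker in range(worker_count)]


nodes = 4  # hypothetical server with four NUMA nodes
workers = worker_process_count(0, nodes)
print(workers)                                  # 4
print(assign_workers_to_nodes(workers, nodes))  # [0, 1, 2, 3]
```

With the default value of 1, the same model yields a single worker process on one node, which is why a web garden is needed to use every node.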

Benefits of NUMA-aware scalability

Internal testing by Microsoft has demonstrated the benefits that enterprises and cloud hosting providers can gain from implementing IIS 8 in their datacenters. In one series of tests using the default IIS configuration of one worker process per application pool, the number of requests per second that a web application could handle actually decreased by about 20 percent when going from 32 to 64 cores on systems that are not NUMA-capable, because of increased contention for the shared memory bus on such systems. In similar tests on NUMA-capable systems, however, the number of requests per second increased by more than 50 percent when going from 32 to 64 cores. Such testing confirms the increased scalability that IIS 8 provides through its NUMA-aware capabilities.
