Soft NUMA
In some scenarios, you may be able to
work with an SMP server and still get the benefit of having a NUMA-type
structure with SQL Server. You can achieve this by using soft NUMA.
This enables you to use Registry settings to tell SQL Server that it
should configure itself as a NUMA system, using the CPU-to-memory-node
mapping that you specify.
As with anything that requires Registry changes,
you need to take exceptional care, and be sure you have backup and
rollback options at every step of the process.
One common use for soft NUMA is when a SQL Server
is hosting an application that has several different groups of users
with very different query requirements. After configuring your
theoretical 16-processor server for soft NUMA, assigning 4 CPUs to each of
the first two NUMA nodes and the remaining 8 CPUs to a third NUMA node, you would next
configure connection affinity for the three nodes to different ports,
and then change the connection settings for each class of workload, so
that workload A is “affinitized” to port x, which connects to the first
NUMA node; workload B is affinitized to port y, which connects to the
second NUMA node; and all other workloads are affinitized to port z,
which is set to connect to the third NUMA node.
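For reference, the node layout itself is defined under the SQL Server NodeConfiguration Registry key, with one CPUMask value (a DWORD bitmap of CPUs) per soft NUMA node. The following sketch shows what the 16-processor split described above might look like; the version number in the key path (110 corresponds to SQL Server 2012) and the mask values are illustrative, so check them against your own version and hardware before making any changes:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\110\NodeConfiguration
    Node0    CPUMask = 0x0000000F    (CPUs 0-3)
    Node1    CPUMask = 0x000000F0    (CPUs 4-7)
    Node2    CPUMask = 0x0000FF00    (CPUs 8-15)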
CPU Nodes
A CPU node is a logical collection of
CPUs that share some common resource, such as a cache or memory. CPU
nodes live below memory nodes in the SQLOS object hierarchy.
Whereas a memory node may have one or more CPU
nodes associated with it, a CPU node can be associated with only a
single memory node. However, in practice, nearly all configurations
have a 1:1 relationship between memory nodes and CPU nodes.
CPU nodes can be seen in the DMV sys.dm_os_nodes. Use the following query to return select columns from this DMV:
select node_id, node_state_desc, memory_node_id, cpu_affinity_mask
from sys.dm_os_nodes
The results from this query, when run on a single-CPU system, are as follows:
NODE_ID   NODE_STATE_DESC   MEMORY_NODE_ID   CPU_AFFINITY_MASK
0         ONLINE            0                1
32        ONLINE DAC        0                0
The results from the previous query, when run on a 96-processor NUMA system
comprising four NUMA nodes, each containing four six-core sockets (24 cores
per NUMA node, and 96 cores across the whole server), are as follows:
NODE_ID   NODE_STATE_DESC   MEMORY_NODE_ID   CPU_AFFINITY_MASK
0         ONLINE            1                16777215
1         ONLINE            0                281474959933440
2         ONLINE            2                16777215
3         ONLINE            3                281474959933440
64        ONLINE DAC        0                16777215
NOTE
The hex equivalents of the cpu_affinity_mask values in this table are as follows:
16777215 = 0x0000000000FFFFFF
281474959933440 = 0x0000FFFFFF000000
This indicates which processor cores each CPU node can use.
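If you would rather see the affinity masks in hex directly, rather than converting the decimal values by hand, a small variation on the earlier query (shown here as a sketch) converts the bigint mask to varbinary:
select node_id, memory_node_id,
       convert(varbinary(8), cpu_affinity_mask) as cpu_affinity_mask_hex
from sys.dm_os_nodes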
Processor Affinity
CPU affinity is a way to force a
workload to use specific CPUs. It’s another way that you can affect
scheduling and SQL Server SQLOS configuration.
CPU affinity can be managed at several levels.
Outside SQL Server, you can use the operating system’s CPU affinity
settings to restrict the CPUs that SQL Server as a process can use.
Within SQL Server’s configuration settings, you can specify that SQL
Server should use only certain CPUs. This is done using the affinity mask and affinity64 mask
configuration options. Changes to these two options are applied
dynamically, which means that schedulers on CPUs that are either
enabled or disabled while SQL Server is running are affected immediately.
Schedulers associated with CPUs that are disabled will be drained and
set to offline. Schedulers associated with CPUs that are enabled will
be set to online, and will be available for scheduling workers and
executing new tasks.
You can also set SQL Server I/O affinity using
the affinity I/O mask option. This option enables you to force any
I/O-related activities to run only on a specified set of CPUs. Using
connection affinity as described earlier in the section “Soft NUMA,”
you can affinitize network connections to a specific memory node.
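As a sketch of how the instance-level options are set, both the affinity mask and the affinity I/O mask options are changed through sp_configure. The mask values below are purely illustrative (they restrict an 8-CPU instance to CPUs 0-3 for scheduling and CPUs 4-5 for I/O) and would need to be adjusted for your own hardware:
exec sp_configure 'show advanced options', 1;
reconfigure;
-- illustrative masks: CPUs 0-3 for schedulers, CPUs 4-5 for I/O
exec sp_configure 'affinity mask', 15;        -- 0x0000000F
exec sp_configure 'affinity I/O mask', 48;    -- 0x00000030
reconfigure;
Note that while affinity mask changes are applied dynamically, a change to affinity I/O mask takes effect only after the instance is restarted.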
Schedulers
The scheduler node is where the work of scheduling activity occurs. Scheduling occurs against tasks,
which are the requests to do some work handled by the scheduler. One
task may be the optimized query plan that represents the T-SQL you want
to execute; or, in the case of a batch with multiple T-SQL statements,
the task would represent a single optimized query from within the
larger batch.
When SQL Server starts up, it creates one
scheduler for each CPU that it finds on the server, and some additional
schedulers to run other system tasks. If processor affinity is set such
that some CPUs are not enabled for this instance, then the schedulers
associated with those CPUs will be set to a disabled state. This
enables SQL Server to support dynamic affinity settings.
While there is one scheduler per CPU, schedulers
are not bound to a specific CPU, except in the case where CPU affinity
has been set.
Each scheduler is identified by its own unique
scheduler_id. Values from 0–254 are reserved for schedulers running
user requests. Scheduler_id 255 is reserved for the scheduler for the
dedicated administrator connection (DAC). Schedulers with a
scheduler_id > 255 are reserved for system use, and each is typically
dedicated to a specific system task.
The following code sample shows select columns from the DMV sys.dm_os_schedulers:
select parent_node_id, scheduler_id, cpu_id, status, scheduler_address
from sys.dm_os_schedulers
order by scheduler_id
The following results from the preceding
query indicate that scheduler_id 0 is the only scheduler with an id
< 255, which implies that these results came from a single-core
machine. You can also see a scheduler with an ID of 255, which has a
status of VISIBLE ONLINE (DAC),
indicating that this is the scheduler for the DAC. Also shown are three
additional schedulers with IDs > 255. These are the schedulers
reserved for system use.
PARENT_NODE_ID   SCHEDULER_ID   CPU_ID   STATUS                 SCHEDULER_ADDRESS
0                0              0        VISIBLE ONLINE         0x00480040
32               255            0        VISIBLE ONLINE (DAC)   0x03792040
0                257            0        HIDDEN ONLINE          0x006A4040
0                258            0        HIDDEN ONLINE          0x64260040
0                259            0        HIDDEN ONLINE          0x642F0040
Tasks
A task is a request to do some unit of
work. The task itself doesn’t do anything, as it’s just a container for
the unit of work to be done. To actually do something, the task has to
be scheduled by one of the schedulers, and associated with a particular
worker. It’s the worker that actually does something, and you will
learn about workers in the next section.
Tasks can be seen using the DMV sys.dm_os_tasks. The following example shows a query of this DMV:
Select *
from sys.dm_os_tasks
The task is the container for the work that’s being done, but if you look into sys.dm_os_tasks,
there is no indication of exactly what work that is. Figuring out what
each task is doing takes a little more digging. First, dig out the
request_id. This is the key into the DMV sys.dm_exec_requests. Within sys.dm_exec_requests you will find some familiar fields — namely, sql_handle, along with statement_start_offset, statement_end_offset, and plan_handle. You can take either sql_handle or plan_handle and feed them into sys.dm_exec_sql_text (plan_handle | sql_handle) and get back the original T-SQL that is being executed:
Select t.task_address, s.text
From sys.dm_os_tasks as t inner join sys.dm_exec_requests as r
on t.task_address = r.task_address
Cross apply sys.dm_exec_sql_text (r.plan_handle) as s
where r.plan_handle is not null
Workers
A worker is where the work is actually
done, and the work it does is contained within the task. Workers can be
seen using the DMV sys.dm_os_workers:
Select *
From sys.dm_os_workers
Some of the more interesting columns in this DMV are as follows:
- Task_address — Enables you to join back to the task, and from there back to the actual request, and get the text that is being executed
- State — Shows the current state of the worker
- Last_wait_type — Shows the last wait type that this worker was waiting on
- Scheduler_address — Joins back to sys.dm_os_schedulers
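To see those columns in context, the following sketch (patterned on the earlier task query) walks from each worker back through its task to the request it is servicing and the T-SQL text being executed:
select w.worker_address, w.state, w.last_wait_type, s.text
from sys.dm_os_workers as w
inner join sys.dm_os_tasks as t on w.task_address = t.task_address
inner join sys.dm_exec_requests as r on t.task_address = r.task_address
cross apply sys.dm_exec_sql_text(r.sql_handle) as s
where r.sql_handle is not null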
Threads
To complete the picture, SQLOS also
contains objects for the operating system threads it is using. OS
threads can be seen in the DMV sys.dm_os_threads:
Select *
From sys.dm_os_threads
Interesting columns in this DMV include the following:
- Scheduler_address — Address of the scheduler with which the thread is associated
- Worker_address — Address of the worker currently associated with the thread
- Kernel_time — Amount of kernel time that the thread has used since it was started
- Usermode_time — Amount of user time that the thread has used since it was started
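As a quick illustration of how these columns tie a thread back into the rest of the hierarchy, the following sketch joins each thread to the worker it is currently running and reports the CPU time the thread has consumed:
select th.os_thread_id, th.kernel_time, th.usermode_time,
       w.state, w.last_wait_type
from sys.dm_os_threads as th
inner join sys.dm_os_workers as w
on th.worker_address = w.worker_address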
Scheduling
Now that you have seen all the objects
that SQLOS uses to manage scheduling, and understand how to examine
what’s going on within these structures, it’s time to look at how
SQLOS actually schedules work.
One of the main things to understand about
scheduling within SQL Server is that it uses a nonpreemptive scheduling
model, unless the task being run is not SQL Server code. In that case,
SQL Server marks the task to indicate that it needs to be scheduled
preemptively. An example of code that would be scheduled preemptively
is any code not written by SQL Server itself that runs inside the SQL
Server process, which applies to any CLR code.
PREEMPTIVE VS. NONPREEMPTIVE SCHEDULING
With preemptive scheduling, the
scheduling code manages how long the code can run before interrupting
it, giving some other task a chance to run.
The advantage of preemptive scheduling
is that the developer doesn’t need to think about yielding; the
scheduler takes care of it. The disadvantage is that the code can be
interrupted at any arbitrary point, which may result in the task
running more slowly than it otherwise would. In addition,
providing an environment that offers preemptive scheduling features
requires a lot of work.
With nonpreemptive scheduling, the
code that’s being run is written to yield control at key points. At
these yield points, the scheduler can determine whether a different
task should be run.
The advantage of nonpreemptive
scheduling is that the code running can best determine when it should
be interrupted. The disadvantage is that if the developer doesn’t yield
at the appropriate points, then the task may run for an excessive
amount of time, retaining control of a CPU even while it is waiting on
a resource. In this case, the task blocks other tasks from running and
wastes CPU resources.
SQL Server begins to schedule a task when a new
request is received, after the Query Optimizer has completed its work
to find the best plan. A task object is created for this user request,
and the scheduling starts from there.
The newly created task object has to be
associated with a free worker in order to actually do anything. When
the worker is associated with the new task, the worker’s status is set
to init. When the initial setup has been done, the status changes to runnable.
At this point, the worker is ready to go but there is no free scheduler
to allow this worker to run. The worker state remains as runnable until
a scheduler is available. When the scheduler is available, the worker
is associated with that scheduler, and the status changes to running.
It remains running until either the task completes or the worker
releases control while it waits for something. When it releases
control of the scheduler, its state moves to suspended (the reason it
released control is logged as a wait_type). When the item it was
waiting on is available again, the status of the worker is
changed to runnable. Now it’s back to waiting for a free scheduler
again, and the cycle repeats until the task is complete.
At that point, the task is released, the worker
is released, and the scheduler is available to be associated with the
next worker that needs to run. The state diagram for scheduling workers
is shown in Figure 6.
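You can watch these states on a live system: the following sketch simply counts the workers in each state, and on a busy server you would typically see a mix of RUNNING, RUNNABLE, and SUSPENDED workers at any given moment:
select state, count(*) as workers
from sys.dm_os_workers
group by state
order by workers desc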