Wait Types and correlation to other Performance info
Wait Type
|
Category
|
Description
|
Correlation to
Other info
|
ASYNC_DISKPOOL_LOCK
|
IO
(Restore DB)
|
RARE
During Backup and Restore (e.g. including zeroing out pages) threads written
in parallel.
|
Possible
disk bottleneck. See disk perf
counters for confirmation.
|
ASYNC_IO_COMPLETION
|
IO
|
Waiting
for asynchronous IO requests to complete.
Identify
disk bottlenecks, using PERF Counters, Profiler,
::fn_virtualfilestats
and SHOWPLAN
Any
of the following will reduce these waits:
1.
Adding additional IO bandwidth,
2.
Balancing IO across other drives
3.
Reducing IO with proper indexing
4.
Check for bad query plans
5.
Check for memory pressure
|
See
PERFMON Disk perf counters:
1.
Disk sec/read
2.
Disk sec/write
3.
Disk queues
See
PERFMON SQL Buffer Cache perf counters for memory pressure:
1.
Page Life Expectancy
2.
Checkpoint pages/sec
3.
Lazywrites/sec
See
PERFMON SQL Access Methods for correct
indexing:
1.
Full Scans/sec
2.
Index seeks/sec
Check
IoStallMS – IoStallMS is the number of cumulative milliseconds of IO waits
for a particular file. If IoStallMS is
inordinately high for one or more files, you have a disk bottleneck.
1.
select * from ::fn_virtualfilestats (dbid,file#)
2.
select * from ::fn_virtualfilestats (dbid,-1) to list
all files for a database.
SQL Profiler can be used to identify which TSQL statements do
scans. Select the scans event class & events scan:started and scan:completed.
Include the object Id data
column. Save the profiler trace to a
trace table, and then search for the scans
event. The scan:completed event will provide associated IO so you can also
search for high reads, writes, and duration.
Check
SHOWPLAN for bad query plans
|
CMEMTHREAD
|
|
Waiting
for thread safe memory objects
|
|
CURSOR
|
|
Asynch
Cursor thread
|
|
CXPACKET
|
|
Parallel
process waits. Possible skew of data
possible lock of a range for this cpu meaning one parallel process is behind,
etc.
In
an OLTP environment, excessive CXPACKET waits can impact the throughput of
other OLTP traffic.
In
a DW environment, CXPACKET waits are expected for multiple proc environments.
|
Check
for parallelism – sp_Configure “max degree of parallelism”.
If
max degree of parallelism = 0, you may want to do one of the following:
1.
turn off parallelism entirely: set max degree of parallelism to 1
2.
limit parallelism by setting max degree of parallelism to some number less
than the total number of CPUs. For
example if you have 8 procs, set max degree of parallelism to <=4.
|
DBTABLE
|
|
New
Checkpoint request that is waiting for outstanding checkpoint request to
complete
|
See
SQL Buffer Cache perf counters:
1.
Page Life Expectancy
2.
Checkpoint pages/sec
3.
Lazywrites/sec
|
DTC
|
|
Waiting
for Distributed Transaction Coordinator
|
Check
transaction isolation level
|
EC
|
|
Non-parallel
synchronization between parent and child thread
|
|
EXCHANGE
|
|
Waiting
on a parallel process to complete, shutdown or startup.
|
Check
for parallelism – sp_Configure “max degree of parallelism”.
If
max degree of parallelism = 0, you may want to do one of the following:
1.
turn off parallelism entirely: set max degree of parallelism to 1
2.
limit parallelism by setting max degree of parallelism to some number less
than the total number of CPUs. For
example if you have 8 procs, set max degree of parallelism to <=4.
|
EXECSYNC
|
|
Query
memory and spooling to disk
|
|
IO_COMPLETION
|
IO
|
Waiting
for IO requests to complete.
Identify
disk bottlenecks, using PERF Counters, Profiler,
::fn_virtualfilestats
and SHOWPLAN
Any
of the following will reduce these waits:
1.
Adding additional IO bandwidth,
2.
Balancing IO across other drives
3.
Reducing IO with proper indexing
4.
Check for bad query plans
|
See
Disk perf counters:
1.
Disk sec/read
2.
Disk sec/write
3.
Disk queues
See
SQL Buffer Cache perf counters:
1.
Page Life Expectancy
2.
Checkpoint pages/sec
3.
Lazywrites/sec
See
SQL Access Methods for correct
indexing:
1.
Full Scans/sec
2.
Index seeks/sec
See
memory perf counter
1.
Page faults/sec
Check
IoStallMS
1.
select * from ::fn_virtualfilestats(dbid,file#)
SQL Profiler can be used to identify which TSQL statements do
scan. Select the scans event class & events scan:started and scan:completed.
Include the object Id data
column. Save the profiler trace to a
trace table, and then search for the scans
event. The scan:completed event will provide associated IO so you can also
search for high reads, writes, and duration.
Check
SHOWPLAN for bad query plans
|
LATCH_x
|
Latch
|
Latches
are short term light weight synchronization objects. Latches are not held for the duration of a
transaction.
“Plain” latches are generally not related to IO. These
latches can be used for a variety of things, but they are not used to
synchronize access to buffer pages (PAGELATCH_x is used for that).
Possibly the most common case is contention on internal caches
(not the buffer pool pages), especially when using heaps and/or text.
|
If
high, check PERFMON for
1.
memory pressure
2.
SQL Latch waits (ms)
Look
for LOG and Pagelatch_UP wait types.
Latch_x waits can often be alleviated by solving LOG and
PAGELATCH_UP contention. In the absence of LOG and/or PAGELATCH_UP
contention, the only other option is to partition the table/index in question
in order to create multiple caches (the caches are per-index).
|
LATCH_DT
|
Latch
|
Destroy
Latch
|
See
LATCH_x
|
LATCH_EX
|
Latch
|
Exclusive
Latch
|
See
LATCH_x
|
LATCH_KP
|
Latch
|
Keep
Latch
|
See
LATCH_x
|
LATCH_NL
|
Latch
|
Null
Latch
|
See
LATCH_x
|
LATCH_SH
|
Latch
|
Shared
Latch
|
See
LATCH_x
|
LATCH_UP
|
Latch
|
Update
Latch
|
See
LATCH_x
|
LCK_x
|
Lock
|
Possible
transaction management issue.
1.
For shared locks, check Isolation level for transaction.
2.
Keep transaction as short as possible
|
See
SQL Locks perf counters
1.Lock
wait time (ms)
Hint:
check for memory pressure, which causes more physical IO, thus prolonging the
duration of transactions and locks.
|
LCK_M_BU
|
Lock
|
Bulk
update lock
|
See
Lck_x
|
LCK_M_IS
|
Lock
|
Intent
Share lock
|
See
Lck_x
|
LCK_M_IU
|
Lock
|
Intent
Update lock
|
See
Lck_x
|
LCK_M_IX
|
Lock
|
Intent
Exclusive lock
|
See
Lck_x
|
LCK_M_RIn_NL
|
Lock
|
Range
Intent Null Lock
|
See
Lck_x
|
LCK_M_RIn_S
|
Lock
|
Range
Intent Shared lock
|
See
Lck_x
|
LCK_M_RIn_U
|
Lock
|
Range
Intent Update lock
|
See
Lck_x
|
LCK_M_RIn_X
|
Lock
|
Range
Intent Exclusive lock
|
See
Lck_x
|
LCK_M_RS_S
|
Lock
|
Range
Shared Shared (Key-Range) lock
|
See
Lck_x
|
LCK_M_RS_U
|
Lock
|
Range
Shared Update (key-range) lock
|
See
Lck_x
|
LCK_M_RX_S
|
Lock
|
Range
Exclusive shared (key-range)
|
See
Lck_x
|
LCK_M_RX_U
|
Lock
|
Range
Exclusive update (key-range) lock
|
See
Lck_x
|
LCK_M_RX_X
|
Lock
|
Range
Exclusive Exclusive (key-range)
|
See
Lck_x
|
LCK_M_S
|
Lock
|
Shared
Lock:
|
See
Lck_x
|
LCK_M_SCH_M
|
Lock
|
Modify
schema lock:
|
See
Lck_x
|
LCK_M_SCH_S
|
Lock
|
Shared
Schema (Stability) lock
|
See
Lck_x
|
LCK_M_SIU
|
Lock
|
Share
Intent Update lock
|
See
Lck_x
|
LCK_M_SIX
|
Lock
|
Share
Intent Exclusive lock
|
See
Lck_x
|
LCK_M_U
|
Lock
|
Update
lock.
|
See
Lck_x
|
LCK_M_UIX
|
Lock
|
Update
intent exclusive lock
|
See
Lck_x
|
LCK_M_X
|
Lock
|
Exclusive
lock
|
See
Lck_x
|
LOGMGR
|
Transaction
Log
|
Waiting
for write requests to the transaction log to complete.
Identify
disk bottlenecks, using PERF Counters, Profiler,
::fn_virtualfilestats
and SHOWPLAN
Any
of the following will reduce these waits:
1.
Adding additional IO bandwidth,
2.
Balancing IO across other drives
3.
Moving / Isolating the transaction log on its own drive
|
See
Disk perf counters:
1.
Disk sec/read
2.
Disk sec/write
3.
Disk queues
See
SQL Buffer Cache perf counters:
1.
Page Life Expectancy
2.
Checkpoint pages/sec
3.
Lazywrites/sec
Check
IoStallMS for tranlog
1.
select * from ::fn_virtualfilestats(dbid,file#)
|
MISCELLANEOUS
|
|
Catch
all wait type
|
|
NETWORKIO
|
|
Waiting
on Network IO Completion. Waiting to
read or write to a client on the network
This
can occur if a client is in the middle of sending packets to SQL Server, or
when SQL writes data to a client and is waiting for an ACK.
|
Check
NIC bandwidth. 100mbits is preferable
to 10mbs.
|
OLEDB
|
|
OLEDB
waits. Common causes are:
·
SQL Server is waiting for client application to send
data. Some examples include:
·
1. BULK INSERT
·
2. CONVERT (6.5 to 2000)
·
3. Full text
·
4. Linked server calls incl. four part name calls,
remote procedure calls, openquery, openrowset etc.
·
5. Queries that access virtual tables, since these are
implemented as OLEDB rowset providers.
|
1.
Check placement of client app including any file input read by the client and
SQL Server data and log files. See
PERFMON disk secs/read & disk secs/write.
If disk secs/read are high, you may
add additional IO bandwidth, balance IO across other drives, or move /
isolate the database and transaction log on its own drives
2.
Inspect TSQL code for RPC, Distributed (Linked Server) & Full Text
Search. While SQL server supports
these type queries, they are sometimes performance bottlenecks.
3.
To get the SQL Statement involved in OLEDB waits, Select virtual table
master..sysprocesses as follows:
a. SQL2000 Service Pack 3 Only
DECLARE @Handle binary(20)
SELECT @Handle = sql_handle FROM sysprocesses
WHERE waittype = 0x0042
SELECT * FROM ::fn_get_sql(@Handle)
b. SQL2000 RTM, SP1, SP2 – limited to 255 characters
dbcc inputbuffer (spid)
|
PAGEIOLATCH_x
|
|
Latches
are short term synchronization objects.
used to synchronize access to buffer pages. PageIOLatch is used for disk to memory
transfers.
|
If
this is significant in percentage, it normally suggests disk IO subsystem
issues. Check disk counters.
|
PAGEIOLATCH_DT
|
|
IO
Page destroy latch
|
See
PAGEIOLATCH_x
|
PAGEIOLATCH_EX
|
|
IO
Page latch exclusive
|
See
PAGEIOLATCH_x
|
PAGEIOLATCH_KP
|
|
IO
Page latch keep
|
See
PAGEIOLATCH_x
|
PAGEIOLATCH_NL
|
|
IO
Page latch null
|
See
PAGEIOLATCH_x
|
PAGEIOLATCH_SH
|
|
IO
Page latch shared
|
See
PAGEIOLATCH_x
|
PAGEIOLATCH_UP
|
|
IO
Page latch update
|
See
PAGEIOLATCH_x
|
PAGELATCH_x
|
|
Latches
are short term light weight synchronization objects. Latches are not held for the duration of a
transaction. Typical latching operations during row transfers to memory,
controlling modifications to row offset table, etc. Consequently, the duration of latches is
normally sensitive to available memory.
|
If
this is significant in percentage, it normally indicates cache contention.
|
PAGELATCH_DT
|
|
Page
latch
|
See
PAGELATCH_x
|
PAGELATCH_EX
|
|
Page
latch exclusive
Contention can be caused by issues other than IO or memory
performance, for example, heavy concurrent inserts into the same index range
can cause this type of contention. If a lot of inserts need to be
placed on the same page they are serialized using the latch. A lot of
inserts into the same range can also cause page splits in the index which
will hold onto the latch while allocating a new page (this can take a while).
Any read accesses to the same range as the inserts would also conflict
on the latches. The solution in these cases is to distribute the
inserts using a more appropriate
|
See
PAGELATCH_x
|
PAGELATCH_KP
|
|
Page
latch keep
|
See
PAGELATCH_x
|
PAGELATCH_NL
|
|
Page
latch null
|
See
PAGELATCH_x
|
PAGELATCH_SH
|
|
Page
latch shared
Contention can be caused by issues other than IO or memory
performance, for example, heavy concurrent inserts into the same index range
can cause this type of contention. If a lot of inserts need to be
placed on the same page they are serialized using the latch. A lot of
inserts into the same range can also cause page splits in the index which
will hold onto the latch while allocating a new page (this can take a while).
Any read accesses to the same range as the inserts would also conflict
on the latches. The solution in these cases is to distribute the
inserts using a more appropriate
|
See
PAGELATCH_x
|
PAGELATCH_UP
|
|
Page latch Update is used only for allocation related pages,
and contention on it is often a sign that more files are needed. With
multiple files, allocations can be distributed across multiple files thus
reducing demand on the per-file data structures stored on these pages. The
contention is not IO performance, but rather internal allocation contention to
access the pages – adding more spindles to a file or moving the file to a
faster disk will not help, nor will
adding more memory.
|
See
PAGELATCH_x
|
PAGESUPP
|
|
Waits
for parallel page supplier. Possible
disk bottleneck
Any
of the following will reduce these waits:
1.
Adding additional IO bandwidth,
2.
Balancing IO across other drives
3.
Reducing IO with proper indexing
4.
Check for bad query plans
|
See
Disk perf counters:
1.
Disk sec/read
2.
Disk sec/write
3.
Disk queues
See
SQL Buffer Cache perf counters:
1.
Page Life Expectancy
2.
Checkpoint pages/sec
3.
Lazywrites/sec
Check
IoStallMS for database
1.
select * from ::fn_virtualfilestats(dbid,file#)
|
PIPELINE_INDEX_STAT
|
|
PIPELINE
waittypes added to allow one user to perform multiple operations such as
writes to log cache on behalf of himself as well as other users who are
waiting for same operation. It does
all log writes in single operation.
|
See
Disk perf counters:
1.
Disk sec/read
2.
Disk sec/write
3.
Disk queues
See
SQL Buffer Cache perf counters:
1.
Page Life Expectancy
2.
Checkpoint pages/sec
3.
Lazywrites/sec
Check
IoStallMS for database
1.
select * from ::fn_virtualfilestats(dbid,file#)
|
PIPELINE_LOG
|
|
PIPELINE
waittypes added to allow one user to perform multiple operations such as
writes to log cache on behalf of himself as well as other users who are
waiting for same operation. Does in
single operation.
|
See
Disk perf counters:
1.
Disk sec/read
2.
Disk sec/write
3.
Disk queues
See
SQL Buffer Cache perf counters:
1.
Page Life Expectancy
2.
Checkpoint pages/sec
3.
Lazywrites/sec
Check
IoStallMS for database
1.
select * from ::fn_virtualfilestats(dbid,file#)
|
PIPELINE_VLM
|
|
PIPELINE
waittypes added to allow one user to perform multiple operations such as
writes to log cache on behalf of himself as well as other users who are
waiting for same operation. Does in
single operation.
|
See
Disk perf counters:
1.
Disk sec/read
2.
Disk sec/write
3.
Disk queues
See
SQL Buffer Cache perf counters:
1.
Page Life Expectancy
2.
Checkpoint pages/sec
3.
Lazywrites/sec
Check
IoStallMS for database
1.
select * from ::fn_virtualfilestats(dbid,file#)
|
PSS_CHILD
|
|
Waiting
on Asynch thread
|
|
RESOURCE_QUEUE
|
|
Internal
Use only
|
Synchronization
object
|
RESOURCE_SEMAPHORE
|
|
COMMON
for DSS like workload & large queries such as hash joins; must wait for
memory quota (grant) prior to execution.
|
See
SQL Memory Mgr perf counters
1.
Memory Grants Pending
2.
Memory Grants Outstanding
|
SHUTDOWN
|
|
Shutdown
without specifying NOWAIT, waits for other users to logout before shutdown completes
|
Monitory
SQL Statistics: User Connections
To
expedite shutdown you can:
1.
SHUTDOWN WITH NOWAIT
2.
Use SQL Kill command to terminate user connections.
|
SLEEP
|
|
Internal
Use only
|
|
TEMPOBJ
|
|
Dropping
a global temp object that is being used by others.
|
|
TRAN_MARK_DT
|
|
Transaction
latch - destroy
|
|
TRAN_MARK_EX
|
|
Transaction
latch - Exclusive
|
|
TRAN_MARK_KP
|
|
Transaction
latch - Keep page
|
|
TRAN_MARK_NL
|
|
Transaction
latch - Null
|
|
TRAN_MARK_SH
|
|
Transaction
latch - Shared
|
|
TRAN_MARK_UP
|
|
Transaction
latch - Update
|
|
UMS_THREAD
|
|
Batch
waiting on a worker thread to free up (or batch waiting to get a worker
thread to run it)
|
If
this is a high percentage, you can increase the number of worker threads from
the default of 255. The maximum is
1024.
|
WAITFOR
|
Waitfor
|
Check
for waitfor delay in TSQL code
|
Inspect
TSQL code for “waitfor delay” statement
|
WRITELOG
|
Transaction
Log
|
Waiting
for write requests to the transaction log to complete.
Identify
disk bottlenecks, using PERF Counters, Profiler,
::fn_virtualfilestats
and SHOWPLAN
Any
of the following will reduce these waits:
1.
Adding additional IO bandwidth,
2.
Balancing IO across other drives
3.
Moving / Isolating the transaction log on its own drive
|
See
Disk perf counters:
1.
Disk sec/read
2.
Disk sec/write
3.
Disk queues
See
SQL Buffer Cache counters:
1.
Page Life Expectancy
2.
Checkpoint pages/sec
3.
Lazywrites/sec
Check
IoStallMS for tranlog
1.
select * from ::fn_virtualfilestats(dbid,file#)
|
XACTLOCKINFO
|
|
Transaction
escalation, rollback
|
|
QUEUES (Perfmon Counters)
The
Queues aspect of the Waits and Queues approach to performance analysis refers
to PERFMON counters. PERFMON counters
provide a view of system performance from a resource standpoint.
PERFMON Counters, correlation, possible conclusions
& actions
Resource
Component
|
Perfmon Object
|
Counters to Monitor
|
Description
|
Possible conclusions / actions
|
||
Disk
|
Physical Disk
|
Current Queue Length
|
Sustained high queues mean your
IO subsystem is not keeping up.
|
Confirm IO issues with disk
sec/read & disk sec/write.
Waitstats correlation:
1. IO_COMPLETION
2. ASYNC_IO_COMPLETION
3. WRITELOG
4. LOGMGR
|
||
Avg. Disk Queue Length
|
Average of disk queues over
time. If this number is consistently
high, disk sec/read and disk sec/write will be high as well indicating IO
bandwidth issues.
|
Confirm IO issues with disk
sec/read and disk sec/write.
Waitstats correlation:
1. IO_COMPLETION
2. ASYNC_IO_COMPLETION
3. WRITELOG
4. LOGMGR
|
||||
Disk Sec/Read
|
Under normal circumstances,
reads should take 4-8ms – confirm with hardware vendor for exact read
time. Sustained queues will skew this
number higher because disk sec/read factors in the effects of disk
queues. High numbers mean your IO
subsystem is not keeping up with requests
Check individual drive
performance if there are multiple drives.
If it is a broad problem affecting all drives, the IO subsystem is not
keeping up. More drives could be
beneficial. If there is ONE very hot
drive, then look at disk activity such as location of paging file, database,
transaction log, and other read/write activity.
|
If disk sec/read > normal
read time (ask vendor for normal read time) you can consider the following
options:
1. Resolve IO bottleneck by adding more drives; spreading IO across new
drives if possible e.g. move files such as database, transaction log,
other application files that are being written to or read from.
2. Check for memory pressure – see memory component.
3. Check for proper indexing of SQL tables. Proper indexing can save IO. Check SQL query plans looking for scans and
sorts, etc. Showplan identifies
sorting steps.
4. Run SQL Profiler to identify TSQL statements doing scans. In Profiler, select the scans event class & scan stopped event. Go to the data column tab and add object Id. Run the trace. Save the profiler trace to a trace table,
and then search for the scans
event. Alternately, you can search for
high duration, reads, and writes.
Waitstats correlation:
1. IO_COMPLETION
2. ASYNC_IO_COMPLETION
3. WRITELOG
4. LOGMGR
|
||||
Disk Sec/Write
|
Under normal circumstances,
reads should take 4-8ms - confirm with hardware vendor. Sustained queues will skew this disk
sec/write higher because this counter factors in the effects of disk queues. High numbers mean your IO subsystem is not
keeping up with requests. In some SAN
environments, writes can be as low as 1-2ms.
|
See disk sec/read.
High performance (significant
insert, update, and delete activity) requires the transaction log to be on a
separate drive from the database.
Waitstats correlation:
1. IO_COMPLETION
2. ASYNC_IO_COMPLETION
3. WRITELOG
4. LOGMGR
|
||||
Memory / Cache
|
Memory
|
Page Faults/sec
|
This counter includes both hard
faults (those that require disk access) and soft faults (where the faulted
page is found elsewhere in physical memory.) Most processors can handle large
numbers of soft faults without significant consequence. However, hard faults,
which require disk access, can cause significant delays. See the disk component for more information.
|
Check
for memory pressure (see SQL Server buffer manager), low data page hit rates,
& memory grants pending.
|
||
Pages/sec
|
Number of pages read from or
written to disk to resolve hard page faults.
These are hard faults that
require physical IO to fetch the page.
|
Compare with Page Faults/sec.
Check for memory pressure (see
SQL Server buffer manager), low data page hit rates, & memory grants
pending.
|
||||
CPU
|
Processor
|
%User Time
|
SQL Server runs in User
mode. Privileged mode, is designed for
operating system components and allows direct access to hardware and all
memory.
|
Make sure % user time > 70%.
Check task manager (taskmgr.exe) to see how much CPU sqlserver.exe is
getting. If user time < 70%, check on %Processor
Time & % Privileged
activity.
|
||
% Privileged Time
|
The operating system switches application threads to
privileged mode to access operating system services
|
Should be < 20%. Check task manager (taskmgr.exe) to
see how much CPU sqlserver.exe is getting. If %privileged time > 20%, check on %Processor Time & %
User Time.
|
||||
%Processor Time
|
% of time CPU is executing over
sample interval.
|
Common uses of CPU resources:
1. Compilation and re-compilation use CPU resources. Plan
re-use and parameterization minimizes CPU consumption due to compilation.
For more details on compilation, recompilation, parameterization and plan
re-use, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sql_queryrecompilation.asp
--Plan re-use is where usecounts
are > 1
select dbid, objid, cacheobjtype, objtype, usecounts, sql
from master..syscacheobjects
order by dbid,objid, cacheobjtype,objtype,usecounts,sql
Correlate to PERFMON counters:
1. System: Processor Queue
length
2. SQL Statistics:
Compilations/sec
3. SQL Statistics:
re-Compilations/sec
4. SQL Statistics: Requests/sec
If both of the following are
true, you are cpu bound:
1. Proc time > 85% on average
2. Context switches (see system object) > 20K / sec
light weight pooling can provide
a 15% boost. Lightweight pooling (also
known as fiber mode) divides a thread into 10 fibers. Overhead per fiber is less than that of
individual threads.
|
||||
%Idle Time
|
% of time CPU is idle over
sample interval
|
|
||||
Interrupts/sec
|
Interrupts/sec is the average
rate, in incidents per second, at which the processor received and serviced
hardware interrupts.
|
Correlate with other perfmon
counters such as IO, Network.
|
||||
Thread
|
Process
|
Page Faults
|
This counter includes both hard
faults (those that require disk access) and soft faults (where the faulted
page is found elsewhere in physical memory.) Most processors can handle large
numbers of soft faults without significant consequence. However, hard faults,
which require disk access, can cause significant delays. See the disk component for more information.
|
Check for memory pressure (see
SQL Server buffer manager), low data page hit rates, & memory grants
pending, page life expectancy.
|
||
System
|
Usage
|
Processor Queue Length
|
|
Number of threads waiting to be
scheduled for CPU time. Some common
uses of CPU resources that may be avoidable:
1. Unnecessary compilation and
recompilation. Parameterization and
plan re-use would reduce CPU consumption.
See http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sql_queryrecompilation.asp
2. memory pressure
3. lack of proper indexing
|
||
Context Switches/sec
|
|
|
||||
SQL Server
|
Access Method
|
Forwarded Records/sec
|
Number of records fetched
through forwarded record pointers.
Tables with NO clustered
index. If you start out with a short
row, and update the row creating a wider row, the row may no longer fit on
the data page. A pointer will be put
in its place and the row will be forwarded to another page.
|
Look at code to determine where
the short row is inserted followed by an update.
Can be avoided by:
1. Using Default values (so that
an update will not result in a longer row that is the root cause of forwarded
records).
2. Using Char instead of varchar
(fixes length so that an update will not result in a longer row
|
||
Full Scan/sec
|
Entire table or index is
scanned. Scans can cause excessive IO
if an index would be beneficial.
|
SQL Profiler can be used to
identify which TSQL statements do scan.
Select the scans event
class & events scan:started
and scan:completed. Include the object Id data column. Save the profiler trace to a trace table,
and then search for the scans
event.
The scan:completed event will provide associated IO so you can also
search for high reads, writes, and duration.
|
||||
Index Searches/sec
|
Number of index searches. Index
searches are used to start range scans, single index record fetches, and to
reposition within an index.
|
Compare to Full Scan/sec. You want to see high values for index
searches.
|
||||
Page Splits/sec
|
Number of page splits occurring
as the result of index pages overflowing.
Normally associated with leaf pages of clustered indexes and
non-clustered indexes.
|
Page splits are extra IO
overhead that results from random inserts.
When there is no room on a data
page, and the row must be inserted on the page (due to index order), SQL will
split the page moving half the rows to a new page, and then insert the new
row.
Correlate to Disk: page
sec/write. If this is very high, you
may reorg the index(es) on the table(s) causing the page splits, to reduce
page splits temporarily. Fillfactor
will leave a certain amount of space available for inserts.
|
||||
Memory Mgr
|
Memory Grants Pending
|
Memory resources are required
for each user request. If sufficient
memory is not available, the user will wait until there is enough memory for
the query to run.
|
Compare with Memory grants
outstanding. If grants pending
increases, you can do the following:
1. add more memory to SQL Server
2. add more physical memory to
the box.
3. check for memory pressure –
see & correct indexing if you
experience “out of memory” conditions.
Correlate to Waittype
1. RESOURCE_SEMAPHORE
|
|||
Buffer Manager
|
Buffer cache hit ratio
|
Percentage of time the pages
requested are already in cache
|
Check for memory pressure. See
Checkpoint pages/sec, Lazywrites/sec and Page life expectancy.
|
|||
Checkpoint pages/sec
|
Pages written to disk during the
checkpoint process, freeing up SQL cache
|
Memory pressure is indicated if
this counter is high along with high
lazy writes/sec and low page life expectancy (<300 seconds)
|
||||
Lazy writes/sec
|
Pages written to disk by the
lazywriter, freeing up SQL cache
|
Memory pressure is indicated if
this counter is high along with high
lazy writes/sec and low page life expectancy (<300 seconds)
|
||||
Page life expectancy
|
Time in seconds the data pages,
on average, stay in SQL cache. Low
page life < 300 may indicate (1) SQL cache is cold, (2) memory problems or
(3) missing indexes. Correlate to
Lazywrites/sec and Checkpoint pages/sec
|
Memory pressure is indicated if
this counter is low (<300) along with high
lazy writes/sec and checkpoint
pages/sec.
Check for missing indexes and
bad query plans (scans in profiler)
Check for high page faults/sec.
|
||||
Readahead pages/sec
|
If memory shortages, cold cache,
or low hit rates, SQL may use worker threads to readahead (bring in pages
ahead of time) to raise hit rates. By
itself readahead is not a problem unless users are flushing each other’s
pages consistently.
|
Correlate to counters for SQL
buffer mgr: buffer cache hit ratio, page life expectancy, lazywrites, and
checkpoint pages for memory pressure.
Check for proper indexing and
bad query plans (scans in profiler)
|
||||
Cache Manager
|
Cache Hit Ratio
|
Percentage of time the procedure
plan pages are already in cache e.g. procedure cache hits.
I.e. how often a compiled procedure is found in the procedure cache
(thus avoiding the need to recompile).
|
Check for memory pressure. See
Checkpoint pages/sec, Lazywrites/sec and Page life expectancy.
See SQL Profiler: Stored
Procedure: CacheHit, CacheMiss, and CacheInsert to see what stored procedure
query plans are already in cache (Hit), vs. those not in cache (Miss,Insert)
Check for appropriate plan
re-use in the usecounts column of master..syscacheobjects. It is often desirable for query plans to be
re-used for similar SQL although not always.
Select cacheobjtype, objtype, dbid, sql, usecounts
From master..syscacheobjects
See SQL Statistics:
Compilations/sec for discussion of plan reuse.
If there is memory pressure,
plans will be discarded to make room for other data and/or procedure plans.
|
|||
Databases
|
Log Flush Wait Time
|
Waiting for transaction log
writes (ms)
|
See disk perf counters
Check transaction log file
::fn_virtualfilestats(dbid, file#) for IOStallMS (waits in ms)
|
|||
Log Flush Waits/sec
|
Tranlog writes per second
|
See disk perf counters,
::fn_virtualfilestats for IOStallMS.
|
||||
Log Growths
|
Windows will automatically grow
transaction log to accommodate insert, update, and delete activity.
|
In general, growths of the
transaction log will temporarily freeze writes to the transaction log while
Windows grows the transaction log file.
Check to see that the growth increment is large enough. If not, performance will suffer as log
growths will occur more often.
|
||||
Transactions /sec
|
SQL Server transactions per
second
|
|
||||
General Statistics
|
Logins/sec
|
Number of logins per second
|
User connections
|
|||
Logout/sec
|
Number of logouts per second
|
|
||||
User connections
|
Number of user connections
|
|
||||
Latches
|
Average Latch Wait Time(ms)
|
Latches
are short term light weight synchronization object. Latches are not held for the duration of a
transaction. Typical latching operations during row transfers to memory,
controlling modifications to row offset table, etc.
|
If
high, check PERFMON DISK and MEMORY objects for
1.
IO bottlenecks
2.
memory pressure
Normally
reduced with more memory and/or IO capacity
|
|||
Latch Waits/sec
|
See above
|
|
||||
Total Latch Wait Time(ms)
|
Short
term light weight synchronization object.
Latches are not held for the duration of a transaction. Typical
latching operations during row transfers to memory, controlling modifications
to row offset table, etc.
|
If
high, check PERFMON DISK and MEMORY objects for
1.
IO bottlenecks
2.
memory pressure
Normally
reduced with more memory and/or IO capacity
|
||||
Locks
|
Average Wait Time(ms)
|
Transactions
should be as short as possible to limit the blocking of other users.
|
Hint:
check for memory pressure, which causes more physical IO, thus prolonging the
duration of transactions and locks.
|
|||
Lock Wait Time(ms)
|
Transactions
should be as short as possible to limit the blocking of other users.
|
Hint:
check for memory pressure, which causes more physical IO, thus prolonging the
duration of transactions and locks
|
||||
Lock Waits/sec
|
Transactions
should be as short as possible to limit the blocking of other users.
|
Hint:
check for memory pressure, which causes more physical IO, thus prolonging the
duration of transactions and locks
|
||||
SQL
Statistics
|
Compilations/sec
|
Includes initial compile and
subsequent re-compiles. Compilation
and re-compilation are CPU intensive operations.
Unnecessary compilation can
sometimes be avoided with query plan re-use.
Query plan re-use can be seen in the usecounts column of syscacheobjects
as follows
Select cacheobjtype, objtype, dbid, sql, usecounts
From master..syscacheobjects
Parameterization is important
for plan re-use. In addition, some
types of re-compilation can be avoided.
See the SQL Server 2000 recompilation paper for more info: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sql_queryrecompilation.asp
To get initial compilations
ONLY, you must subtract recompilations/sec from compilations/sec.
Compare to batch requests/sec to
see extent of compilation.
|
||||
|
Recompilations/sec
|
Only contains re-compiles. SQL Profiler can provide information on
what procs are recompiling, what statement, and the reason for
recompilation. In Profiler, select the
stored procedure event class &
SP:recompilation event, and include
the data column eventsubclass. Review the trace searching for
eventsubclass values 1 through 6. The
statements preceding caused the recompilation. For more details on recompilation, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sql_queryrecompilation.asp
|
||||
Batch Requests/sec
|
Total batch requests should be
compared with compilations/sec
|
|||||
Auto-Param Attempts/sec
|
Auto-param attempts should be
compared to failed auto-params/sec.
Proper parameterization is important for plan re-use. In some cases, Sp_executeSQL could be used
with adhoc SQL. For more details on
recompilation, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sql_queryrecompilation.asp
|
|||||
Failed Auto-Params/sec
|
Auto-param attempts should be
compared to failed auto-params/sec.
Proper parameterization is important for plan re-use. In some cases, Sp_executeSQL could be used
with adhoc SQL. For more details on
recompilation, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sql_queryrecompilation.asp
|
|||||
0 comments:
Post a Comment