OraCsdP: UN DO Stress

This monday morning, I received an unpleasant message from a project manager :

« I had the biggest batch program failures in history : Have you done any changes on Production DB ? ».

Like almost every time, DBAs are blamed for wrong reasons !

Starting at 1/30/11 01:50 , errors were related to
ORA-30036: unable to extend segment by 8 in undo tablespace 'UNDO'

Let’s find out what happened in the UNDO tablespace and give some clues about its behaviour :

Oracle 10.2.0.x - Automatic Undo Management

1/ Grid and AWR at time of incident

I first noticed in GRID console that UNDO tablespace grew -Autoextend- from 500 MB to the MAXSIZE 1000 MB.

AWR is more precise : equivalent of V$UNDOSTAT

-- Undo Used Size ( and other measures ) per 10 minutes period from AWR

-- UNDOBLKS -> the consumption rate of undo blocks

with blk_sz_MB as ( SELECT VALUE/1024/1024 blk_val FROM v$parameter WHERE NAME = 'db_block_size' )

select BEGIN_TIME , END_TIME ,

ceil(UNDOBLKS*blk_val) "UNDO Used size (MB)",

ceil(ACTIVEBLKS*blk_val) "UNDO Active size (MB)",

ceil(UNEXPIREDBLKS*blk_val) "UNDO Unexpired size (MB)",

ceil(EXPIREDBLKS*blk_val) "UNDO Expired size (MB)",

TUNED_UNDORETENTION

from dba_hist_undostat , blk_sz_MB

where to_char(BEGIN_TIME,'DD/MM/YYYY HH24:MI:SS') between '31/01/2011 00:30:00' and '31/01/2011 03:00:00'

order by 1;

BEGIN_TIME	END_TIME	UNDO Used size (MB)	UNDO Active size (MB)	UNDO Unexpired size (MB)	UNDO Expired size (MB)	TUNED_ UNDORETENTION
31/01/2011 00:38	31/01/2011 00:48	1	11	144	1	60000
31/01/2011 00:48	31/01/2011 00:58	1	11	147	2	60000
31/01/2011 00:58	31/01/2011 01:08	152	158	141	1	60000
31/01/2011 01:08	31/01/2011 01:18	187	318	161	3	60000
31/01/2011 01:18	31/01/2011 01:28	293	566	193	8	30026
31/01/2011 01:28	31/01/2011 01:38	373	933	54	8	1516
31/01/2011 01:38	31/01/2011 01:48	60	992	8	0	1782
31/01/2011 01:48	31/01/2011 01:58	130	992	8	0	2004
31/01/2011 01:58	31/01/2011 02:08	39	994	7	0	2389
31/01/2011 02:08	31/01/2011 02:18	35	992	8	1	2752
31/01/2011 02:18	31/01/2011 02:28	42	992	8	0	3068
31/01/2011 02:28	31/01/2011 02:38	119	11	987	3	3739
31/01/2011 02:38	31/01/2011 02:48	154	11	986	2	3578
31/01/2011 02:48	31/01/2011 02:58	10	11	917	1	3979
31/01/2011 02:58	31/01/2011 03:08	2	11	917	1	4408

2/ UNDO Management

From this result, we need to understand how AUTO UNDO tablespace is managed :

Rollback Segments may be in 3 different states, because Oracle protects the Consistent Read with the 'undo_retention' parameter, beyond which extents expire.

* dba_undo_extents shows Extent column STATUS

ACTIVE active transactions

EXPIRED commited transactions no more under 'undo_retention' period

UNEXPIRED commited transactions still under 'undo_retention' period

This is the FULL POWER of AUTO Over Manual Rollback Segment : the ability to move Extents between RBS ; EXPIRED or UNEXPIRED Extents may be transferred to ACTIVE transactions if needed.

What is the algorithm used by Oracle when trying to allocate a new extent for SMU in UNDO tablespace ?

[ taken from Internet ]

1. A new extent will be allocated from the undo tablespace.

2. If failed, ( no free space exists in the tablespace )

acquire expired extent from another undo segment.

This involves shrinking from the other segment and adding to the current undo segment.

3. If failed, reclaim expired extents from the current transaction's commited changes.

4. If failed, try allocating more space by datafile extention ( if autoextend ON ).

5. If failed, try reuse an unexpired extent from the current undo segment.

6. If failed, try stealing an unexpired extent from another undo segment.

7. If all failed, report an "Out-Of-Space" error.

As we see, the algorithm prioritizes the consistent read ( unexpired extent ) over the datafile extension.

3/ Going back to the problem

With these findings, I went on searching AWR for Big SQL and found out that a big DELETE took place and overfilled UNDO Segments …

What I learned from this analysis is that the UNDO_RETENTION parameter was set too high ( 60000 sec ) and leads DBF to expand until the MAXSIZE ; then, it was automatically decreased by Oracle for the benefit of Active Transactions.
The 10g Tuned Undo_Retention always tries to be higher than the UNDO_RETENTION init parameter, but may go below this threshold if there are space constraints ; it happens, I believe, between step 5 and 6 of the above agorithm.

Mettalica : Tuned_UndoRetention Can be Less Than Undo_Retention in Init.ora [ID 1100313.1] :

It is true that Tuned_undoretention can be less than undo_retention specified in the parameter file.
Undo_retention specified in the init.ora is a low threshold.

From Oracle 10g onward, undo retention is automatically tuned to reduce the chances of "snapshot too old" errors during long-running queries. In the event of any undo space constraints, the system will prioritize DML operations over undo retention. In such situations, the low threshold may not be achieved and tuned_undoretention can go below undo_retention.

If the undo retention threshold must be guaranteed, even at the expense of DML operations, the RETENTION GUARANTEE clause can be set against the undo tablespace.

4/ Appendix Queries

-- AWR : Undo Used Size ( and other measures ) per 10 minutes period

-- UNDOBLKS -> the consumption rate of undo blocks

-- UNXPBLKREUCNT should be = 0 ( if no space pressure problem )

with blk_sz_MB as ( SELECT VALUE/1024/1024 blk_val FROM v$parameter WHERE NAME = 'db_block_size' )

select BEGIN_TIME , END_TIME ,

txncount "total number of transactions", MAXCONCURRENCY "tx. executed concurrently",

ceil(UNDOBLKS*blk_val) "UNDO Used size (MB)",

ceil(ACTIVEBLKS*blk_val) "UNDO Active size (MB)",

ceil(UNEXPIREDBLKS*blk_val) "UNDO Unexpired size (MB)",

ceil(EXPIREDBLKS*blk_val) "UNDO Expired size (MB)",

ceil(UNXPBLKREUCNT*blk_val) "Unexpired Reused size (MB)",

TUNED_UNDORETENTION

from dba_hist_undostat , blk_sz_MB

where to_char(BEGIN_TIME,'DD/MM/YYYY HH24:MI:SS') between '31/01/2011 00:30:00' and '31/01/2011 03:00:00'

order by 1;

-- Current Extents Status - Instance Level

select STATUS,count(*) from dba_undo_extents group by status ;

-- Current Extents Status & Size per tablespace ( if changing UNDO tbspce for example ) - Instance Level

select tablespace_name,STATUS,count(*),ceil(sum(bytes/1024/1024)) MB_SZ from dba_undo_extents group by tablespace_name,status ;

-- Current Active Extents Size per Rollback Segment - Instance Level

select OWNER,SEGMENT_NAME,BYTES/1024/1024,tablespace_name from dba_undo_extents where STATUS = 'ACTIVE' order by 3 desc ;

-- Current Undo Used Size - Instance Level

select ceil(sum(RSSIZE)/1024/1024) "UNDO Current Used Size (MB)" from v$rollstat a ;

-- Total Tablespace UNDO Available Size - Instance Level

select (sum(c.bytes)/1024/1024) "UNDO Tblspce Total Size (MB)" from dba_tablespaces b , dba_data_files c

where b.tablespace_name = c.tablespace_name and b.contents = 'UNDO' ;

-- UNDO Max Used Size since instance startup - Instance Level

-- ( attention -> sum the HWM of every undo segments, may be > UNDO tablespace size )

select ceil(sum(HWMSIZE)/1024/1024) "UNDO Max Used Size (MB)" from v$rollstat ;

-- Undo Used Size per 10 minutes period since instance startup - Instance Level

-- gives consumption rate of UNDO Blocks, helps in finding proper UNDO_RETENTION according to UNDO tbspce size

select to_char(BEGIN_TIME,'DD/MM/YYYY HH24:MI:SS') begin_time,to_char(END_TIME,'DD/MM/YYYY HH24:MI:SS') end_time,

ceil(UNDOBLKS*(SELECT VALUE FROM v$parameter WHERE NAME = 'db_block_size')/1024/1024) "10 min int. UNDO Used Sz (MB)"

from v$undostat order by BEGIN_TIME ;

-- Current Undo Consumption MB_Size - Session Level

SELECT e.taddr,e.sid,e.serial#,e.program,

ROUND(d.used_ublk * (SELECT VALUE FROM v$parameter WHERE NAME = 'db_block_size')/1024/1024,2) "UNDO Used Size (MB)"

FROM v$transaction d, v$session e WHERE d.addr = e.taddr

ORDER BY 5 DESC;

-- Current UNDO Consumption per session still with open transaction

select distinct s.username,s.sid,s.serial#,rn.name,rs.curext,rs.curblk,t.used_ublk,t.used_urec,due.tablespace_name,

t.start_time

from v$transaction t

,v$session s

,v$rollname rn

,v$rollstat rs

,dba_undo_extents due

where t.addr = s.taddr

and t.xidusn = rn.usn

and rn.usn = rs.usn

and rn.name = due.segment_name

and due.status = 'ACTIVE' -- uncommitted

order by 7 desc ;

OraCsdP

Categories

vendredi 25 février 2011

UN DO Stress