Wednesday, 4 December 2024

RMAN / SBT_TAPE Performance Tuning for Partial Restore

"Executive Summary" :

Restore speed improves greatly as the number of Backup Pieces increases:


1. Backup


Backup Piece size: SECTION SIZE 4G

Number of Channels: increased from 4 to 8

Datafiles read per Channel: maxopenfiles lowered from 8 to 1


2. Restore


Number of Channels: 12

Datafiles per Channel: maxopenfiles 500


Details


A] A few Oracle RMAN terms to start:

 

Backup Set                  Logical structure in which the backup is stored; it includes only whole files, never partial files.

Backup Piece              A Backup Set is composed of one or more physical binary files, the Backup Pieces.

Sections                      Contiguous ranges of blocks in a datafile. A datafile is made up of multiple sections.

SECTION SIZE           Lets RMAN channels back up a single large file in parallel. RMAN divides the work among multiple channels, with each channel backing up one file section into its own backup piece.

MAXOPENFILES       Number of files each channel can read simultaneously. The default is 8. With a value of 1, datafiles are read one at a time and written sequentially into the backup piece. With a value of 8, a backup piece contains interleaved data from up to 8 datafiles.


B] Now my experience with CommVault as the SBT_TAPE media manager:

 

0) Out of the box, RMAN with CommVault gave good Backup performance but very poor speed for Partial Restore.

 

Channels       = 4

Maxopenfiles   = 8

 

For instance, allocating 4 Channels with Maxopenfiles = 8 gave me a backup throughput of 240 GB / hour, that is, about 4 hours per TB.

Restoring the whole database with the same Channels & Maxopenfiles settings was OK, at a rate of 210 GB / H.

The nightmare began when running a TSPITR, or a DBPITR with tablespaces excluded.

The throughput dropped to 32 GB per hour: restoring 64 GB took 2 hours.

 

Backup  1 TB database     rate = 240 GB / H

Restore 1 TB database     rate = 210 GB / H

Partial Restore 64 GB     rate =  32 GB / H !!

 

1) More Channels and a lower Maxopenfiles

 

Channels     = 8

Maxopenfiles = 1

 

Backup got a bit slower              rate = 200 GB / H

Partial Restore was slightly better  rate =  50 GB / H

 

What I understood is that performance obviously comes from parallelism.

But allocating more Channels is not enough if each of them still has to scan a big Backup Piece to find its data.

The restore process needs small pieces to read.

 

2a) Setting "Section Size" to get more Backup Pieces at Backup

 

Channels     = 8

Maxopenfiles = 1

SECTION SIZE = 2G

 

2b) Setting more Open Files at Restore to achieve efficient parallelism

 

Channels     = 8

Maxopenfiles = 500

 

Then the miracle happened:

Backup was a bit faster          rate = 260 GB / H

Partial Restore 64 GB "Rocket"   rate = 480 GB / H !!

 

So TSPITR Restore went from 2 hours to 8 minutes !
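
As a quick sanity check of these figures, the rate is simply the restored volume divided by the elapsed time. The sketch below uses only the numbers quoted in this post; the function name `rate_gb_per_hour` is mine, chosen for illustration.

```python
def rate_gb_per_hour(size_gb: float, minutes: float) -> float:
    """Throughput in GB/h for size_gb transferred in the given number of minutes."""
    return size_gb * 60.0 / minutes


before = rate_gb_per_hour(64, 120)  # 64 GB restored in 2 hours
after = rate_gb_per_hour(64, 8)     # 64 GB restored in 8 minutes
speedup = after / before

print(f"before: {before:.0f} GB/h, after: {after:.0f} GB/h, speedup: x{speedup:.0f}")
```

The same helper can be applied to any size and duration pair from the results table.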

The important point is the number of Backup Pieces: there were only 8 when my 1 TB database was backed up with the default settings.

It increased from 8 to 464 with a "Section Size" of 2 GB.
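
A rough way to anticipate the piece count: since each file section goes into its own backup piece, the count should be close to the database volume divided by the section size. The sketch below is an estimate only. It assumes the 908 GB size recorded in the results table, and it ignores files smaller than one section and the partial last section of each datafile, so the real count differs somewhat. The function name `estimated_pieces` is hypothetical.

```python
import math


def estimated_pieces(db_size_gb: float, section_size_gb: float) -> int:
    """Rough piece count, assuming one backup piece per file section."""
    return math.ceil(db_size_gb / section_size_gb)


DB_SIZE_GB = 908  # size recorded for these tests in the results table

# (section size in GB, piece count actually observed in my tests)
for section_gb, observed in [(2, 464), (4, 232)]:
    estimate = estimated_pieces(DB_SIZE_GB, section_gb)
    print(f"SECTION SIZE {section_gb}G: ~{estimate} pieces estimated, {observed} observed")
```

The estimate lands within a few percent of the observed counts, which is enough to choose a section size before running a real backup.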

 

3) A good compromise between the number of Backup Pieces and the speed of Partial Restore is:

 

Channels     = 8

Maxopenfiles = 1

SECTION SIZE = 4G

 

Backup rate = 260 GB / H --> 3H35

That is 232 Backup Pieces.

 

Channels     = 12

Maxopenfiles = 500

 

Total Restore           rate = 605 GB / H !! --> 1H30

Partial Restore 64 GB   rate = 295 GB / H --> 13 min.


C] Results


Test  Action           Channels  MAXOPENFILES  MAXPIECESIZE  SECTION SIZE  Size (GB)  Duration  B Pieces #  Rate (GB / H)
1     Backup           4         8             Unlim.        Unlim.        950        04H00                 238
1     Restore Total    4         8             Unlim.        Unlim.        950        04H30                 211
1     Restore Partial  4         8             Unlim.        Unlim.        64         02H00                 32
2     Backup           8         1             Unlim.        Unlim.        910        04H30     8           202
2     Restore Partial  8         1             Unlim.        Unlim.        64         01H15     5           51
3     Backup           8         1             16 G          Unlim.        908        04H26     16          202
3     Restore Partial  8         1             16 G          Unlim.        64         01H13     10          51
4     Backup           8         1             Unlim.        2 G           908        03H36     464         259
4     Restore Partial  8         1             Unlim.        2 G           64         8 Min.    36          480
5     Backup           8         1             Unlim.        8 G           908        03H36     103         259
5     Restore Partial  8 or 12   1             Unlim.        8 G           64         26 Min.   10          148
6     Backup           8         1             Unlim.        4 G           908        03H35     232         259
6     Restore Partial  12        500           Unlim.        4 G           64         13 min.   21          295
6     Restore Total    12        500           Unlim.        4 G           908        1H30      232         605

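
One way to read this table: a partial restore has to pull every backup piece that holds some of the wanted data, so what matters is the volume of the pieces read rather than the 64 GB actually needed. The sketch below estimates that volume under two assumptions of mine: that the "B Pieces #" value on a "Restore Partial" row is the number of pieces read, and that all pieces of a given backup have roughly the same size (compression is ignored). The function name `scanned_gb` is hypothetical.

```python
# (test, backup size in GB, pieces in the backup, pieces read by the partial restore)
# Values copied from the results table; test 1 is left out because it has no piece counts.
ROWS = [
    (2, 910, 8, 5),
    (3, 908, 16, 10),
    (4, 908, 464, 36),
    (5, 908, 103, 10),
    (6, 908, 232, 21),
]


def scanned_gb(backup_gb: float, total_pieces: int, pieces_read: int) -> float:
    """Estimated volume read from the media manager, assuming equally sized pieces."""
    return backup_gb / total_pieces * pieces_read


for test, backup_gb, total, read in ROWS:
    volume = scanned_gb(backup_gb, total, read)
    print(f"test {test}: ~{volume:.0f} GB read to restore 64 GB")
```

Under these assumptions, tests 2 and 3 read about 570 GB to recover 64 GB, while tests 4 to 6 read roughly 70 to 90 GB. That would be consistent with tests 2 and 3 showing the same 51 GB / H rate even though MAXPIECESIZE doubled the number of pieces.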

D] RMAN scripts

Backup

cat rman_backup_db_online_full_cmv_cat_PRO.scr
#+-+-+-++-+-+-++-+-+-++-+-+-+-+-+-+-+
# RMAN backup script
# CG13 DSIT 22/03/2021 # AUC
# Standard Database 11G - 12C - 19C
# INCREMENTAL LEVEL = 0 database
# Media : CommVault
# Catalog
# FULL
#+-+-+-++-+-+-++-+-+-++-+-+-+-+-+-+-+
# 9i cmd 'backup database + archivelogs + controlfile'
# when channel type 'sbt_tape', default is "plus archivelog"
# sequence of operations :
# sql 'alter system archive log current' ;
# backup archivelog all
# backup database
# sql 'alter system archive log current' ;
# backup archivelog -> archivelogs generated during backup
# delete input
# autobackup of controlfile & spfile
 
connect target /
connect catalog IRC/<psw>@IRC_PRO
 
# expiration policy
configure retention policy to recovery window of 31 days ;
 
# expiration policy : purge old backups from the controlfile (CTL) & recovery catalog (RCAT)
allocate channel for maintenance device type disk ;
allocate channel for maintenance device type 'sbt_tape' ;
 
crosscheck backup;
delete force noprompt obsolete recovery window of 31 days ;
 
# autobackup of controlfile & spfile after a successful backup
# in a well-known format that can be retrieved without rcat !!
# default = OFF
configure controlfile autobackup on ;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE 'sbt_tape' TO '%F';
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/work/oracle/&1/rman/snapcf_&1.f';
 
# skip archivelogs that have ALREADY BEEN BACKED UP ONCE :
configure backup optimization on ;
 
# Media : 'sbt_tape'
CONFIGURE DEVICE TYPE 'sbt_tape' PARALLELISM 4 BACKUP TYPE TO COMPRESSED BACKUPSET ;
 
run {
 
# rcat/ctl & disk arclogs sync
# note : 'archive log current' is the only way to switch logs in every RAC instance
# whereas 'switch logfile' is local only
resync catalog ;
Change Archivelog All Crosscheck ;
 
allocate channel ch1 type 'sbt_tape' maxopenfiles 1 ;
allocate channel ch2 type 'sbt_tape' maxopenfiles 1 ;
allocate channel ch3 type 'sbt_tape' maxopenfiles 1 ;
allocate channel ch4 type 'sbt_tape' maxopenfiles 1 ;
allocate channel ch5 type 'sbt_tape' maxopenfiles 1 ;
allocate channel ch6 type 'sbt_tape' maxopenfiles 1 ;
allocate channel ch7 type 'sbt_tape' maxopenfiles 1 ;
allocate channel ch8 type 'sbt_tape' maxopenfiles 1 ;
 
backup INCREMENTAL LEVEL = 0 database SECTION SIZE 4G format 'db_F_%d_s%s_U%U_%Y%M%D' include current controlfile tag = '&2'
plus archivelog format 'ar_F_%d_s%s_U%U_%Y%M%D' tag = '&2' ;
 
# RMAN Deletes Archivelog Files Not Yet Applied To Standby Database (Doc ID 1987102.1)
DELETE ARCHIVELOG ALL COMPLETED BEFORE 'sysdate-12/24' ;
 
# added 210915 Keep 1 Year Full Backup
change backup tag '&2' keep until time 'sysdate+365' ;
 
}

Restore tuning :

allocate channel ch1 type 'sbt_tape' PARMS="SBT_LIBRARY=/opt/commvault/Base64/libobk.a(shr.o), BLKSIZE=1048576 " TRACE 0;

...

allocate channel ch12 type 'sbt_tape' PARMS="SBT_LIBRARY=/opt/commvault/Base64/libobk.a(shr.o), BLKSIZE=1048576 " TRACE 0;

setlimit channel ch1 maxopenfiles 500;