dave spink toolset

RESTORES

For restores, first suspend the tape to prevent a backup from writing data onto it, and place the media into a pool that won't be ejected during a vault session. A suspended tape still returns to the scratch pool once its expiration date is reached, whereas a frozen tape does not. Once the restore is complete, move the media into a pool that forms part of vault ejects. See Netbackup Restore Scripts for automating processes.

Move media into a non vault pool.

Suspend media.
# MEDIA=800100
# aa=`bpmedialist -L -m $MEDIA | grep "Server Host" | cut -c15-`
# bpmedia -suspend -m $MEDIA -h $aa
# bpmedialist -h $aa -m $MEDIA | grep SUSPENDED

Check status is 0x0. If the media status is not 0x0, you cannot continue.
# vmquery -m $MEDIA | grep '^status: 0x0'

Get volume pool numbers.
# aa=`vmquery -m $MEDIA | grep "volume pool" | cut -f 2 -d"(" | sed 's/)//'`
# POOL=$aa

Deassign pool to scratch.
# vmquery -deassignbyid $MEDIA $POOL 0x0

Change pool to onsite (this is our non eject pool).
# poolVar=`vmpool -listall -bx | grep onsite | awk '{print $2}'`
# vmchange -p $poolVar -m $MEDIA

Set assignment time.
# aa=`bpmedialist -L -m $MEDIA | grep '^allocated' | cut -f 2 -d"(" | sed 's/)//'`
# vmquery -assignbyid $MEDIA hcart2 $poolVar 0x0 $aa

The media should now be visible in the onsite pool within the NetBackup GUI. The media is suspended, meaning no additional backups will occur. The onsite pool is not ejected via vault.

Move media from non eject pool to standard eject pool.

Check the media is still suspended before moving it out of the onsite pool. If the media status is not suspended, you cannot continue.
# MEDIA=800100
# aa=`bpmedialist -L -m $MEDIA | grep "Server Host" | cut -c15-`
# bpmedialist -h $aa -m $MEDIA | grep SUSPENDED

Check status is 0x0. If the media status is not 0x0, you cannot continue.
# vmquery -m $MEDIA | grep '^status: 0x0'

Get volume pool numbers.
# aa=`vmquery -m $MEDIA | grep "volume pool" | cut -f 2 -d"(" | sed 's/)//'`
# POOL=$aa

Deassign pool to scratch.
# vmquery -deassignbyid $MEDIA $POOL 0x0

Change pool to one that forms part of vault ejects.
# poolVar=`vmpool -listall -bx | grep "Windows_OS_Full" | awk '{print $2}'`
# vmchange -p $poolVar -m $MEDIA

Set assignment time.
# aa=`bpmedialist -L -m $MEDIA | grep '^allocated' | cut -f 2 -d"(" | sed 's/)//'`
# vmquery -assignbyid $MEDIA hcart2 $poolVar 0x0 $aa

The media should now be visible in the Windows OS Full pool within the NetBackup GUI. The media is still suspended, meaning no additional backups will occur. The Windows OS Full pool forms part of the Vault process.

EXPIRING MEDIA

When "bpexpdate -m mediaID -d 0" fails, it's usually because one of the databases (volume, image or media) has a problem.

Query the volume database on the master server.
# MEDIA=800100
# vmquery -m $MEDIA

Query the images database on the master server.
# bpimmedia -L -mediaid $MEDIA

Query ALL media servers for the media database entry and report the host that 'owns' the media.
# bpmedialist -mlist -ev $MEDIA

Variants on the bpexpdate command to clear.

From the media database.
# bpexpdate -m $MEDIA -d 0 -host media_server -justmedia -force

From the image database.
# bpexpdate -ev $MEDIA -d 0 -justimage -force

From the volume database.
# vmdelete -m $MEDIA

The volume database may complain about the media still being assigned to a pool. The command to deassign the media takes the numeric pool and status values. When complete, the media is placed in the scratch pool.
# vmquery -m $MEDIA | egrep '(volume pool|status)'
volume pool: Offsite (4)
status: 0x0
# vmquery -deassignbyid $MEDIA 4 0

REPORT SAMPLE

Query database to show media details of a pool.
# vmquery -b -p 8 -h uxnbpr10 | more
media   media   robot  robot  robot  side/  optical  # mounts/  last
 ID     type    type     #    slot   face   partner  cleanings  mount time
-------------------------------------------------------------------------------
000430  HCART2  NONE     -      -      -       -         15     07/10/2008 08:18
000470  HCART2  NONE     -      -      -       -         28     11/16/2008 00:03
000525  HCART2  ACS      1      -      -       -         27     12/18/2008 19:03
000670  HCART2  NONE     -      -      -       -         11     04/26/2008 18:51
000689  HCART2  ACS      1      -      -       -         19     12/18/2008 19:03
000693  HCART2  NONE     -      -      -       -          8     03/02/2008 01:23
000718  HCART2  NONE     -      -      -       -         17     11/16/2008 00:02

See how much data was backed up.
# bpimagelist -d 12/17/08 18:00:00 -e 12/17/08 22:00:00 -U | awk '{ total += $5 } END { print total }'
34028534929

See cumulative incremental media used.
# bpimagelist -st CINC -media -d 12/17/08 18:00:00 -e 12/17/08 22:00:00 -U
Media ID  Last Written      Server
--------  ----------------  ----------
720858    12/17/2008 19:07  uxnbpr33
000011    12/17/2008 20:14  uxnbpr15.nam.pwcinternal.com
701559    12/17/2008 19:09  uxnbpr33

See media db, expiration, retention level, density, vmpool.
# bpmedialist -L -m 720858
Server Host = uxnbpr33
media_id = 720858, partner_id = *NULL*, version = 1
density = hcart2 (14)
allocated = 12/17/2008 18:00:52 (1229554852)
last_written = 12/17/2008 19:07:27 (1229558847)
expiration = 12/31/2008 19:07:27 (1230768447)
last_read = N.A (0)
kbytes = 99739461, nimages = 52, vimages = 52 (MPX)
retention_level = 1, num_restores = 0
status = 0x202, l_offset = 389786, psize = 0, hsize = 1024, ssize = 0
vmpool = 8
res1 = 0, res2 = 0, res3 = 0, res4 = 0

See status of backups.
# bperror -backstat -d 12/17/08 18:00:00 -e 12/17/08 22:00:00 -U | more
STATUS  CLIENT         POLICY            SCHED     SERVER       TIME COMPLETED
0       uxhrpr57n      fs_ux_os_hr       Daily_2w  uxnbpr32     12/17/2008 18:02:13
0       UXSTPR02n      fs_ux_os_1800     Daily_2w  uxnbpr33     12/17/2008 18:03:02
0       uxdtpr54n      fs_ux_os_1800     Daily_2w  uxnbpr33     12/17/2008 18:03:23
0       ustpa3gtsbk20  fs_win_os_1900_s  Daily_5w  uxnbpr15.na  12/17/2008 18:03:37
0       ustpa3abswh58  fs_win_os_intel2  Daily_5w  uxnbpr29.na  12/17/2008 18:03:41
0       us-tpadns012n  fs_ux_os_1800     Daily_2w  uxnbpr30     12/17/2008 18:04:09

See amount of data backed up.
# bpimagelist -d 01/02/09 18:00:00 -e 01/02/09 22:00:00 -U | more
Backed Up         Expires        Files        KB  C  Sched Type    Policy
----------------  ----------  --------  --------  -  ------------  ------------
01/02/2009 18:00  01/16/2009     21912    307669  N  Cumulative I  fs_ux_os_1800
01/02/2009 18:01  01/16/2009     34976   4940127  N  Cumulative I  fs_ux_os_1800
01/02/2009 18:01  01/16/2009      9209   1164775  N  Cumulative I  fs_ux_os_1800
01/02/2009 19:00  01/16/2009      6207     22779  N  Cumulative I  fs_ux_os_1900
01/02/2009 20:00  03/05/2009    129349  19112369  N  Full Backup   fs_ux_os_inet
01/02/2009 20:00  03/05/2009    139422  20846925  N  Full Backup   fs_ux_os_inet
01/02/2009 19:00  01/16/2009      7888   2217331  N  Cumulative I  fs_ux_os_1900

DEVICE CONFIG

NetBackup uses its own driver for communicating with SCSI controlled robotic devices, called the SCSA (Generic SCSI passthru driver) or "sg" driver. The problem we experienced was that our device tree showed the correct devices, but sgscan produced incorrect results. This document illustrates how to solve this problem.

Our sgscan listing is missing 7 tape devices. We need to configure sg.conf to resolve.
# sgscan
/dev/sg/c0t0l0: Disk (/dev/rdsk/c0t0d0): "SEAGATE ST336704LSUN36G"
/dev/sg/c0t1l0: Disk (/dev/rdsk/c0t1d0): "SEAGATE ST336704LSUN36G"
/dev/sg/c2t0l0: Disk (/dev/rdsk/c2t0d0): "EMC SYMMETRIX"
/dev/sg/c2t0l1: Changer: "ADIC Scalar 1000"
/dev/sg/c2t1l0: Array-controller: "ADIC Pathlight 5000"
/dev/sg/c2t1l1: Tape (/dev/rmt/4): "IBM ULT3580-TD1"
/dev/sg/c3t0l0: Array-controller: "ADIC Pathlight 5000"

Determine your controller numbers, targets and LUNs. Use a mixture of tools - format, ezfibre, ls /dev/rmt tree. We had controller c3, target0 to pathlight switch A, ADIC robot controller, and 4 LTO drives. Our target1 mapped to pathlight switch B and 4 LTO drives. Then verify sg.build is in your PATH.
# which sg.build
/opt/openv/volmgr/bin/sg.build

Change into the /usr/openv/volmgr/bin/driver directory and run the sg.build command. The -mt flag sets the maximum number of targets to search on each controller. The -ml flag sets the maximum number of LUNs. These values vary based on the physical layout of your hardware. You must run sg.build from the driver directory, as this is where sg.install reads its files from.
# cd /usr/openv/volmgr/bin/driver
# sg.build all -mt 2 -ml 6
Created file ./st.conf.
Created file ./sg.conf.
Created file ./sg.links.

Modify /kernel/drv/st.conf with the newly created st.conf from the sg.build script.
# cp /kernel/drv/st.conf /kernel/drv/st.conf-orig
# cd /usr/openv/volmgr/bin/driver
# ls ./st.conf
# vi /kernel/drv/st.conf

Place a # in the first column of each line of the seven default entries. Insert the newly created ./st.conf into /kernel/drv/st.conf.

Determine if the sg driver is loaded into memory and remove it if loaded.
# modinfo | grep sg
 94 7820afbd 16ea  97 1 sysmsg (System message redirection (fan)
275 781b0ef3 302d 278 1 sg (SCSA Generic Revision: 3.4d)
288 7828a202 1eb8  49 1 msgsys (System V message facility)
288 7828a202 1eb8  49 1 msgsys (32-bit System V message facilit)
# /usr/sbin/rem_drv sg
# modinfo | grep sg
 94 7820afbd 16ea  97 1 sysmsg (System message redirection (fan)
288 7828a202 1eb8  49 1 msgsys (System V message facility)
288 7828a202 1eb8  49 1 msgsys (32-bit System V message facilit)

Run the sg.install script to rebuild the sg device tree and sg.conf.
# cp /kernel/drv/sg.conf /kernel/drv/sg.conf-OLD
# rm /kernel/drv/sg.conf
# cd /usr/openv/volmgr/bin/driver
# ./sg.install
Copied files to /kernel/drv and to /kernel/drv/sparcv9.
Doing add_drv of the sg driver
Removing old /dev/sg entries
Editing /etc/devlink.tab...
Copying original /etc/devlink.tab to /etc/devlink.tab.xxx
Added entry in /etc/devlink.tab file
Made links in /dev/sg

Reboot with reconfiguration.
# reboot -- -r

Running sgscan now produces the correct results.
# sgscan
/dev/sg/c0t0l0: Disk (/dev/rdsk/c0t0d0): "SEAGATE ST336704LSUN36G"
/dev/sg/c0t1l0: Disk (/dev/rdsk/c0t1d0): "SEAGATE ST336704LSUN36G"
/dev/sg/c2t0l0: Disk (/dev/rdsk/c2t0d0): "EMC SYMMETRIX"
/dev/sg/c2t0l1: Changer: "ADIC Scalar 1000"
/dev/sg/c2t0l2: Tape (/dev/rmt/0): "IBM ULTRIUM-TD1"
/dev/sg/c2t0l3: Tape (/dev/rmt/1): "IBM ULTRIUM-TD1"
/dev/sg/c2t0l4: Tape (/dev/rmt/2): "IBM ULTRIUM-TD1"
/dev/sg/c2t0l5: Tape (/dev/rmt/3): "IBM ULTRIUM-TD1"
/dev/sg/c2t1l0: Array-controller: "ADIC Pathlight 5000"
/dev/sg/c2t1l1: Tape (/dev/rmt/4): "IBM ULT3580-TD1"
/dev/sg/c2t1l2: Tape (/dev/rmt/5): "IBM ULTRIUM-TD1"
/dev/sg/c2t1l3: Tape (/dev/rmt/6): "IBM ULTRIUM-TD1"
/dev/sg/c2t1l4: Tape (/dev/rmt/7): "IBM ULTRIUM-TD1"
/dev/sg/c3t0l0: Array-controller: "ADIC Pathlight 5000"

Since sgscan is working correctly, you can now load NetBackup and run the wizard for configuring tape devices.

ACTIVITY MONITOR

Unable to access Activity Monitor through Windows GUI.
# netbackup stop
# bpps -a
# bp.kill_all
# bpps -a

If all processes are stopped, check for hung queue(s). Under the message queues, look for the letter q with a number id OTHER than 1 (for example: q 258). Delete the hung queue.
# ipcs -a
# ipcrm -q xxx
# ipcs -a

If the hung queue(s) are gone, start NetBackup.
# netbackup start

ACSLS

Log on to the control host.
# ssh sanadm@uxnbacsls2
# su - acsss
# cmd_proc

Query ACSLS.
ACSSA> q cap all
ACSSA> vary cap 0,1,0 offline
ACSSA> vary cap 0,1,0 online
ACSSA> q vol 713588
Identifier  Status  Current Location  Type
713588      home    0, 5,12, 1, 0     STK2P
ACSSA> q lmu all
ACS: 0 Mode: Single LMU Master Status: Communicating
Not Partitioned Standby Status: -
Port  Port State  Role  CL  Port Name
0, 0  online      -     20  10.26.156.246
0, 1  online      -     20  10.26.37.92

Mount / Unmount a tape.
ACSSA> mount 713882 0,5,1,12
Mount: 713882 mounted on 0, 5, 1,12
ACSSA> dis 713882 0,5,1,12 f
Dismount: Forced dismount of 713882 from 0, 5, 1,12

Check Logs.
# uxacpr04:/acslsha/home/ACSSS/log> grep -i 713462 acsss_event.log
mt_upda_dm: Cartridge 713462, new location 0, 2, 4, 2, 0
cm_env_move[29887.667]: Volume 713462 re-entered into library to cell 0,
Volume 713462 reactivated.

Robot STK.
# ssh sanadm@uxnbacsls2
# /SLConsolestk/RunLatestSLConsole
User - service
Password - sun8500
Library IP - 10.26.156.246

VAULT

Check if deferred eject was run.
# cd /usr/openv/netbackup/vault/sessions/PWC_SL8500

If a vault job was cancelled, the media is placed in the offsite pool (I'm not sure why). Run another inventory before attempting to start another vault job. This may take a while.
# /usr/openv/volmgr/bin/vmupdate -rn 1 -rt acs -acs_stk2p hcart2 -use_barcode_rules -h uxnbmaster2.nam.pwcinternal.com

If the vault eject was not run, load the GUI, select vault management, click schedule, then right click start to start a deferred eject. Alternatively, from the CLI.
# vltrun Daily_SL8500
# ps -ef | grep -i vlt
# cd /usr/openv/netbackup/vault/sessions/PWC_SL8500

When the vault is complete, right click deferred eject in the GUI to see the list of options.

PERFORMANCE

Set TCP/IP Buffer Size on media servers and all clients.
# echo "65536" > /usr/openv/netbackup/NET_BUFFER_SZ

Set the Data Buffer Size & Number of Data Buffers. The media server uses shared memory between the network and the tape drive. The data buffer size must not exceed the tape I/O size (LTO handles 256 KB), expressed as 262144 (256 * 1024). Therefore "tape block size" and "shared data buffer size" are synonymous.
# echo "262144" > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
# echo "64" > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS

Calculate the amount of "shared memory" used by NetBackup as (number_data_buffers * size_data_buffers) * number of tape drives * max multiplexing setting.
(262144 * 64) * 8 * 4 = 512 MB
(262144 * 16) * 8 * 4 = 128 MB

The bptm log contains details on the bptm child process that reads data from the client into the shared memory buffers. The bptm parent process writes data from the shared memory buffers to the tape drive. Look for messages like "waited for full buffer.." (plenty of buffer space available), "waited for empty buffer.." (running out of buffers) and "delayed n time.." (waiting for a buffer to become available).

CLIENTS

Installing via the GUI requires trusted unix clients; otherwise install via software distribution to the client.
master# /usr/openv/netbackup/bin/install_client_files ftp d1de0199 spi0004
client# sh /tmp/bp/bin/client_config

Upgrade Clients

Verify there are no active backups or restores.
# /usr/openv/netbackup/bin/bpps | grep -v bpjava-susvc

Stop master server listener.
# /usr/openv/netbackup/bin/admincmd/bprdreq -terminate

Create a client list file and update.
# vi t.txt
"Solaris Solaris8 tpadev5.cpships.com"
# /usr/openv/netbackup/bin/update_clients -ClientList t.txt

Start NetBackup and Media Server daemons.
# /usr/openv/volmgr/bin/ltid
# /usr/openv/netbackup/bin/initbprd

To see all active policies and clients.

# change into policy directory and create a list of policies
# (the list is written to /tmp so it does not end up inside the policy directory)
cd /usr/openv/netbackup/db/class
ls > /tmp/t.t

# get list of active policies
for i in `cat /tmp/t.t`
do
  bpplinfo $i -L | grep "Active" | grep yes > /dev/null
  if [ $? -eq 0 ]; then
    echo $i >> /tmp/activepolicy.txt
  fi
done

# get list of in-active policies
for i in `cat /tmp/t.t`
do
  bpplinfo $i -L | grep "Active" | grep yes > /dev/null
  if [ $? -eq 1 ]; then
    echo $i >> /tmp/activeNOpolicy.txt
  fi
done

# for active policies get the clients
for i in `cat /tmp/activepolicy.txt`
do
  echo $i >> /tmp/activeclients.txt
  bpplclients $i -L | awk '{print $2}' >> /tmp/activeclients.txt
  echo "" >> /tmp/activeclients.txt
done

# sort and remove duplicates
sort -u /tmp/activeclients.txt > /tmp/activeclients-uniq.txt

NON-ROOT USER

Authorisation File ( /usr/openv/java/auth.conf )
root ADMIN=ALL JBP=ALL
helpdesk ADMIN=DM+AM
pri0001 ADMIN=ALL JBP=ALL
edg0001 ADMIN=ALL JBP=ALL
cra0005 ADMIN=ALL JBP=ALL
car0045 ADMIN=ALL JBP=ALL
spi0004 ADMIN=ALL JBP=ALL
ric0017 ADMIN=ALL JBP=ALL
don0011 ADMIN=JBP+AM+REP JBP=ENDUSER+BU
* ADMIN=JBP JBP=ENDUSER+BU

Authorisation Details.
ALL - admin for all
AM - activity monitor
BPM - backup policy mgnt
BAR - backup, archive, restore
JBP - backup, archive, restore
CAT - catalog
DM - device monitor
HPD - host properties
MM - media management
REP - reports
SUM - storage unit management
* - any user provided they're in /etc/passwd

Authorisation JBP Values.
ENDUSER - allows restores
BU - allows backup
ARC - allows archive backup (also needs BU)
RAWPART - allows raw partition restore
ALL - all above, plus restoring to different client

FIREWALL SUPPORT

Client Attributes (Master Server Host Properties - client attributes tab). Select "No Connect-Back", meaning don't use a random port, use the vnetd port, i.e. the server connects to bpcd (13782) and the response comes via vnetd (13724). Then allow the vnetd port through the firewall.
Enable vnetd using CLI.
# SERVER=ustpa3ifsws303.nam.pwcinternal.com
# bpclient -add -client $SERVER -no_callback 1 -current_host $SERVER -WOFB_enabled 0

MULTIPLE NICs

Client with Multiple Interfaces.
client hostA 10.140.129.1 and hostA-1 10.140.129.2
server receives client ip 10.140.129.1
server performs gethostbyaddr() and checks client exists
server receives client ip 10.140.129.2
server performs gethostbyaddr() and checks client does not exist
solution: REQUIRED_INTERFACE = hostA

Server with Multiple Interfaces.
server hostA 10.140.129.1 and hostA-1 10.140.129.2
client receives server ip 10.140.129.1
client performs gethostbyaddr() matches SERVER in bp.conf
client receives server ip 10.140.129.2
client performs gethostbyaddr() does not match SERVER in bp.conf
solution: REQUIRED_INTERFACE = hostA

Master Server with Multiple Networks.
master server eth0 jupiter 10.140.128.x
master server eth1 meteor 10.150.137.x
clients on 10.150.137.x have SERVER=meteor in bp.conf
clients on 10.140.128.x have SERVER=jupiter in bp.conf

MOVE CATALOG

Moving Catalogs if Disk Space Full.

MEDIA SERVER

Force Alternate Media Server Restore.

We changed drive types from 9940B to LTO4. Several months later the client needed to restore files and NetBackup sent the restore to the original media server. The problem was we no longer had any 9940B drives on that media server. Hence, the quick fix was to force the restore over to another media server that still had 9940B drives.
# vi /usr/openv/netbackup/bp.conf
FORCE_RESTORE_MEDIA_SERVER = uxnbpr30 uxnbpr32

JOBS

Error 5 cannot read image from media id 713882, drive index 2, I/O error.

Check the media server drive status. See extract from restore log "granted resource 0_2_1_13_9940B" and "media server: uxnbpr26.nam.pwcinternal.com."
# vmquery -m 713882
# vmoprcmd -h uxnbpr26 -d
Drv Type Control User Label RecMID ExtMID Ready Wr.Enbl. ReqId
0 hcart2 ACS - No - 0
1 hcart2 ACS - No - 0
2 hcart2 ACS - No - 0
3 hcart2 ACS - No - 0

Drv  DriveName       Shared  Assigned  Comment
0    0_3_1_12_9940B  Yes     -
1    0_0_1_13_9940B  Yes     -
2    0_2_1_13_9940B  No      -
3    0_5_1_12_9940B  No      -

Log into ACSLS and try mounting the tape into another drive.
ACSSA> mount 713882 0,5,1,12
Mount: 713882 mounted on 0, 5, 1,12

Confirm via OS that the tape was loaded.
# tpconfig -d
Id  DriveName       Type    Residence
      Drive Path                                              Status
****************************************************************************
0   0_3_1_12_9940B  hcart2  ACS(0) ACS=0, LSM=3, PANEL=1, DRIVE=12
      /dev/rmt/0cbn                                           UP
1   0_0_1_13_9940B  hcart2  ACS(0) ACS=0, LSM=0, PANEL=1, DRIVE=13
      /dev/rmt/2cbn                                           UP
2   0_2_1_13_9940B  hcart2  ACS(0) ACS=0, LSM=2, PANEL=1, DRIVE=13
      /dev/rmt/1cbn                                           UP
3   0_5_1_12_9940B  hcart2  ACS(0) ACS=0, LSM=5, PANEL=1, DRIVE=12
      /dev/rmt/3cbn                                           UP
# mt -f /dev/rmt/3cbn status
StorageTek 9940B tape drive:
   sense key(0x0)= No Additional Sense   residual= 0   retries= 0
   file no= 0   block no= 0

Unmount the tape via ACSLS and check the OS no longer sees it mounted.
ACSSA> dis 713882 0,5,1,12 f
Dismount: Forced dismount of 713882 from 0, 5, 1,12
# mt -f /dev/rmt/3cbn status
/dev/rmt/3cbn: no tape loaded or drive offline

Down the suspected problem drive. See the job details for the drive index used.
# vmoprcmd -down 2
# vmoprcmd -h uxnbpr26 -d

Resume the restore job in NetBackup.
the requested operation was successfully completed (0)

bpverify

If the restore fails even after trying the procedure above, you can confirm the image state. The restore job log contains the image number needed for bpverify, for example "restoring from image ustpa3tlsno27.nam.pwcinternal.com_1267749978". Note, a bpverify can take a very long time to complete.
# bpverify -v -backupid ustpa3tlsno27.nam.pwcinternal.com_1267749978
Verify started Sat Mar 06 13:14:14 2010
INF - Verifying policy fs_wintel_1900, schedule Daily_1m (ustpa3tlsno27.nam.pwcinternal.com_1267749978)
media id 707001, created 03/04/2010 19:46:18.
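
Backup ID parts. The backup ID passed to bpverify above has the form client_number. The trailing number appears to be the backup time in Unix epoch seconds: 1267749978 is 03/05/2010 00:46:18 UTC, which is 19:46:18 on 03/04/2010 in US Eastern time and so matches the "created" time in the bpverify output. Below is a minimal sketch for splitting an ID into its two parts. The function name split_backupid is made up for illustration and is not a NetBackup command. It needs a POSIX shell such as ksh or bash, since the old Solaris /bin/sh does not support these expansions.

```shell
# Hypothetical helper (not part of NetBackup): split a backup ID into the
# client name and the trailing timestamp using POSIX parameter expansion.
split_backupid() {
    id=$1
    client=${id%_*}     # everything before the last underscore
    ctime=${id##*_}     # everything after the last underscore
    echo "$client $ctime"
}

split_backupid ustpa3tlsno27.nam.pwcinternal.com_1267749978
```

Splitting on the last underscore rather than the first keeps client names that themselves contain underscores intact.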