Issue:
Today through manual monitoring we found that the /GG volume was 100% full on a omnlink server
[oracle@AZPRODORA01 ~ JPROD1]$df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup01-LogVol00
67G 46G 18G 73% /
/dev/cciss/c0d0p1 99M 53M 41M 57% /boot
tmpfs 48G 236M 48G 1% /dev/shm
/dev/mapper/Oracle_GG
139G 132G 9.9M 100% /GG
Investigation:
What is eating up the space and why
[oracle@AZPRODORA01 GG JPROD1]$du -h –max-depth=2
16K ./CPROD/dirpcs
4.0K ./CPROD/dirtmp
20M ./CPROD/dirjar
8.0K ./CPROD/dirwlt
901M ./CPROD/dirdat
136K ./CPROD/cfg
4.0K ./CPROD/dirdef
61M ./CPROD/dirbdb
48K ./CPROD/dirprm
452K ./CPROD/UserExitExamples
32K ./CPROD/dirsql
4.0K ./CPROD/dirout
228K ./CPROD/dirchk
7.7M ./CPROD/dirrpt
2.1G ./CPROD
16K ./JPROD/dirpcs
4.0K ./JPROD/dirtmp
20M ./JPROD/dirjar
8.0K ./JPROD/dirwlt
123G ./JPROD/dirdat
164K ./JPROD/cfg
4.0K ./JPROD/dirdef
112M ./JPROD/dirbdb
52K ./JPROD/dirprm
452K ./JPROD/UserExitExamples
32K ./JPROD/dirsql
4.0K ./JPROD/dirout
80K ./JPROD/dirchk
6.3M ./JPROD/dirrpt
126G ./JPROD
20K ./VPROD/dirpcs
4.0K ./VPROD/dirtmp
20M ./VPROD/dirjar
8.0K ./VPROD/dirwlt
1.3G ./VPROD/dirdat
160K ./VPROD/cfg
4.0K ./VPROD/dirdef
86M ./VPROD/dirbdb
52K ./VPROD/dirprm
452K ./VPROD/UserExitExamples
32K ./VPROD/dirsql
4.0K ./VPROD/dirout
12K ./VPROD/BR
244K ./VPROD/dirchk
8.8M ./VPROD/dirrpt
3.6G ./VPROD
132G .
[oracle@AZPRODORA01 dirdat JPROD1]$ls -lrth
total 115G
-rw-r—– 1 oracle oinstall 477M May 30 11:29 1r034254
-rw-r—– 1 oracle oinstall 477M May 30 11:50 1r034255
-rw-r—– 1 oracle oinstall 477M May 30 12:17 1r034256
-rw-r—– 1 oracle oinstall 477M May 30 12:46 1r034257
-rw-r—– 1 oracle oinstall 477M May 30 13:13 1r034258
-rw-r—– 1 oracle oinstall 477M May 30 13:41 1r034259
-rw-r—– 1 oracle oinstall 477M May 30 14:09 1r034260
-rw-r—– 1 oracle oinstall 477M May 30 14:36 1r034261
-rw-r—– 1 oracle oinstall 477M May 30 15:04 1r034262
-rw-r—– 1 oracle oinstall 477M May 30 15:33 1r034263
-rw-r—– 1 oracle oinstall 477M May 30 16:03 1r034264
dirdat is 123G in size and it is the folder in which extract files are stored. On normal functioning, a replicat picks these files (equivalent to redolog files in oracle) and applies it to the target database. Once applied, purges them. So probably a replicat is not running or stuck and hence not applying the extract files. So checked status by connecting to ggsci
[oracle@AZPRODORA01 JPROD JPROD1]$ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.7 16934304 OGGCORE_11.2.1.0.7_PLATFORMS_130709.1600.1_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Jul 18 2013 07:04:28
Copyright (C) 1995, 2013, Oracle and/or its affiliates. All rights reserved.
GGSCI (AZPRODORA01) 1> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER STOPPED
JAGENT STOPPED
EXTRACT STOPPED EJPROD 00:00:00 25236:51:45
EXTRACT STOPPED PJPRODHS 00:00:00 25236:51:41
EXTRACT STOPPED PJPRODP 00:00:00 10739:09:31
REPLICAT STOPPED RJPRODHS 00:00:00 490:41:32
REPLICAT RUNNING RJPRODP 132:23:52 00:00:00
Resolution:
This showed that the replicat RJPRODP is lagging by 132 hours which is going back to 6 days which corresponds to the oldest file in the dirdat directory. so stopped replicat RJPRODP and started the manager first. Then restarted the replicat.
GGSCI (AZPRODORA01) 4> stop replicat RJPRODP
Sending STOP request to REPLICAT RJPRODP …
STOP request pending end-of-transaction (865 records so far)..
GGSCI (AZPRODORA01) 7> start manager
Manager started.
GGSCI (AZPRODORA01) 8>
GGSCI (AZPRODORA01) 8> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
JAGENT STOPPED
EXTRACT STOPPED EJPROD 00:00:00 25237:09:00
EXTRACT STOPPED PJPRODHS 00:00:00 25237:08:56
EXTRACT STOPPED PJPRODP 00:00:00 10739:26:46
REPLICAT STOPPED RJPRODHS 00:00:00 490:58:46
REPLICAT STOPPED RJPRODP 132:25:00 00:01:25
Sending START request to MANAGER …
REPLICAT RJPRODP starting
GGSCI (AZPRODORA01) 10> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
JAGENT STOPPED
EXTRACT STOPPED EJPROD 00:00:00 25237:09:19
EXTRACT STOPPED PJPRODHS 00:00:00 25237:09:15
EXTRACT STOPPED PJPRODP 00:00:00 10739:27:05
REPLICAT STOPPED RJPRODHS 00:00:00 490:59:06
REPLICAT RUNNING RJPRODP 132:26:40 00:00:00
Monitoring:
Now monitor the ggserr.log (/GG/JPROD/ggserr.log) and if you see entries like this then it means that the replicat is working through the extract files.
2015-06-05 00:11:05 INFO OGG-00957 Oracle GoldenGate Manager for Oracle, mgr.prm: Purged old extract file /GG/JPROD/dirdat/1r034255, applying UseCheckPoints purge rule: Oldest Chkpt Seqno 34256 > 34255.
Also we can monitor the progress in ggsci as shown below. The output show the current extract file that the replicat is working on along wit the RBA that it currently on. As long as the RBA is ticking (like SCN for oracle) the replicat is working and processing records.
GGSCI (AZPRODORA01) 2> info replicat RJPRODP detail
REPLICAT RJPRODP Last Started 2015-06-04 23:43 Status RUNNING
Checkpoint Lag 132:06:54 (updated 00:00:00 ago)
Log Read Checkpoint File ./dirdat/1r034256
2015-05-30 12:13:13.257148 RBA 415340900
Extract Source Begin End
./dirdat/1r034256 2015-05-30 11:16 2015-05-30 12:13
./dirdat/1r034253 2015-05-30 01:09 2015-05-30 11:16
./dirdat/1r034233 2015-05-15 11:43 2015-05-30 01:09
./dirdat/1r033552 2015-05-15 11:43 2015-05-15 11:43
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-12 11:44 2015-05-15 11:43
./dirdat/1r033419 2015-05-06 04:52 2015-05-12 11:44
./dirdat/1r033137 2015-04-14 09:19 2015-05-06 04:52
./dirdat/1r032227 * Initialized * 2015-04-14 09:19
./dirdat/1r031329 * Initialized * First Record
./dirdat/1r031329 * Initialized * First Record
./dirdat/1r000000 * Initialized * First Record
Current directory /GG/JPROD
Report file /GG/JPROD/dirrpt/RJPRODP.rpt
Parameter file /GG/JPROD/dirprm/rjprodp.prm
Checkpoint file /GG/JPROD/dirchk/RJPRODP.cpr
Checkpoint table SYS.GGS_CHECKPOINT
Process file /GG/JPROD/dirpcs/RJPRODP.pcr
Stdout file /GG/JPROD/dirout/RJPRODP.out
Error log /GG/JPROD/ggserr.log
GGSCI (AZPRODORA01) 3> info replicat RJPRODP detail
REPLICAT RJPRODP Last Started 2015-06-04 23:43 Status RUNNING
Checkpoint Lag 132:05:00 (updated 00:00:00 ago)
Log Read Checkpoint File ./dirdat/1r034256
2015-05-30 12:16:21.107452 RBA 470300944
Extract Source Begin End
./dirdat/1r034256 2015-05-30 11:16 2015-05-30 12:16
./dirdat/1r034253 2015-05-30 01:09 2015-05-30 11:16
./dirdat/1r034233 2015-05-15 11:43 2015-05-30 01:09
./dirdat/1r033552 2015-05-15 11:43 2015-05-15 11:43
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-15 11:43 First Record
./dirdat/1r033552 2015-05-12 11:44 2015-05-15 11:43
./dirdat/1r033419 2015-05-06 04:52 2015-05-12 11:44
./dirdat/1r033137 2015-04-14 09:19 2015-05-06 04:52
./dirdat/1r032227 * Initialized * 2015-04-14 09:19
./dirdat/1r031329 * Initialized * First Record
./dirdat/1r031329 * Initialized * First Record
./dirdat/1r000000 * Initialized * First Record
Current directory /GG/JPROD
Report file /GG/JPROD/dirrpt/RJPRODP.rpt
Parameter file /GG/JPROD/dirprm/rjprodp.prm
Checkpoint file /GG/JPROD/dirchk/RJPRODP.cpr
Checkpoint table SYS.GGS_CHECKPOINT
Process file /GG/JPROD/dirpcs/RJPRODP.pcr
Stdout file /GG/JPROD/dirout/RJPRODP.out
Error log /GG/JPROD/ggserr.log