Our team deleted some archived logs by mistake. We rolled the standby database forward to an SCN with an RMAN incremental recovery, then did a manual recovery to sync it with the primary. Managed recovery is now failing when started with:

alter database recover managed standby database disconnect

The alert log has:
Fri Jan 22 13:50:22 2010
Attempt to start background Managed Standby Recovery process
MRP0 started with pid=12
MRP0: Background Managed Standby Recovery process started
Media Recovery Waiting for thread 1 seq# 193389
Fetching gap sequence for thread 1, gap sequence 193389-193391
Trying FAL server: ITS
Fri Jan 22 13:50:28 2010
Completed: alter database recover managed standby database d
Fri Jan 22 13:53:25 2010
Failed to request gap sequence. Thread #: 1, gap sequence: 193389-193391
All FAL server has been attempted.
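
For reference, the gap range reported in the alert log can be expanded into individual sequence numbers, which helps when checking which archived logs still exist on the primary. This is only a minimal sketch and not part of the original post: `list_gap_sequences` is a hypothetical helper name, and it assumes the message wording seen in the excerpt above.

```shell
# list_gap_sequences (hypothetical helper): read alert-log text on stdin and
# print each log sequence number covered by a reported gap, one per line.
# Assumes the "gap sequence 193389-193391" / "gap sequence: 193389-193391"
# wording shown in the alert log excerpt.
list_gap_sequences() {
  sed -n -E 's/.*gap sequence:? *([0-9]+)-([0-9]+).*/\1 \2/p' |
    while read -r first last; do
      n=$first
      while [ "$n" -le "$last" ]; do
        echo "$n"
        n=$((n + 1))
      done
    done |
    sort -n -u   # the same gap is usually reported more than once
}

# Example (the alert log path is an assumption):
#   list_gap_sequences < /path/to/alert_ITS.log
```

The function only reads text, so it is safe to run against a copy of the alert log on either site.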
Managed recovery was working earlier today after the RMAN incremental recovery and resolved two gaps automatically. It now appears hung, with the standby falling behind the primary.
SQL> show parameter fal

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fal_client                           string      ITS_STBY
fal_server                           string      ITS
 
[v08k608:ITS:oracle]$ tnsping ITS_STBY

TNS Ping Utility for Solaris: Version 9.2.0.7.0 - Production on 22-JAN-2010 15:01:17

Copyright (c) 1997 Oracle Corporation. All rights reserved.

Used parameter files:
/oracle/product/9.2.0/network/admin/sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL= TCP)(Host= v08k608.am.mot.com)(Port= 1526)) (CONNECT_DATA = (SID = ITS)))
OK (10 msec)
[v08k608:ITS:oracle]$ tnsping ITS

TNS Ping Utility for Solaris: Version 9.2.0.7.0 - Production on 22-JAN-2010 15:01:27

Copyright (c) 1997 Oracle Corporation. All rights reserved.

Used parameter files:
/oracle/product/9.2.0/network/admin/sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL= TCP)(Host= 187.10.68.75)(Port= 1526)) (CONNECT_DATA = (SID = ITS)))
OK (320 msec)
 
Primary has:
SQL> show parameter log_archive_dest_2
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_2                   string      SERVICE=DRITS_V08K608 reopen=60 max_failure=10 net_timeout=180 LGWR ASYNC=20480 OPTIONAL
log_archive_dest_state_2             string      ENABLE
[ITS]/its15/oradata/ITS/arch> tnsping DRITS_V08K608
TNS Ping Utility for Solaris: Version 9.2.0.7.0 - Production on 22-JAN-2010 15:03:24
Copyright (c) 1997 Oracle Corporation. All rights reserved.
Used parameter files:
/oracle/product/9.2.0/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL= TCP)(Host= 10.177.13.57)(Port= 1526)) (CONNECT_DATA = (SID = ITS)))
OK (330 msec)
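
The `tnsping` checks above can be condensed so that the resolved host, port, and result for each alias are easy to compare side by side. The following is a minimal sketch that assumes the output layout shown in the listings; `tns_summary` is a hypothetical helper, not an Oracle utility.

```shell
# tns_summary (hypothetical helper): read the output of one `tnsping` run on
# stdin and print "<host> <port> <OK|FAILED>".
# Assumes the "Attempting to contact (...(Host= h)(Port= p)...)" and
# "OK (n msec)" lines look like the ones in the listings above.
tns_summary() {
  out=$(cat)
  # Pull the host and port out of the connect descriptor line.
  endpoint=$(printf '%s\n' "$out" |
    sed -n -E 's/.*\([Hh][Oo][Ss][Tt] *= *([^) ]+) *\).*\([Pp][Oo][Rr][Tt] *= *([0-9]+) *\).*/\1 \2/p' |
    head -n 1)
  # tnsping prints a line starting with "OK (" when the listener answered.
  if printf '%s\n' "$out" | grep -q '^OK ('; then
    status=OK
  else
    status=FAILED
  fi
  echo "${endpoint:-unknown unknown} $status"
}

# Example, using the aliases from this thread:
#   for alias in ITS ITS_STBY; do
#     printf '%s: ' "$alias"; tnsping "$alias" | tns_summary
#   done
```

Note that a successful `tnsping` only shows that the alias resolves and a listener answers; it does not prove that the archiver on the primary is actually shipping logs.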
The ARCH processes on the primary database may be hanging because of the bug below, which would prevent them from shipping the missing archived log files to the standby database.

BUG 6113783 ARC PROCESSES CAN HANG INDEFINITELY ON NETWORK [ Not published so not viewable in My Oracle Support ]
Fixed in 11.2 and the 10.2.0.5 patchset.

We can work around the issue by killing the ARCH processes on the primary site. They are respawned automatically and immediately, without harming the primary database.
[maclean@rh2 ~]$ ps -ef|grep arc
maclean   8231     1  0 22:24 ?        00:00:00 ora_arc0_PROD
maclean   8233     1  0 22:24 ?        00:00:00 ora_arc1_PROD
maclean   8350  8167  0 22:24 pts/0    00:00:00 grep arc
[maclean@rh2 ~]$ kill -9 8231 8233
[maclean@rh2 ~]$ ps -ef|grep arc
maclean   8389     1  0 22:25 ?        00:00:00 ora_arc0_PROD
maclean   8391     1  1 22:25 ?        00:00:00 ora_arc1_PROD
maclean   8393  8167  0 22:25 pts/0    00:00:00 grep arc
 
The alert log will then show:
 
Fri Jul 30 22:25:27 EDT 2010
ARCH: Detected ARCH process failure
ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=26, OS id=8389
Fri Jul 30 22:25:27 EDT 2010
ARC0: Archival started
ARC1: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=27, OS id=8391
Fri Jul 30 22:25:27 EDT 2010
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Fri Jul 30 22:25:27 EDT 2010
ARC1: Becoming the heartbeat ARCH
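
The kill step above relies on `grep arc`, which also matches the `grep` process itself and the archivers of any other instance on the host. A slightly more defensive selection can be sketched as follows; `arch_pids` is a hypothetical helper, and it assumes the `ora_arcN_<SID>` process naming seen in the listing.

```shell
# arch_pids (hypothetical helper): read `ps -ef` output on stdin and print the
# PIDs of the archiver processes that belong to one instance.
# Usage: ps -ef | arch_pids <ORACLE_SID>
arch_pids() {
  sid=$1
  awk -v sid="$sid" '
    # For Oracle background processes the last field of a `ps -ef` line is
    # the process name, e.g. ora_arc0_PROD. Matching the whole field keeps
    # "grep arc" and other instances out of the result.
    $NF ~ ("^ora_arc[0-9a-z]_" sid "$") { print $2 }
  '
}

# Review the PIDs first, then kill them (SID PROD as in the listing above):
#   ps -ef | arch_pids PROD
#   ps -ef | arch_pids PROD | xargs kill -9
```

Printing the PIDs before piping them to `kill` makes it harder to hit the wrong process on a host that runs several instances.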
In fact, in 10g, as long as we do not kill one of the fatal background processes, Oracle respawns every non-fatal background process that dies. For example:
[maclean@rh2 ~]$ ps -ef|grep ora_|grep -v grep
maclean  14264     1  0 23:16 ?        00:00:00 ora_pmon_PROD
maclean  14266     1  0 23:16 ?        00:00:00 ora_psp0_PROD
maclean  14268     1  0 23:16 ?        00:00:00 ora_mman_PROD
maclean  14270     1  0 23:16 ?        00:00:00 ora_dbw0_PROD
maclean  14272     1  0 23:16 ?        00:00:00 ora_lgwr_PROD
maclean  14274     1  0 23:16 ?        00:00:00 ora_ckpt_PROD
maclean  14276     1  0 23:16 ?        00:00:00 ora_smon_PROD
maclean  14278     1  0 23:16 ?        00:00:00 ora_reco_PROD
maclean  14338     1  0 23:16 ?        00:00:00 ora_arc0_PROD
maclean  14340     1  0 23:16 ?        00:00:00 ora_arc1_PROD
maclean  14452     1  0 23:17 ?        00:00:00 ora_s000_PROD
maclean  14454     1  0 23:17 ?        00:00:00 ora_d000_PROD
maclean  14456     1  0 23:17 ?        00:00:00 ora_cjq0_PROD
maclean  14458     1  0 23:17 ?        00:00:00 ora_qmnc_PROD
maclean  14460     1  0 23:17 ?        00:00:00 ora_mmon_PROD
maclean  14462     1  0 23:17 ?        00:00:00 ora_mmnl_PROD
maclean  14467     1  0 23:17 ?        00:00:00 ora_q000_PROD
maclean  14568     1  0 23:18 ?        00:00:00 ora_q001_PROD
 
[maclean@rh2 ~]$ ps -ef|grep ora_|grep -v pmon|grep -v ckpt |grep -v lgwr|grep -v smon|grep -v grep|grep -v dbw|grep -v psp|grep -v mman |grep -v rec|awk '{print $2}'|xargs kill -9
 
The alert log will then show:
Fri Jul 30 23:20:58 EDT 2010
ARCH: Detected ARCH process failure
ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=20, OS id=14959
Fri Jul 30 23:20:58 EDT 2010
ARC0: Archival started
ARC1: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
Fri Jul 30 23:20:58 EDT 2010
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC1 started with pid=21, OS id=14961
ARC1: Becoming the heartbeat ARCH
Fri Jul 30 23:21:29 EDT 2010
found dead shared server 'S000', pid = (10, 3)
found dead dispatcher 'D000', pid = (11, 3)
Fri Jul 30 23:22:29 EDT 2010
Restarting dead background process CJQ0
Restarting dead background process QMNC
CJQ0 started with pid=12, OS id=15124
Fri Jul 30 23:22:29 EDT 2010
Restarting dead background process MMON
QMNC started with pid=13, OS id=15126
Fri Jul 30 23:22:29 EDT 2010
Restarting dead background process MMNL
MMON started with pid=14, OS id=15128
MMNL started with pid=16, OS id=15132
 
So that's all fine: the killed processes are restarted and the instance stays up.
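
The long `grep -v` chain in the experiment above can be sketched as a dry run that only prints the PIDs it would select. `nonfatal_pids` is a hypothetical helper, and its protected list simply mirrors the names filtered out in that pipeline (pmon, psp, mman, dbw, lgwr, ckpt, smon, rec); it is an assumption drawn from this post, not an official list of fatal processes.

```shell
# nonfatal_pids (hypothetical helper): read `ps -ef` output on stdin and print
# the PIDs of one instance's background processes, skipping the names that
# the pipeline above protects with grep -v.
# Usage: ps -ef | nonfatal_pids <ORACLE_SID>
nonfatal_pids() {
  sid=$1
  awk -v sid="$sid" '
    {
      n = split($NF, part, "_")
      # Expect exactly ora_<name>_<SID>; anything else (grep, shells) is skipped.
      if (n != 3 || part[1] != "ora" || part[3] != sid) next
      # Protected names, copied from the grep -v filters in the post.
      if (part[2] ~ /^(pmon|psp|mman|dbw|lgwr|ckpt|smon|rec)/) next
      print $2
    }
  '
}

# Dry run on a test instance only (SID PROD as in the listing above):
#   ps -ef | nonfatal_pids PROD
```

Keeping the selection separate from the `kill` makes it easy to confirm that none of the protected processes slipped through before anything is terminated.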