Product: Oracle Server – Enterprise Edition
Product Version: 10.2.0.4
Operating System: Oracle Solaris on SPARC (64-bit)
OS Version: 10
Problem Description |
Data changes are not being applied on the destination site.
V$STREAMS_CAPTURE state ==> PAUSED FOR FLOW CONTROL

COLUMN queue_schema FORMAT A12
COLUMN queue_name FORMAT A20
SELECT queue_schema, queue_name, unbrowsed_msgs, overspilled_msgs, memory_usage, publisher_state
FROM V$BUFFERED_PUBLISHERS;

QUEUE_SCHEMA | QUEUE_NAME  | UNBROWSED_MSGS | OVERSPILLED_MSGS | MEMORY_USAGE | PUBLISHER_STATE
STRMADMIN    | CAPTURE_SRC | 14060          | 0                | 80           | IN FLOW CONTROL: TOO MANY UNBROWSED MESSAGES
STRMADMIN    | APPLY_SRC   | 3682           | 0                | 80           | PUBLISHING MESSAGES
STRMADMIN    | APPLY_SRC   | 0              | 0                | 80           | PUBLISHING MESSAGES
This is a bidirectional Streams setup.
We have two production database sites, and both sites run 2-node Oracle 10g RAC clusters.
We configured Oracle Streams between the two sites, and it has been working fine for more than 8 months.
Since early this morning we have been facing a problem: data change activity is being performed only on the primary site, and it is still ongoing right now.
Table data is not replicating to the destination site, which is causing business continuity loss and a growing number of customer escalations.
We have provided the Streams health check reports from both sites:
streams_hc_10GR2_orcld_05092012.html
streams_hc_10GR2_orclp_05092012.html
How can we resolve this issue?
Issue Clarification-1
Data changes are not being applied on the destination site.
V$STREAMS_CAPTURE state ==> PAUSED FOR FLOW CONTROL
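The paused state can be confirmed directly from the dynamic views. A minimal sketch using the standard 10.2 views (run on the source; adjust for your RAC scope):

```sql
-- Check the capture state and throughput counters on the source.
COLUMN capture_name FORMAT A20
COLUMN state FORMAT A40
SELECT capture_name, state,
       total_messages_captured,   -- LCRs mined from redo since capture startup
       total_messages_enqueued    -- LCRs actually enqueued for propagation
FROM   gv$streams_capture;
```

Note that DBA_CAPTURE can still show STATUS = ENABLED while the process is paused; the STATE column here is what reveals the flow-control pause.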
Report Analysis-1
++ OPEN STREAMS CAPTURE TRANSACTIONS ++ (Source)
+** Count **+
Streams Name | Open Transactions | Total LCRs
CAPTURES_SRC | 1                 | 8573662
+** Detail **+
INST_ID | Streams Name | Streams Ty | XIDUSN | XIDSLT | XIDSQN | CUMULATIVE_MESSAGE_COUNT | TOTAL_MESSAGE_COUNT | FIRST_MESSAGE_TIM | FIRST_MESSAGE_NUMBER | LAST_MESSAGE_TIME | LAST_MESSAGE_NUMBER
1 | CAPTURES_SRC | CAPTURE | 76 | 0 | 79680 | 8573662 | 0 | 22:02:18 09/04/12 | 8.2413E+12 | 00:10:09 09/05/12 | 8.2414E+12
++ APPLY SPILLED TRANSACTIONS ++ (Destination)
Apply Name  | Transaction ID | SCN of First Message in Txn | First Message Creation Time | Count of Messages in Txn | Spill Creation Time
APPLYS_DEST | 76.0.79680     | 8.2413E+12                  | 22:02:18 09/04/12           | 2513933                  | 22:03:37 09/04/12
++ APPLY Reader Statistics ++ (Destination)
Apply Name  | Captured or User-Enqueued LCR | Process | STATE            | Total Messages Dequeued | Total Messages Spilled | SGA Used | Oldest SCN | Oldest Transaction
APPLYS_DEST | Captured LCRS                 | P000    | DEQUEUE MESSAGES | 33600091                | 4232671                | 0        | 8.2413E+12 | 76.0.79680

Apply Name  | Apply Reader Latency (Seconds) | Dequeued Message Creation Timestamp | Dequeue Timestamp | Last Dequeued Message Number | Last Browsed Message Number
APPLYS_DEST | 48743                          | 00:13:51 09/05                      | 13:46:14 09/05    | 8.2414E+12                   | 0

Apply Name  | Total Messages Dequeued | Total Messages Spilled | Elapsed Time Dequeue (centisecs) | Elapsed Time Schedule (centisecs) | Elapsed Time Spill (centisecs)
APPLYS_DEST | 33600106                | 4232671                | 472603319                        | 5325073                           | 746028
++ OPEN STREAMS CAPTURE TRANSACTIONS ++
+** Count **+
Streams Name  | Open Transactions | Total LCRs
CAPTURES_DEST | 1                 | 0
+** Detail **+
INST_ID | Streams Name | Streams Ty | XIDUSN | XIDSLT | XIDSQN | CUMULATIVE_MESSAGE_COUNT | TOTAL_MESSAGE_COUNT | FIRST_MESSAGE_TIM | FIRST_MESSAGE_NUMBER | LAST_MESSAGE_TIME | LAST_MESSAGE_NUMBER
1 | CAPTURES_DEST | CAPTURE | 10 | 22 | 4184725 | 0 | 0 | 13:46:08 09/05/12 | 8.2422E+12 |  | 0
++ MESSAGES IN BUFFER QUEUE ++ (Destination)
Queue Name                  | STARTUP_TIME      | Current Outstanding Messages | Current Spilled Messages | Cumulative Messages | Cumulative Spilled Messages | Current Expired Messages
STRMADMIN.CAPTURE_DEST      | 02:59:05 05/16/12 | 0                            | 0                        | 3682                | 13                          | 0
STRMADMIN.APPLY_DEST        | 02:56:39 05/16/12 | 4369                         | 0                        | 33604301            | 1549                        | 0
SYS.KUPC$S_1_20120516020419 | 02:04:22 05/16/12 | 0                            | 0                        | 16                  | 0                           | 0
1. There is an open capture transaction and an open apply transaction with a large number of LCRs.
2. There is also a large number of spilled messages in the apply queue.
This indicates that the apply process is not working fast enough, or that a huge transaction has caused the apply process to slow down until it finishes; as a result, the capture process has been paused to control the flow of messages from source to target.
Action Plan-1
To verify this conclusion, please generate another set of health check reports from both sides. I will compare the statistics between the old and new sets and let you know the results. If my conclusion is correct, we will have only two options: wait for the large transaction to finish, or instruct the capture and apply processes to ignore it (not recommended, for data consistency reasons).
Please generate two new health check reports, one from each database, and provide them.
Please find the required Streams health check reports:
streams_hc_10GR2_orcld_05092012_2.html
streams_hc_10GR2_orclp_05092012_2.html
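Between report sets, the apply reader's progress can also be tracked directly on the destination; a sketch using the standard 10.2 view:

```sql
-- If TOTAL_MESSAGES_DEQUEUED keeps growing between samples, the apply is making progress
-- even while the large transaction stays open.
SELECT apply_name, state,
       total_messages_dequeued,
       total_messages_spilled,
       oldest_xidusn || '.' || oldest_xidslt || '.' || oldest_xidsqn AS oldest_txn
FROM   gv$streams_apply_reader;
```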
Report Analysis-2
++ APPLY Reader Statistics ++ (Destination)
Apply Name  | Captured or User-Enqueued LCR | Process | STATE            | Total Messages Dequeued | Total Messages Spilled | SGA Used | Oldest SCN | Oldest Transaction
APPLYS_DEST | Captured LCRS                 | P000    | DEQUEUE MESSAGES | 33895603                | 4232671                | 0        | 8.2413E+12 | 76.0.79680

Apply Name  | Apply Reader Latency (Seconds) | Dequeued Message Creation Timestamp | Dequeue Timestamp | Last Dequeued Message Number | Last Browsed Message Number
APPLYS_DEST | 54081                          | 00:15:44 09/05                      | 15:17:05 09/05    | 8.2414E+12                   | 0

Apply Name  | Total Messages Dequeued | Total Messages Spilled | Elapsed Time Dequeue (centisecs) | Elapsed Time Schedule (centisecs) | Elapsed Time Spill (centisecs)
APPLYS_DEST | 33895609                | 4232671                | 472610296                        | 5866722                           | 746028
++ OPEN STREAMS CAPTURE TRANSACTIONS ++ (Source)
+** Count **+
Streams Name | Open Transactions | Total LCRs
CAPTURES_SRC | 1                 | 918398
+** Detail **+
INST_ID | Streams Name | Streams Ty | XIDUSN | XIDSLT | XIDSQN | CUMULATIVE_MESSAGE_COUNT | TOTAL_MESSAGE_COUNT | FIRST_MESSAGE_TIM | FIRST_MESSAGE_NUMBER | LAST_MESSAGE_TIME | LAST_MESSAGE_NUMBER
1 | CAPTURES_SRC | CAPTURE | 76 | 0 | 79680 | 918398 | 0 |  | 8.2414E+12 | 00:15:50 09/05/12 | 8.2414E+12
++ MESSAGES IN BUFFER QUEUE ++ (Destination)
Queue Name                  | STARTUP_TIME      | Current Outstanding Messages | Current Spilled Messages | Cumulative Messages | Cumulative Spilled Messages | Current Expired Messages
STRMADMIN.CAPTURE_DEST      | 02:59:05 05/16/12 | 0                            | 0                        | 3682                | 13                          | 0
STRMADMIN.APPLY_DEST        | 02:56:39 05/16/12 | 4634                         | 0                        | 33900201            | 1549                        | 0
SYS.KUPC$S_1_20120516020419 | 02:04:22 05/16/12 | 0                            | 0                        | 16                  | 0                           | 0
Issue Clarification-2
I see the capture status has changed from "PAUSED FOR FLOW CONTROL" to "CAPTURING CHANGES", which means the capture process is working now. However, it might take some time to catch up, as the large transaction I identified before is still open:
Apply reader total messages dequeued: old value 33600091, new value 33895603 (the apply is working well).
OPEN STREAMS CAPTURE TRANSACTIONS:
XID 76.0.79680, number of LCRs: old value 8573662, new value 918398.
As you can see, the total number of LCRs in this transaction has changed, and the new value is lower than the old one. This indicates that the capture is mining this transaction from the beginning again. Further evidence:
Capture name CAPTURES_SRC, session ID: old value 1031, new value 938.
Last status timestamp: old 10:51:45 09/05/12, new 15:15:06 09/05/12.
So it seems the capture process has been restarted with a new session ID; I am not sure whether that happened automatically or you restarted it. In any case, I recommend monitoring the capture for a couple of hours; I expect it to catch up, since it is no longer paused. Please monitor the system and let me know whether new changes are being replicated. If there is still a problem, please generate new health check reports and send them; the action in that case might be to ignore that transaction.
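One quick way to watch the catch-up is capture latency, i.e. how far behind the redo stream the capture currently is. A sketch against the standard 10.2 view:

```sql
-- Approximate capture latency in seconds: redo record creation time vs. now.
SELECT capture_name, state,
       (SYSDATE - capture_message_create_time) * 86400 AS latency_seconds
FROM   v$streams_capture;
```

A steadily shrinking latency over a couple of hours indicates the capture is catching up.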
Reply from client
SQL> SELECT CAPTURE_NAME, QUEUE_NAME, STATUS, ERROR_NUMBER, ERROR_MESSAGE FROM DBA_CAPTURE;

CAPTURE_NAME   QUEUE_NAME    STATUS   ERROR_NUMBER  ERROR_MESSAGE
-------------- ------------- -------- ------------- -------------
CAPTURES_SRC   CAPTURE_SRC   ENABLED

SQL> SELECT CAPTURE_NAME, STATE FROM V$STREAMS_CAPTURE;

CAPTURE_NAME   STATE
-------------- -------------------------
CAPTURES_SRC   PAUSED FOR FLOW CONTROL

SQL> SELECT PROPAGATION_NAME, STATUS, ERROR_MESSAGE, ERROR_DATE FROM DBA_PROPAGATION;

PROPAGATION_NAME   STATUS   ERROR_MESSAGE  ERROR_DAT
------------------ -------- -------------- ---------
PROP_SRC_TO_DEST   ENABLED

SQL> SELECT APPLY_NAME, STATUS, ERROR_NUMBER, ERROR_MESSAGE FROM DBA_APPLY;

APPLY_NAME   STATUS   ERROR_NUMBER  ERROR_MESSAGE
------------ -------- ------------- -------------
APPLYS_SRC   ENABLED
We want a fast resolution or workaround, as our business services are continuously affected.
What are the pros and cons of ignoring that transaction?
Can we find out which user/schema or SQL statement is making such long-running changes to the table?
C001: long running txn detected, xid: 0x004c.000.00013740 ==> this appears in the primary site's alert log (in decimal, 0x004c.000.00013740 is the same transaction, 76.0.79680).
If we ignore that transaction, will subsequent transactions still be applied?
Can we find the SQL statements of the long transaction you suggest ignoring?
Why are there lots of log switches, but no LogMiner messages in the alert log?
Action Plan-2
1. To find the SQL of the transaction, you can use the LogMiner utility with the ad-hoc option; please follow the instructions in the following article:
311620.1 How to Use LogMiner to Determine Problem SQL in a Streams Environment
2. Yes, if you ignore that transaction, other transactions that depend on it might cause apply errors. We cannot assess that without knowing the SQL commands of the transaction. For example, if it is an update statement that affects only one table, then only subsequent transactions against that specific table would be affected. Even so, ignoring it might be the only solution.
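As a rough illustration of the LogMiner approach in the note above (the archived log path is a placeholder; the XID predicates use the transaction identified in this SR):

```sql
-- Ad-hoc LogMiner session against one archived log (sketch).
exec dbms_logmnr.add_logfile('/arch/path/placeholder.arc', dbms_logmnr.new);
exec dbms_logmnr.start_logmnr(options => dbms_logmnr.dict_from_online_catalog);

-- SQL_REDO shows the statements belonging to transaction 76.0.79680.
SELECT scn, operation, seg_owner, table_name, sql_redo
FROM   v$logmnr_contents
WHERE  xidusn = 76 AND xidslt = 0 AND xidsqn = 79680;

exec dbms_logmnr.end_logmnr;
```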
If you decide to ignore that transaction, the following is the required steps:
Ignore/skip the transaction on both the capture and the apply process:
dbms_capture_adm.set_parameter('CAPTURE_NAME', '_ignore_transaction', 'TXN_ID');
dbms_apply_adm.set_parameter('APPLY_NAME', '_ignore_transaction', 'TXN_ID');
At the destination site:
-- Stop the apply process
exec dbms_apply_adm.stop_apply('APPLY_NAME_HERE');
-- Set the parameter to ignore the transaction (the XID below is an example value)
execute dbms_apply_adm.set_parameter('apply_name_here', '_ignore_transaction', '5.87.14127');
-- Restart the apply process
exec dbms_apply_adm.start_apply('APPLY_NAME_HERE');
Do the same for your capture process; it will also need to be restarted.
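Applied to this environment, the capture-side steps would look like the following sketch (the capture name and XID are taken from this SR; verify both before running anything):

```sql
-- At the source site: stop the capture, set the ignore parameter, restart it.
exec dbms_capture_adm.stop_capture('CAPTURES_SRC');
exec dbms_capture_adm.set_parameter('CAPTURES_SRC', '_ignore_transaction', '76.0.79680');
exec dbms_capture_adm.start_capture('CAPTURES_SRC');
```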
Please let me know if you have any further questions..
Reply from client
Can we get information on the progress? I mean, how many transactions are pending, and how long will it take?
Action Plan-3
That will not be easy, as the transaction is still open on the capture side, meaning the capture will need to mine more logs for the same transaction after it resumes (it is currently paused for flow control). You can generate another set of health check reports and I will let you know the progress so far. However, please DON'T restart the capture under any circumstances: every time you restart it, it mines this transaction from scratch, and I think that has already happened once.
Please find the required reports:
streams_hc_10GR2_orcld_05092012_3.html
streams_hc_10GR2_orclp_05092012_3.html
Report Analysis-3
++ APPLY Reader Statistics ++ (Destination)
Apply Name  | Captured or User-Enqueued LCR | Process | STATE            | Total Messages Dequeued | Total Messages Spilled | SGA Used | Oldest SCN | Oldest Transaction
APPLYS_DEST | Captured LCRS                 | P000    | DEQUEUE MESSAGES | 34569811                | 4232671                | 0        | 8.2413E+12 | 76.0.79680

Apply Name  | Apply Reader Latency (Seconds) | Dequeued Message Creation Timestamp | Dequeue Timestamp | Last Dequeued Message Number | Last Browsed Message Number
APPLYS_DEST | 67009                          | 00:20:05 09/05                      | 18:56:54 09/05    | 8.2414E+12                   | 0

Apply Name  | Total Messages Dequeued | Total Messages Spilled | Elapsed Time Dequeue (centisecs) | Elapsed Time Schedule (centisecs) | Elapsed Time Spill (centisecs)
APPLYS_DEST | 34569814                | 4232671                | 472626432                        | 7177295                           | 746028
++ OPEN STREAMS CAPTURE TRANSACTIONS ++ (Source)
+** Count **+
Streams Name | Open Transactions | Total LCRs
CAPTURES_SRC | 1                 | 1590994
+** Detail **+
INST_ID | Streams Name | Streams Ty | XIDUSN | XIDSLT | XIDSQN | CUMULATIVE_MESSAGE_COUNT | TOTAL_MESSAGE_COUNT | FIRST_MESSAGE_TIM | FIRST_MESSAGE_NUMBER | LAST_MESSAGE_TIME | LAST_MESSAGE_NUMBER
1 | CAPTURES_SRC | CAPTURE | 76 | 0 | 79680 | 1590994 | 0 |  | 8.2414E+12 | 00:20:13 09/05/12 | 8.2414E+12
++ APPLY SPILLED TRANSACTIONS ++ (Destination)
Apply Name  | Transaction ID | SCN of First Message in Txn | First Message Creation Time | Count of Messages in Txn | Spill Creation Time
APPLYS_DEST | 76.0.79680     | 8.2413E+12                  | 22:02:18 09/04/12           | 2190807                  | 22:03:37 09/04/12
++ MESSAGES IN BUFFER QUEUE ++ (Destination)
Queue Name                  | STARTUP_TIME      | Current Outstanding Messages | Current Spilled Messages | Cumulative Messages | Cumulative Spilled Messages | Current Expired Messages
STRMADMIN.CAPTURE_DEST      | 02:59:05 05/16/12 | 0                            | 0                        | 3690                | 13                          | 0
STRMADMIN.APPLY_DEST        | 02:56:39 05/16/12 | 4323                         | 0                        | 34574101            | 1549                        | 0
SYS.KUPC$S_1_20120516020419 | 02:04:22 05/16/12 | 0                            | 0                        | 16                  | 0                           | 0
Issue Clarification-3
++ APPLY Reader Statistics ++
Total Messages Dequeued:
Report 1: 33600091
Report 2: 33895603
Report 3: 34569811
++ MESSAGES IN BUFFER QUEUE ++ (queue STRMADMIN.APPLY_DEST)
Outstanding messages:
Report 1: 4369
Report 2: 4634
Report 3: 4323
Cumulative number of messages:
Report 1: 33604301
Report 2: 33900201
Report 3: 34574101
RCA
XID 76.0.79680
On the target (apply): number of LCRs: 2190807
On the source (capture): number of LCRs: 1590994
This is normal, because the capture process was restarted at some point; by then, the apply had already received around 2 million of the messages in that transaction. When the capture was restarted, it began mining the same transaction from the beginning, so the apply is waiting until the capture reaches the last LCR the apply dequeued (LCR number 2190807).
I also reviewed the oldest health check, generated before the capture restart, and the number of LCRs according to that report was 8573662 (more than 8 million).
That means AT LEAST 6 million LCRs of the same transaction still need to be mined on the source, which is a lot. I cannot say exactly how much time it will take, but the following may be an indicator:
The capture last started at: 12:16:47 09/05/12
The capture's current timestamp (from the last report): 18:54:17 09/05/12
That means the capture has processed around 1590994 LCRs (about one and a half million) in about 4 hours. If it keeps working at the same rate, it will need around 2.75 hours per million LCRs, which leads us to expect around 14-15 hours for the capture to process the 8 million LCRs of this transaction. Remember that this is the capture time only; the apply might work at the same rate (hopefully) or might be slower.
These figures may not be accurate, and the rate may change over time, but do not expect a big improvement.
I know these estimates are not pleasant and you will not like them, but I have to put all the facts on the table so you can make the correct decision.
For example, if you decide to ignore transaction 76.0.79680, that might lead to data inconsistency; yes, that is possible. But in that case, how much time would you need to resynchronize the data, using EXP/IMP for example? Is that less than or more than the roughly 15 hours estimated for the capture to catch up (a number that is not final and may increase)?
We cannot wait for the long transaction to finish; as you know, we are losing business services.
So, as you suggested in your earlier response, we will use Oracle Data Pump and reconfigure the Streams solution.
In our case expdp/impdp will take less time, as we have a single table with 4 GB of data in it.
1. De-configure Oracle Streams at both sites.
2. Reconfigure Oracle Streams using expdp/impdp.
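For step 1, a minimal sketch of the de-configuration (this drops the entire Streams configuration at the site where it is run, so execute it only once the decision is final):

```sql
-- Removes all Streams capture, propagation, and apply components
-- and their rules at this site.
exec dbms_streams_adm.remove_streams_configuration;
```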
Troubleshooting Commands:
Oracle Streaming Health checkup scripts
streams_hc_10GR2.sql
prompt ============================================================================================
prompt
prompt ++ OPEN STREAMS CAPTURE TRANSACTIONS ++
prompt
prompt +** Count **+
select streams_name, count(*) "Open Transactions", sum(cumulative_message_count) "Total LCRs"
from gv$streams_transaction where streams_type='CAPTURE' group by streams_name;
prompt
prompt +** Detail **+
select * from gv$streams_transaction where streams_type='CAPTURE' order by streams_name, first_message_number;
prompt ============================================================================================
prompt
prompt ++ APPLY SPILLED TRANSACTIONS ++
col APPLY_NAME Head 'Apply Name'
col txn_id HEAD 'Transaction|ID'
col FIRST_SCN Head 'SCN of First| Message in Txn'
col MESSAGE_COUNT Head 'Count of |Messages in Txn'
col FIRST_MESSAGE_CREATE_TIME Head 'First Message|Creation Time'
col SPILL_CREATION_TIME Head ' Spill |Creation Time'
select Apply_name,
xidusn||'.'||xidslt||'.'||xidsqn txn_id,
first_scn,
first_message_create_time,
message_count,
spill_creation_time
from dba_apply_SPILL_TXN;
prompt
prompt ============================================================================================
prompt
prompt ++ MESSAGES IN BUFFER QUEUE ++
col QUEUE format a50 wrap
col "Message Count" format 9999999999999999 heading 'Current Number of|Outstanding|Messages|in Queue'
col "Spilled Msgs" format 9999999999999999 heading 'Current Number of|Spilled|Messages|in Queue'
col "Total Messages" format 9999999999999999 heading 'Cumulative |Number| of Messages|in Queue'
col "Total Spilled Msgs" format 9999999999999999 heading 'Cumulative Number|of Spilled|Messages|in Queue'
col "Expired_Msgs" heading 'Current Number of|Expired|Messages|in Queue'
SELECT queue_schema||'.'||queue_name Queue, startup_time, num_msgs "Message Count", spill_msgs "Spilled Msgs",
cnum_msgs "Total Messages", cspill_msgs "Total Spilled Msgs", expired_msgs FROM gv$buffered_queues;
prompt
prompt ============================================================================================
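The "APPLY Reader Statistics" sections quoted in the analyses above come from GV$STREAMS_APPLY_READER on the destination. A simplified sketch of such a section (the actual health check script formats more columns than this):

```sql
prompt ++ APPLY READER STATISTICS ++
select apply_name, state,
       total_messages_dequeued, total_messages_spilled, sga_used,
       oldest_xidusn || '.' || oldest_xidslt || '.' || oldest_xidsqn "Oldest Transaction"
from gv$streams_apply_reader;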