Continuent Developer Community
TOPIC: Master hangs at synchronization
#271
Ulf Månsson (User)
Master hangs at synchronization 1 Year, 2 Months ago Karma: 0  
Hello,

I have a problem with my trep master: it restarted by itself and has been in the SYNCHRONIZING state for hours since then. trepctl status gives:

Name Value
===============================================
System ID ma1001_3306
System Version: 1.0-beta5
System State: SYNCHRONIZING
System Uptime (S): 6464.646s
State Uptime (S): 6047.901s
Error: null
Error Exception: null
Min Seq No: 0
Max Seq No: 13440993
Monitor Intvl (S): 6464.618
Extr Total: 0
Extr Last Seq No: -1
Extr/Sec: 0.0
Recv Total: 0
Recv Last Seq No: -1
Recv Source TS: null
Recv Target TS: null
Recv Latency (S): 0.0
Recv/Sec: 0.0
Apply Total: 0
Apply Last Seq No: -1
Apply Source TS: null
Apply Target TS: null
Apply Latency (S): 0.0
Apply/Sec: 0.0
State: SYNCHRONIZING
Seqno Range: 0 -> 13440993
mansson@sql_ma1001:/opt/tungsten-replicator/bin$
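
Editor's note: a stuck state like the one above can be detected automatically from the status text. The following is a minimal sketch in Python, assuming the `Name: value` layout shown in this post. The function names and the 600-second threshold are hypothetical and not part of Tungsten.

```python
# Sketch only: parses status text shaped like the output quoted above.
# parse_status, stuck_synchronizing and the 600 s limit are illustrative
# assumptions, not Tungsten APIs or recommended values.

SAMPLE = """\
System Version: 1.0-beta5
System State: SYNCHRONIZING
System Uptime (S): 6464.646s
State Uptime (S): 6047.901s
Max Seq No: 13440993
"""


def parse_status(text):
    """Return a dict of the 'Name: value' pairs found in the status text."""
    fields = {}
    for line in text.splitlines():
        # Lines without ': ' (header, separator rule, shell prompt) are skipped.
        name, sep, value = line.strip().partition(": ")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def stuck_synchronizing(fields, limit_seconds=600.0):
    """True if the replicator has been SYNCHRONIZING longer than the limit."""
    state = fields.get("System State")
    # Uptime values carry a trailing 's', e.g. '6047.901s'.
    uptime = float(fields.get("State Uptime (S)", "0").rstrip("s"))
    return state == "SYNCHRONIZING" and uptime > limit_seconds


if __name__ == "__main__":
    print(stuck_synchronizing(parse_status(SAMPLE)))
```

In practice the text would come from running trepctl status and capturing its output. The threshold should be chosen to fit how long a normal synchronization takes on the installation in question.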

The trepsvc.log shows this:

ERROR | wrapper | 2009/07/01 20:00:29 | JVM appears hung: Timed out waiting for signal from JVM.
ERROR | wrapper | 2009/07/01 20:00:30 | JVM did not exit on request, terminated
STATUS | wrapper | 2009/07/01 20:00:31 | JVM received a signal SIGKILL (9).
STATUS | wrapper | 2009/07/01 20:00:35 | Launching a JVM...
INFO | jvm 2 | 2009/07/01 20:00:43 | WrapperManager: Initializing...
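
Editor's note: the hang, kill and relaunch sequence above is easy to spot by eye, but on a long trepsvc.log it may help to extract it with a script. This is a minimal sketch in Python, assuming the `LEVEL | source | timestamp | message` layout quoted in this post. The function names are hypothetical.

```python
# Sketch only: finds "JVM appears hung" -> "Launching a JVM" pairs in
# wrapper log lines formatted like the excerpt above.
from datetime import datetime


def parse_wrapper_line(line):
    """Split one wrapper log line into (level, source, when, message)."""
    parts = [p.strip() for p in line.split("|", 3)]
    if len(parts) != 4:
        return None
    level, source, stamp, message = parts
    try:
        when = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
    except ValueError:
        return None
    return level, source, when, message


def forced_restarts(lines):
    """Return (hung_at, relaunched_at) pairs for each forced JVM restart."""
    restarts = []
    hung_at = None
    for line in lines:
        parsed = parse_wrapper_line(line)
        if parsed is None:
            continue
        _level, source, when, message = parsed
        if source != "wrapper":
            continue
        if message.startswith("JVM appears hung"):
            hung_at = when
        elif message.startswith("Launching a JVM") and hung_at is not None:
            restarts.append((hung_at, when))
            hung_at = None
    return restarts


SAMPLE = [
    "ERROR | wrapper | 2009/07/01 20:00:29 | JVM appears hung: Timed out waiting for signal from JVM.",
    "ERROR | wrapper | 2009/07/01 20:00:30 | JVM did not exit on request, terminated",
    "STATUS | wrapper | 2009/07/01 20:00:31 | JVM received a signal SIGKILL (9).",
    "STATUS | wrapper | 2009/07/01 20:00:35 | Launching a JVM...",
    "INFO | jvm 2 | 2009/07/01 20:00:43 | WrapperManager: Initializing...",
]

if __name__ == "__main__":
    for hung_at, relaunched_at in forced_restarts(SAMPLE):
        print(hung_at, "->", relaunched_at)
```

Comparing the `hung_at` timestamps with the last entries in trep.log before each restart can show what the replicator was doing when the wrapper gave up on it.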

and from trep.log

2009-07-01 19:59:57,411 DEBUG extractor.mysql.MySQLExtractor Query extracted: insert into entity_instance (start_time, stop_time, interval_precision, interval_type, entity, document,
fragment, entity_fragment_tag, time_fragment_tag, entity_rank) values ('2009-07-01 18:00:00', '2009-07-01 18:00:00', 1, 0, 720944, 1874519, 2644715, 5352691, null, 0)
2009-07-01 19:59:57,411 DEBUG extractor.mysql.MySQLExtractor Query extracted: insert into entity_instance (start_time, stop_time, interval_precision, interval_type, entity, document,
fragment, entity_fragment_tag, time_fragment_tag, entity_rank) values ('2009-07-01 18:00:00', '2009-07-01 18:00:00', 1, 0, 720944, 1874519, 2644715, 5352691, null, 0)
2009-07-01 19:59:57,411 DEBUG extractor.mysql.MySQLExtractor extracting from pos, file: mysql3306-bin.000797 pos: 65894318
2009-07-01 19:59:57,411 DEBUG extractor.mysql.MySQLExtractor extracting from pos, file: mysql3306-bin.000797 pos: 65894318
2009-07-01 19:59:57,411 DEBUG extractor.mysql.Log_event log_pos: 65894345
2009-07-01 19:59:57,411 DEBUG extractor.mysql.Log_event log_pos: 65894345
2009-07-01 19:59:57,412 DEBUG extractor.mysql.MySQLExtractor Commit extracted: 100204033
2009-07-01 20:00:45,128 INFO tungsten.replicator.ReplicatorManager Starting Replicatior Manager
2009-07-01 20:00:45,128 INFO tungsten.replicator.ReplicatorManager Starting Replicatior Manager
2009-07-01 20:00:45,198 INFO tungsten.replicator.EventDispatcher Starting event dispatcher
2009-07-01 20:00:45,198 INFO tungsten.replicator.EventDispatcher Starting event dispatcher
2009-07-01 20:00:45,203 INFO replicator.conf.PropertiesManager Reading static properties file: /opt/tungsten-replicator/bin/../conf/replicator.properties
2009-07-01 20:00:45,203 INFO replicator.conf.PropertiesManager Reading static properties file: /opt/tungsten-replicator/bin/../conf/replicator.properties
2009-07-01 20:00:45,215 INFO commons.jmx.JmxManager Starting RMI registry on registryPort: 10000
2009-07-01 20:00:45,669 DEBUG tungsten.replicator.ReplicatorManager ReplEvent: StartEvent
2009-07-01 20:00:45,669 DEBUG tungsten.replicator.ReplicatorManager ReplEvent: StartEvent
2009-07-01 20:00:45,669 INFO replicator.conf.PropertiesManager Reading static properties file: /opt/tungsten-replicator/bin/../conf/replicator.properties
2009-07-01 20:00:45,669 INFO replicator.conf.PropertiesManager Reading static properties file: /opt/tungsten-replicator/bin/../conf/replicator.properties
2009-07-01 20:00:45,674 INFO tungsten.replicator.ReplicatorManager Sent State Change Notification START -> OFFLINE:NORMAL
2009-07-01 20:00:45,674 INFO tungsten.replicator.ReplicatorManager Sent State Change Notification START -> OFFLINE:NORMAL
2009-07-01 20:00:45,675 DEBUG tungsten.replicator.ReplicatorManager Applied event: StartEvent
2009-07-01 20:00:45,675 DEBUG tungsten.replicator.ReplicatorManager Applied event: StartEvent
2009-07-01 20:00:45,675 INFO tungsten.replicator.ReplicatorManager Replicator auto-enabling is engaged; going online automatically
2009-07-01 20:00:45,675 INFO tungsten.replicator.ReplicatorManager Replicator auto-enabling is engaged; going online automatically
2009-07-01 20:00:45,676 DEBUG tungsten.replicator.ReplicatorManager ReplEvent: GoOnlineEvent
2009-07-01 20:00:45,676 DEBUG tungsten.replicator.ReplicatorManager ReplEvent: GoOnlineEvent
2009-07-01 20:00:45,711 INFO replicator.conf.ReplicatorRuntime Replicator role: master
2009-07-01 20:00:45,711 INFO replicator.conf.ReplicatorRuntime Replicator role: master
2009-07-01 20:00:45,711 INFO replicator.conf.ReplicatorRuntime Assigning default global property value: key=replicator.oos_policy default value=Retry
2009-07-01 20:00:45,711 INFO replicator.conf.ReplicatorRuntime Assigning default global property value: key=replicator.oos_policy default value=Retry
2009-07-01 20:00:45,768 INFO replicator.conf.ReplicatorRuntime Setting consistencyFailureStop to true
2009-07-01 20:00:45,768 INFO replicator.conf.ReplicatorRuntime Setting consistencyFailureStop to true
2009-07-01 20:00:45,768 INFO replicator.conf.ReplicatorRuntime Setting consistencyCheckColumnNames to true
2009-07-01 20:00:45,768 INFO replicator.conf.ReplicatorRuntime Setting consistencyCheckColumnNames to true
2009-07-01 20:00:45,768 INFO replicator.conf.ReplicatorRuntime Setting consistencyCheckColumnTypes to true
2009-07-01 20:00:45,768 INFO replicator.conf.ReplicatorRuntime Setting consistencyCheckColumnTypes to true
2009-07-01 20:00:45,768 INFO replicator.conf.ReplicatorRuntime Assigning default global property value: key=replicator.thl.reset_period default value=1
2009-07-01 20:00:45,768 INFO replicator.conf.ReplicatorRuntime Assigning default global property value: key=replicator.thl.reset_period default value=1
2009-07-01 20:00:45,768 INFO replicator.conf.ReplicatorRuntime Assigning default global property value: key=replicator.thl.cache_size default value=0
2009-07-01 20:00:45,768 INFO replicator.conf.ReplicatorRuntime Assigning default global property value: key=replicator.thl.cache_size default value=0
2009-07-01 20:00:45,768 INFO replicator.conf.ReplicatorRuntime Assigning default global property value: key=replicator.thl.applier_block_commit_size default value=0
2009-07-01 20:00:45,768 INFO replicator.conf.ReplicatorRuntime Assigning default global property value: key=replicator.thl.applier_block_commit_size default value=0
2009-07-01 20:00:45,768 INFO replicator.conf.ReplicatorRuntime Assigning default global property value: key=replicator.thl.driver default value=null
2009-07-01 20:00:45,768 INFO replicator.conf.ReplicatorRuntime Assigning default global property value: key=replicator.thl.driver default value=null

Any clue what is going on?

BR

Ulf Mansson
 
#272
Ulf Månsson (User)
Re:Master hangs at synchronization 1 Year, 2 Months ago Karma: 0  
Maybe the reason it does not get into online mode is this (from Innotop):
Cmd    ID    State         User  Host       DB        Time      Query
Query  6029  Sending data  root  localhost  tungsten  01:52:49  SELECT MAX(seqno) FROM tungsten.history WHERE status='2' OR status='5'

My history table has 13 million records, so maybe the problem is that I need to purge it.
 
 
Last Edit: 2009/07/01 18:15 By ulfskrapmail@gmail.com.
#274
Robert Hodges (Moderator)
Location: Berkeley California
Re:Master hangs at synchronization 1 Year, 2 Months ago Karma: 1  
Hi Ulf,

This is a bug--we are scanning to find the last history request. I suspect this query can be significantly optimized by using the seqno index properly. I have logged the problem as:
http://forge.continuent.org/jira/browse/TREP-316
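
Editor's note: one common way to let such a query use an index on seqno is to walk the index backwards and stop at the first matching row, instead of aggregating over every match. The sketch below illustrates the idea in Python with an in-memory SQLite table. It is not the fix applied in TREP-316, which this thread does not describe. The table layout, the data and the function names are hypothetical, and behaviour on MySQL depends on the real schema and the optimizer.

```python
# Sketch only: compares the original MAX() query with an
# ORDER BY ... DESC LIMIT 1 form on a small stand-in table.
import sqlite3


def build_demo_history(rows=1000):
    """Create a stand-in history table (assumed layout, not Tungsten's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history (seqno INTEGER PRIMARY KEY, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO history (seqno, status) VALUES (?, ?)",
        [(i, "2" if i % 3 == 0 else "1") for i in range(1, rows + 1)],
    )
    return conn


def max_seqno_aggregate(conn):
    """The query from the thread: aggregate over all matching rows."""
    row = conn.execute(
        "SELECT MAX(seqno) FROM history WHERE status='2' OR status='5'"
    ).fetchone()
    return row[0]


def max_seqno_backward(conn):
    """Alternative form: newest seqno first, stop at the first match."""
    row = conn.execute(
        "SELECT seqno FROM history WHERE status IN ('2', '5') "
        "ORDER BY seqno DESC LIMIT 1"
    ).fetchone()
    # Unlike MAX(), this form returns no row at all when nothing matches.
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_demo_history()
    print(max_seqno_aggregate(conn), max_seqno_backward(conn))
```

The two forms return the same value. The difference is that the second can stop early when a matching row sits near the high end of the seqno index, which is the usual case for a "last processed event" lookup.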

For now I recommend that you purge the history table regularly using the 'thl purge' command. We will fix this problem within the next two weeks; look for a fix in build 1.0.3.

Thanks for your help in diagnosing the problem!

Cheers, Robert
 
 
Robert Hodges
Continuent CTO
#759
Cody Payne (User)
Re:Master hangs at synchronization 1 Month, 2 Weeks ago Karma: 0  
We are seeing this one today after doing a batch load of 180 million rows into a pretty flat table in our DB.

ERROR | wrapper | 2010/07/20 15:55:21 | JVM appears hung: Timed out waiting for signal from JVM.
ERROR | wrapper | 2010/07/20 15:55:21 | JVM did not exit on request, terminated
STATUS | wrapper | 2010/07/20 15:55:22 | JVM received a signal SIGKILL (9).
STATUS | wrapper | 2010/07/20 15:55:26 | Launching a JVM...

We are using the latest RC (1.3-rc-2) with Java JVM 1.5.22, and have purged all but the last 24 hours of the history table...
 
 
Last Edit: 2010/07/20 16:09 By cpayne@bconnected.com.