Friday, January 30, 2009

Lunar New Year. Its Flavor

The fifth day of the Lunar New Year: the day to welcome the God of Wealth and get back to work.

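I always feel that the more advanced the present becomes, the thicker the air of nostalgia grows; yet it is plain to see that the New Year's festive flavor fades a little more each year.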

Perhaps what people long for is the beauty of the old days, not their tedious rituals.

Wednesday, January 21, 2009

Not Sleeping in the Middle of the Night

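A few days ago I got up in the middle of the night to do some debugging. I wanted to write something down about it, but my colleague's mail reports it in more detail, so I'll let that stand in for now.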


--------------------------------------------------------------------------------
From: 亞當
Sent: Wednesday, January 21, 2009 5:25 PM
To: 來恩; 費迪南
Cc: 亞當; 雞米; 大衛
Subject: globalcm upgrade from 3.5 to 5.0 process


Dear All

The following is what we did on 1/19 and 1/20.

1. 2009/1/19 around 10pm
來恩 gave me a call saying that the upgrade process had failed for some reason after we tried to modify TMCM's schema.xml (since 3 columns were missing from the DB). The upgrade process was not able to run some stored procedures. 來恩 then asked me to join his WebEx.

2. From the IIS settings I found that the SSL port was missing from the configuration, so I set it to port 443 and then reinstalled TMCM 5.0. The reinstall went well until it reached the step that sets folder permissions (WebUI).

3. 來恩 then asked for Ferdy's help to compare the folder settings of a good TMCM machine with those of the globalcm machine. During this process, I was not able to monitor what 來恩 was modifying.

4. After the folder permissions had been set, the reinstall process did not finish and got stuck at the end (setup.exe). I then checked ccgi_install.log and noticed that CCGI had not been reinstalled properly. I manually removed the CCGI component and installed CCGI with the silent install script.

5. We then ran the TMCM reinstall again, and finally saw the completion screen of the TMCM installation. When we tried to log on to the TMCM console, however, it displayed the CCGI 400 error.

6. I did a quick check of all 3 TMCM services and 8 exe files, which were all intact. At that point, I really had no idea why CCGI didn't work. I also checked all the registry hives related to CCGI and found one entry which couldn't be opened or deleted. I then called 雞米 (another QA) for help at around 3:00am. He joined the WebEx session as well and took a close look at the services and exes. He suggested that we restart the OS. After rebooting the machine, the locked registry entry was gone. We reinstalled TMCM and again reached the final completion stage of the installation. However, we still had CCGI problems (cannot reach apphost). 雞米 also found that one of the processes (MsgReceiver.exe) kept dying and restarting until it timed out. I also noticed that whenever MsgReceiver.exe died, Dr. Watson created a dump.

7. In the morning, 雞米 and I went to the office, collected all the debug logs and core dumps, and sent them to RD for analysis.

8. In the afternoon, RD (伊森) analyzed the core dump, while QA (亞當 and 雞米) tried to install TMCM 5.0 after backing up the original DB and all product profiles. Around 3:15pm on 1/20/2009, RD found a possible root cause in the core dump: we had TMIC registered as one of the TMCM entities, and TMIC had registered a plug-in with TMCM's MsgReceiver.exe (configured in InterceptorHandler.xml). The plug-in "TMICPlugin4CM35.dll" is built for TMCM 3.5, not 5.0, and TMCM 5.0 cannot recognize a 3.5 plug-in; the TMIC team needs to provide a 5.0 build before TMIC can register to TMCM. I then backed up the file "InterceptorHandler.xml" as "InterceptorHandler-Backup.xml".

9. In the meantime, the original DB and the product profiles had been backed up. I commented out the plug-in name in "InterceptorHandler.xml" and restarted the SQL server, then all 3 TMCM services. After several logons to the TMCM console there was no CCGI error, but after waiting a while and logging on to the console again, the CCGI error appeared again.
=> It seemed that MsgReceiver.exe was still having the problem. I then turned on the debug log and collected the MsgReceiver-related logs and core dump for RD to analyze.
RD (伊森) analyzed the log and core dump, and found the root cause.
=> If the templog is big enough, TMCM writes it to a file and then parses it; if the debug log is turned on and is being written to at that moment, MsgReceiver crashes.
=> We just found a bug (I will file a track for it, along with the problem of the TMIC handler hooked to MsgReceiver when upgrading from 3.5 to 5.0).
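
A side note on spotting the "dying and starting" behavior from step 6 sooner: the sketch below is only an illustration, not anything TMCM provides. It assumes you have already collected the start times of the message receiver process from some source (event log, process monitor); the function name `is_crash_looping` and its thresholds are hypothetical.

```python
from datetime import datetime, timedelta

def is_crash_looping(start_times, max_restarts=3, window=timedelta(minutes=5)):
    """Return True if at least `max_restarts` restarts occur within any
    span of `window`. A restart is every start after the first one."""
    times = sorted(start_times)
    left = 0
    for right in range(len(times)):
        # Shrink the window from the left until it spans at most `window`.
        while times[right] - times[left] > window:
            left += 1
        # (right - left) is the number of restarts inside the current window.
        if right - left >= max_restarts:
            return True
    return False

# Hypothetical data: four starts, 30 seconds apart.
base = datetime(2009, 1, 20, 3, 0, 0)
starts = [base + timedelta(seconds=30 * i) for i in range(4)]
print(is_crash_looping(starts))
```

The sliding window keeps the check linear after sorting; the thresholds would need tuning to whatever restart policy the service manager actually uses.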


In the meantime (around 5:00pm), 雞米 and I were backing up the old DB and started to bring TMCM back online.

1. When we installed TMCM 5.0, we chose the re-use DB option in order to use the old DB, but for some reason the installer did not recognize the old installation's cmkeybackup folder. The installer always came back to the DB selection screen, so we could not continue with re-using the DB.

2. We then had to restore the DB the really, really manual way.
=> Install a brand-new TMCM 5.0, detach its DB, attach the old DB, restore the old profiles, and then merge all the necessary XML files.

(Around 9:00pm) Restarted the TMCM services and logged on to the TMCM console => no CCGI error


cheers

Sunday, January 18, 2009

Today. Clear Skies

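The weather has been lovely these past two days. After a truly cold week, the sun finally showing its face is a real joy. The wind is still chilly, but basking in the sun with a breeze blowing makes for a very pleasant kind of cold.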

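Since coming home last September, I haven't added a single thing here. In the blink of an eye, Christmas and New Year's Day have passed and the Lunar New Year is almost here. So fast. Life is at another turning point and a new beginning.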