
RC3 upgrade to 1.0.0.0 problems

Jul 2, 2007 at 3:22 PM
I have been using RC3 to monitor around 40 different machines and a handful of Catalyst switches, and when 1.0.0.0 was released, I decided to give it a go. We have not yet moved this to a production server, so the Executive service and SQLEXPRESS database were both on the same box (XP SP2). After the upgrade, I decided to reboot to see if the SQL startup dependency was fixed, and that box no longer boots. It will get to the login screen, but before you have time to actually log in, it reboots itself. I tried safe mode as well, and have the same problem. I left the machine alone all last week (I was on vacation), and when I got back, it was still in a reboot loop. I just ran a full chkdsk /p from the Recovery Console, and that didn't help, and now I'm repairing the Windows installation from the XP w/SP2 CD.

I can't say for certain this was caused by PolyMon, but that was the only change I made before rebooting, so I thought I would see if anyone else ran into this. Since this is not a production box, and is just for testing, I don't mind scrapping it and starting over, but I would like to know if this was a fluke, or if anyone else knows what I am talking about.
Jul 2, 2007 at 5:06 PM
The PolyMon installer installs the Manager app and the Service. However, it does not make any dependency changes in the services, since the service could be on a separate box from SQL anyway. Instead, the service will always "start", but rather than running monitors on its timer loops, it will try to connect to the specified database until the connection is successful.
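
To make the behaviour described above concrete, here is a minimal Python sketch of that kind of timer loop. It is only an illustration and not PolyMon's actual code: the class name `ExecutiveSketch` and the `connect` and `run_monitors` callables are hypothetical stand-ins.

```python
import threading


class ExecutiveSketch:
    """Hypothetical stand-in for the service; not PolyMon's real implementation."""

    def __init__(self, connect, run_monitors, interval_seconds=1.0):
        self._connect = connect            # assumed: returns a connection or raises
        self._run_monitors = run_monitors  # assumed: takes the open connection
        self._interval = interval_seconds
        self._connection = None
        self._stop = threading.Event()

    def tick(self):
        """One timer tick: connect if needed, otherwise run the monitors."""
        if self._connection is None:
            try:
                self._connection = self._connect()
            except Exception as exc:
                # No database yet: report it and retry on the next tick.
                return f"connect failed: {exc}"
            return "connected"
        self._run_monitors(self._connection)
        return "ran monitors"

    def run(self):
        """Starting never fails; the loop keeps ticking until stop() is called."""
        while not self._stop.wait(self._interval):
            self.tick()

    def stop(self):
        self._stop.set()
```

The point of the design is that a missing database only changes what each tick does, so the service start itself never depends on SQL being up.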

Other than that, the service is installed using the standard MSI installer. PolyMon Manager should not make any system setting changes apart from shortcuts in the program menu.

I would be curious to know what the event logs are showing - hopefully it is not related to PolyMon install.

(The upgrade from RC3 is basically the same as a full install but the database create step is omitted and a T-SQL upgrade script is run against the polymon database).

I have upgraded our own production polymon server (shared with other services, etc) and not seen the same issue arise.

Question: Did you update the SQL database prior to rebooting? (The polymon service now checks to make sure the database version is correct, otherwise it will stop the service start command and issue a stop). Again, I've tested this on my own XP box and have not encountered any problems (but not using SQLEXPRESS, which should theoretically not be any different). I will try again but not updating the database prior to rebooting to make sure I do not reproduce the same problem.
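
As a rough illustration of the version check mentioned above, the Python sketch below refuses to start when the database version does not match. Everything in it is an assumption made for the example: the thread does not say how PolyMon stores or reads the version, so `read_version`, `EXPECTED_DB_VERSION` and the returned status strings are hypothetical.

```python
EXPECTED_DB_VERSION = "1.0.0.0"  # assumed format; the real check may differ


class DatabaseVersionMismatch(Exception):
    """Raised when the database schema version is not the expected one."""


def check_database_version(read_version, expected=EXPECTED_DB_VERSION):
    # read_version is a hypothetical callable that returns the version string
    # stored in the database.
    actual = read_version()
    if actual != expected:
        raise DatabaseVersionMismatch(
            f"database is at version {actual}, expected {expected}"
        )
    return actual


def start_service(read_version):
    """Return ('running', '') on success or ('stopped', reason) on a mismatch."""
    try:
        check_database_version(read_version)
    except DatabaseVersionMismatch as exc:
        # Abort the start and report why, instead of running the monitors.
        return ("stopped", str(exc))
    return ("running", "")
```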
Jul 2, 2007 at 8:02 PM
After chkdsk finished, there seemed to be no change. After the Repair install, it sorta seems okay...I don't think this is related to PolyMon, it is probably some faulty hardware or something. There is nothing in the event viewer after 6/30/07, even though there should be each time I try to start PolyMon...I also noticed that my SQLEXPRESS instance fails to start, and nothing is logged. This machine is all sorts of messed up, but it might not be from PolyMon, I just wanted to know if this happened to anyone else. I did update the database prior to rebooting.
Jul 2, 2007 at 8:17 PM
I just re-read my last post, and I think I left a few details out. After the repair install, I was able to log in for a while. It's strange...there's no BSOD, no entry in the Event Viewer, and the CPU, RAM, and HDD all passed diagnostic tests. Anyway, if I turn the machine on and walk away, it stays in an endless reboot loop, rebooting about 15 seconds after showing the login screen. If I logged in right away, it didn't reboot anymore, but I ran Windows Updates, and now I can't log in again. When I was able to log in, I saw that neither the PolyMon Executive service nor the SQLEXPRESS service was running, and when I tried to start either, I got an error saying to check the Event Log (where there was nothing new).
Oct 9, 2007 at 2:05 AM
I experienced pretty much the same thing; the only difference was that my machine would reboot about 20 seconds into displaying "Applying machine settings...". I knew that it had to be PolyMon, because it was the only thing done to the machine since the last successful reboot. After rebooting using "last known good", I reinstalled PolyMon and tried starting the service; the error message is below. It turns out that my problem was not that the database did not start, but that someone had removed the drive containing the database software and everything else from the SAN.

In other words, something about how Executive handles the database connection failure before a user is logged on causes the machine to reboot.

Could I also say, it would be nice if the console would tell you the reason it will not start.

Windows Server 2003 SP1 (completely updated)
Newly installed PolyMon 1.0.0.0
Separate SQL Server 2005 (database accidentally deleted)

If you need more let me know. Thanks for this great monitoring tool.

A soon to be former ServersAlive user,

Nathan

*** Error Follows ***
Event Type: Error
Event Source: .NET Runtime 2.0 Error Reporting
Event Category: None
Event ID: 5000
Date: 10/8/2007
Time: 8:05:03 PM
User: N/A
Computer: MONITOR02
Description:
EventType clr20r3, P1 polymonmanager.exe, P2 1.0.0.0, P3 46780ff2, P4 polymonmanager, P5 1.0.0.0, P6 46780ff2, P7 23, P8 c6, P9 system.invalidoperationexception, P10 NIL.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 63 00 6c 00 72 00 32 00 c.l.r.2.
0008: 30 00 72 00 33 00 2c 00 0.r.3.,.
0010: 20 00 70 00 6f 00 6c 00 .p.o.l.
0018: 79 00 6d 00 6f 00 6e 00 y.m.o.n.
0020: 6d 00 61 00 6e 00 61 00 m.a.n.a.
0028: 67 00 65 00 72 00 2e 00 g.e.r...
0030: 65 00 78 00 65 00 2c 00 e.x.e.,.
0038: 20 00 31 00 2e 00 30 00 .1...0.
0040: 2e 00 30 00 2e 00 30 00 ..0...0.
0048: 2c 00 20 00 34 00 36 00 ,. .4.6.
0050: 37 00 38 00 30 00 66 00 7.8.0.f.
0058: 66 00 32 00 2c 00 20 00 f.2.,. .
0060: 70 00 6f 00 6c 00 79 00 p.o.l.y.
0068: 6d 00 6f 00 6e 00 6d 00 m.o.n.m.
0070: 61 00 6e 00 61 00 67 00 a.n.a.g.
0078: 65 00 72 00 2c 00 20 00 e.r.,. .
0080: 31 00 2e 00 30 00 2e 00 1...0...
0088: 30 00 2e 00 30 00 2c 00 0...0.,.
0090: 20 00 34 00 36 00 37 00 .4.6.7.
0098: 38 00 30 00 66 00 66 00 8.0.f.f.
00a0: 32 00 2c 00 20 00 32 00 2.,. .2.
00a8: 33 00 2c 00 20 00 63 00 3.,. .c.
00b0: 36 00 2c 00 20 00 73 00 6.,. .s.
00b8: 79 00 73 00 74 00 65 00 y.s.t.e.
00c0: 6d 00 2e 00 69 00 6e 00 m...i.n.
00c8: 76 00 61 00 6c 00 69 00 v.a.l.i.
00d0: 64 00 6f 00 70 00 65 00 d.o.p.e.
00d8: 72 00 61 00 74 00 69 00 r.a.t.i.
00e0: 6f 00 6e 00 65 00 78 00 o.n.e.x.
00e8: 63 00 65 00 70 00 74 00 c.e.p.t.
00f0: 69 00 6f 00 6e 00 20 00 i.o.n. .
00f8: 4e 00 49 00 4c 00 0d 00 N.I.L...
0100: 0a 00 ..
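
For anyone reading the dump above: the Data section looks like nothing more than the Description line encoded as UTF-16LE, so it adds no new information. A small Python sketch, using only the first four rows copied from the dump, shows how to decode it:

```python
# First four rows of the hex column from the event's Data section.
rows = [
    "63 00 6c 00 72 00 32 00",
    "30 00 72 00 33 00 2c 00",
    "20 00 70 00 6f 00 6c 00",
    "79 00 6d 00 6f 00 6e 00",
]

# Join the rows, drop the spaces, and turn the hex digits into raw bytes.
raw = bytes.fromhex("".join(rows).replace(" ", ""))

# Every character is followed by a 00 byte, which is UTF-16 little-endian.
decoded = raw.decode("utf-16-le")
print(decoded)  # prints "clr20r3, polymon"
```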

Oct 12, 2007 at 1:31 PM
There's nothing in the PolyMon Executive windows service that should cause a reboot - specifically, the service starts (even if it cannot access the database) and kicks off a timer that either attempts to connect to the specified database or runs monitors if it has successfully connected to the database. This also means that the PolyMon Executive service will start "successfully" even if no successful database connection can be made. Any errors or failures to connect to the database are written to the service host's event log - did you look at those and see what error messages come up?
I suppose there could be a .NET framework issue that causes a reboot when a SQL connection is attempted in certain circumstances, but I am not aware of such problems.
The PolyMon Manager application, on the other hand, will not start unless it is able to successfully connect to the underlying SQL database. If it fails, it returns whatever error message is returned by the .NET framework, and that is probably what you are seeing. Executive errors would not normally come up in a window and would get logged to the Event Log instead.
Thanks,
Fred.
Oct 12, 2007 at 3:21 PM

fbaptiste wrote:
There's nothing in the PolyMon Executive windows service that should cause a reboot...


Understood, but there was nothing else that could have triggered the rebooting. When the SQL server was taken offline, the server with PolyMon would not start up cleanly and removing PolyMon (and/or bringing SQL back online) fixed the problem.

And yes, there was an entry in the PolyMon event log: just before each reboot there was "Service starting..."

Jan 14, 2008 at 9:10 PM
This rebooting situation has happened again and so I have created an Issue:
http://www.codeplex.com/polymon/WorkItem/View.aspx?WorkItemId=14732
Apr 28, 2008 at 9:43 AM
I am getting the same issue on two servers. If I unplug the network cables and reboot, then log in and plug the cables back in, it boots OK, but booting with the cables in just keeps the machines in a loop saying 'Applying Computer Settings'.
May 8, 2008 at 4:36 AM
This is still a total mystery to me. There is nothing in the executive code that is asking for a reboot.
The only thing I can think of is some bug in the .NET framework that must be causing this.
The next release will include some changes to the windows service that will allow it to run even if it runs into problems logging to the Event Log. I seem to remember someone saying there was a problem when the Event Log was full.
Maybe that will fix the problem?
If the problem persists for you after the next release, I would like to release some patches to disable event logging altogether.
I have no particular reason to think the event logging is the issue, other than that everything else the service does should absolutely not be causing a problem - setting up timers, connecting to a database - that's it, and I'm using those in plenty of other projects, so I just don't think that could be it. So that leaves the Event Log. I'm using the standard .NET APIs for this, but those are the only libraries used in the Executive that I have not used extensively elsewhere...
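
To illustrate the kind of change described above (letting the service keep running when a log write fails), here is a minimal Python sketch. It is not the actual PolyMon change: `safe_log` and the `write_entry` callable are hypothetical names, and a real service would wrap whatever event log API it uses in the same way.

```python
import sys


def safe_log(write_entry, message, fallback=sys.stderr):
    """Try to write a log entry; never let a logging failure propagate.

    write_entry is a hypothetical callable that writes one message to the
    event log and may raise (for example when the log is full).
    Returns True if the entry was written, False otherwise.
    """
    try:
        write_entry(message)
        return True
    except Exception as exc:
        # Logging failed: note it on the fallback stream and carry on.
        try:
            print(f"log write failed ({exc}): {message}", file=fallback)
        except Exception:
            pass  # even the fallback failed; the service still keeps running
        return False
```

With a wrapper like this around every log call, a full or broken event log can at worst lose messages and cannot take the service down with an unhandled exception.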