Unknown Status for Event Log Service

10 replies
twalls
Opsview Enterprise Subscriber, Opsview Sensei - 5th Dan, Opsview Sensei - 4th Dan, Opsview Sensei - 3rd Dan, Opsview Sensei - 2nd Dan, Opsview Sensei - 1st Dan
Joined: 15 Jun 2010
Points: 958

We recently set up our Opsview Enterprise server and are gradually getting the hang of things as we add new hosts to monitor. On most of our Windows servers with event log monitoring enabled, we are seeing the following problem:

CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.

The error we see in the NRPE logs on the local machines is:

error:.\NRPEListener.cpp:302: NRPESocketException: To much data cant create return packet (truncate datat)
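For context on why check_nrpe reports 0 bytes rather than clipped output: as far as I understand the NRPE 2.x protocol, every reply is a single fixed-size packet whose text buffer holds 1024 bytes, including the terminating NUL. The Python sketch below packs such a response and, like the agent in the log above, refuses output that does not fit. The field layout, the 1036-byte packet size, and the use of standard CRC-32 (as computed by `zlib.crc32`) are my reading of NRPE 2.x's `common.h` and should be checked against the source; `build_response_packet` is a hypothetical name, and unused bytes are simply zero-filled here.

```python
import struct
import zlib

# Assumed NRPE 2.x constants (from my reading of common.h)
NRPE_PACKET_VERSION_2 = 2
RESPONSE_PACKET = 2
MAX_PACKETBUFFER_LENGTH = 1024

# Network byte order: int16 version, int16 type, uint32 CRC32, int16 result code,
# 1024-byte text buffer, then 2 bytes of structure padding (1036 bytes total)
PACKET_FORMAT = f"!hhIh{MAX_PACKETBUFFER_LENGTH}s2x"


def build_response_packet(output: str, result_code: int) -> bytes:
    """Pack check output into one fixed-size NRPE 2.x response packet.

    Raises ValueError instead of truncating when the output does not fit,
    mirroring the 'To much data' refusal in the agent log. One byte of the
    buffer is reserved for the NUL terminator, so at most 1023 bytes fit.
    """
    data = output.encode("utf-8")
    if len(data) >= MAX_PACKETBUFFER_LENGTH:
        raise ValueError(
            f"output is {len(data)} bytes; at most "
            f"{MAX_PACKETBUFFER_LENGTH - 1} fit in the packet buffer"
        )
    # struct NUL-pads the "s" field, which also terminates the string.
    # The CRC is computed over the packet with its CRC field set to zero.
    fields = (NRPE_PACKET_VERSION_2, RESPONSE_PACKET, 0, result_code, data)
    crc = zlib.crc32(struct.pack(PACKET_FORMAT, *fields)) & 0xFFFFFFFF
    return struct.pack(
        PACKET_FORMAT, NRPE_PACKET_VERSION_2, RESPONSE_PACKET, crc, result_code, data
    )
```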

I looked around in the Opsview mailing list archives and found mention of a patch that was applied to both the server and client to fix this issue (please see http://lists.opsview.org/lurker/message/20090407.200012.f08885af.en.html). Unfortunately, the link to the patch doesn't appear to work anymore. Is a fix like that still available, or does something need to be changed in our configuration to handle the large event logs?

Thanks in advance for any help anyone can offer with this!

tonvoon
Opsview Sensei - 1st Dan
Joined: 26 May 2010
Points: 65

The link is now: http://labs.opsview.com/2008/08/enhancing-nrpe-for-large-output/

However, the client-side change is already in Opsview. I'm not sure what the error from the NRPE end on Windows is. The Windows client should really truncate its output before sending it back to NRPE; this looks like a bug in the Windows client.
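A minimal sketch of the client-side truncation described above, assuming a 1023-byte budget (the 1024-byte NRPE 2.x buffer minus the NUL terminator) and UTF-8 output. The Windows agent's actual encoding and limit may differ, and `truncate_for_nrpe` is a hypothetical helper, not part of NSClient++.

```python
# Assumed budget: NRPE 2.x's 1024-byte buffer minus one byte for the NUL terminator
MAX_OUTPUT_BYTES = 1023


def truncate_for_nrpe(output: str, limit: int = MAX_OUTPUT_BYTES) -> str:
    """Clip check output so that its UTF-8 encoding fits in `limit` bytes.

    The cut never splits a multi-byte character: a partial sequence left
    at the end of the slice is dropped when decoding.
    """
    data = output.encode("utf-8")
    if len(data) <= limit:
        return output
    return data[:limit].decode("utf-8", errors="ignore")
```

With a step like this in front of the packet builder, an oversized event log result would reach Opsview shortened instead of failing with the "To much data" error.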

While we don't develop the Windows client ourselves, we do have a relationship with its developer. With some sponsorship, he might be able to merge our NRPE code enhancement into the Windows client for all to use. Would you be interested in sponsoring this?

twalls

Thanks for the updated link, Ton. When you say the client side change is already in Opsview, do you mean the Opsview Agent packaged for Windows? If so, then that is what we are using. These are fresh installs of the agent from the Opsview site that were downloaded just a few days ago.

Sorry for being dense, but what exactly does sponsoring a change like that involve? In other words, is it a monetary donation to the open source project, or is it more like a vote, where the most-requested features get higher priority? From your comment on that blog post (which, admittedly, is rather old), it seems the client's author didn't want to implement this change back then, in the interest of keeping the protocol simple. Do you know if that is still the case?

twalls

We can now consistently reproduce the issue on five test hosts. They have been displaying this error for 5 days, 22 hours, and counting. Other hosts show the error as well, but not all the time. As I said above, we're still testing the rollout of Opsview Enterprise, so we only have about 45 hosts added so far. The problem appears to be isolated to our Windows 2008 R2 64-bit servers. The Opsview Agent available for download from the Opsview web site appears to be fairly old compared with what is available on the NSClient++ web site. (The newest changelog entry for the Opsview Agent is from 2008-09-24, for version 0.3.5. The current version of NSClient++ is version 0.3.8, last updated on 05/27/10.)

I'd like to test the NSClient++ agent to see whether the problem has been resolved in the newer build. Unfortunately, the NSClient++ agent appears to require some configuration before it will even talk to Opsview, and I'm not sure where to begin. I've tried overwriting the newer .ini file with the original Opsview one. That gets responses flowing to Opsview again, but I get errors in the log about missing files, so I'm worried something may have broken compatibility with the plug-ins.

Is there a documented process for getting the newer NSClient++ agent configured to work with Opsview Enterprise?

dferguson
Opsview Sensei - 1st Dan
Joined: 02 Jun 2010
Points: 35

We haven't had this error reported before, and we haven't been able to reproduce it on our systems yet, but we are still trying.

You should be able to install the newer NSClient++ code over the top of the opsview agent code as follows:

  1. Stop the NSClient++ service.
  2. Copy the NSC.ini file somewhere safe.
  3. Extract the newer NSClient++ code over the top of the Opsview Agent directory.
  4. Put the saved NSC.ini file back in place.
  5. Start the NSClient++ service.
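The five steps above could be scripted along these lines. This is a sketch under several assumptions: the service name `NSClientpp` is hypothetical (check the real one in the Services console), the NSClient++ zip is assumed to have its files at the archive root, and `upgrade_agent` is an illustrative name. `net stop` and `net start` are used because they wait for the service to change state; the service controller is a parameter so the file handling can be tried without touching a real service.

```python
import shutil
import subprocess
import tempfile
import zipfile
from pathlib import Path
from typing import Callable

# Hypothetical service name; confirm the real one on your hosts
SERVICE_NAME = "NSClientpp"


def net_service(action: str, service: str) -> None:
    """Run 'net stop' or 'net start', which wait for the service to change state."""
    subprocess.run(["net", action, service], check=True)


def upgrade_agent(
    agent_dir: str,
    new_build_zip: str,
    service: str = SERVICE_NAME,
    control: Callable[[str, str], None] = net_service,
) -> None:
    agent = Path(agent_dir)
    ini = agent / "NSC.ini"

    control("stop", service)                  # 1. stop the service
    with tempfile.TemporaryDirectory() as tmp:
        backup = Path(tmp) / "NSC.ini"
        shutil.copy2(ini, backup)             # 2. set NSC.ini aside
        try:
            with zipfile.ZipFile(new_build_zip) as zf:
                zf.extractall(agent)          # 3. extract the newer build on top
        finally:
            shutil.copy2(backup, ini)         # 4. restore the Opsview NSC.ini
    control("start", service)                 # 5. start the service again
```

On a host this might be called as `upgrade_agent(r"C:\Program Files\Opsview Agent", r"C:\temp\nsclient.zip")`, where the zip path is only a placeholder for wherever the download was saved.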

If you have any problems, please let me know.

  Duncs

dferguson

I have raised a ticket about the NSClient++ code upgrade here: 

  https://secure.opsera.com/jira/browse/OPS-1356

  Duncs

twalls

Thank you very much, Duncs! I really appreciate you opening the request for the agent and providing interim instructions for getting the agent updated on our machines. I will try them out tonight and let you know how it turns out.

twalls

Well, I was able to get the newer agent working on a test host. I was very excited to see the error stop appearing, and so I went to install the upgraded agent on a few of our other development hosts. That stopped when I saw a new error pop up on the first one. Alas, the original issue is back with new wording:

Could not construct return paket in NRPE handler check clientside (nsclient.log) logs...

When I check the opsview-agent.log, I see what looks to be the exact same error as before with the older Opsview agent.

Here's what we were getting in the log before the upgrade:

2010-09-09 18:27:18: error:.\NRPEListener.cpp:302: NRPESocketException: To much data cant create return packet (truncate datat)

Here's everything from after the upgrade (notice the last line with same error, typos and all):

2010-09-09 18:31:32: message:modules\FileLogger\FileLogger.cpp:92: Starting to log for: NSClient++ - 0.3.8.76 2010-05-27
2010-09-09 18:42:18: debug:CACHENSClient++.cpp:551: Attempting to start NSCLient++ - 0.3.8.76 2010-05-27
2010-09-09 18:42:18: message:CACHEmodules\FileLogger\FileLogger.cpp:93: Log path is: C:\Program Files\Opsview Agent\\opsview-agent.log
2010-09-09 18:42:18: error:modules\NRPEListener\NRPEListener.cpp:325: NRPESocketException: To much data cant create return packet (truncate datat)

If it were just one machine, I'd say it was a fluke with that machine's event logs, but that isn't the case. Out of 16 development machines running Windows 2008 R2 64-bit, 5 display the error constantly, while others display it intermittently. As I write this, nine have the error. I'm not sure what else I can try on my end at this point.

twalls

Just wanted to give an update. The hosts running the newer agent do still display the error, but it isn't persistent like before; it seems to come and go now. The problem isn't fixed, but it does seem to be improved by using the newer agent.

Thanks again for looking into this.

dferguson

While looking into another issue, I found this bug report, which could be relevant:

http://nsclient.org/nscp/ticket/267

Essentially, removing the word 'descriptions' from the check arguments may help with the problem.

Also see

http://nsclient.org/nscp/ticket/238

which discusses amending a buffer size limit.

  Duncs

twalls

Thanks for the reply, Duncan. I suspect the answer lies in this little jewel from the second link you sent:

Thinking that I finally got the buffer issue resolved and now I was just sending too much data to the NRPE client code, I adjusted down the size I was truncating data to. I adjusted my truncate=1024 value to truncate=990 for all the CheckEventLog commands and all the errors went away and appears to be working fine. I did not try any values higher than that, but you could probably bump it up a little.
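The fix quoted above amounts to leaving headroom below the packet buffer: 1024 - 990 = 34 bytes. Why exactly 1024 failed is not confirmed in the thread; one likely factor is that a 1024-byte buffer holds only 1023 bytes of text plus the terminating NUL, but the agent may also add text after truncating. Together with dropping 'descriptions' as Duncs suggested, the argument change can be sketched in Python as below. `tune_eventlog_args` is hypothetical, the buffer size is assumed from NRPE 2.x, and the sample arguments in the usage are illustrative rather than a verified CheckEventLog command line.

```python
from typing import List

# Assumed NRPE 2.x text buffer size; one byte is needed for the NUL terminator
NRPE_BUFFER_BYTES = 1024


def tune_eventlog_args(
    args: List[str], truncate: int = 990, drop_descriptions: bool = True
) -> List[str]:
    """Return a new CheckEventLog argument list with a smaller truncate= value.

    Any existing truncate=... argument is replaced, and the bare
    'descriptions' keyword is removed when drop_descriptions is True.
    The input list is left unchanged.
    """
    if not 0 < truncate < NRPE_BUFFER_BYTES:
        raise ValueError(f"truncate must be between 1 and {NRPE_BUFFER_BYTES - 1}")
    tuned = [
        arg
        for arg in args
        if not arg.startswith("truncate=")
        and not (drop_descriptions and arg == "descriptions")
    ]
    tuned.append(f"truncate={truncate}")
    return tuned
```

For example, `["file=application", "truncate=1024", "descriptions"]` becomes `["file=application", "truncate=990"]`, and asking for `truncate=1024` is rejected outright because it would leave no room for the terminator.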

Will try it out and let you know how it goes!
