[clug] Nagios escalations - help needed

Collins, Steve Steve.Collins at industry.gov.au
Mon Sep 6 02:12:45 GMT 2004


I have the following requirements:

1. Monitor several services 24x7 on a group of hosts
2. During work hours, email a limited group of users every 30 or 60 minutes (depending on the service) if any of the services aren't working, along with the corresponding host notifications (this part works fine).
3. 24x7, escalate service notifications so that the following occur:
	* from notification 2 onwards, SMS our on-call person every 6 hours
	* on notification 2 ONLY, email our main client and our service desk
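The requirements above can be restated as a small decision function. This is a hypothetical Python sketch of the intended policy only, not of how Nagios evaluates escalations: notification numbers are assumed to start at 1, work hours are passed in as a flag, and the recipient labels are illustrative names taken from this post. It models who should be told, not how often (the 30/60-minute and 6-hour spacing is left out).

```python
# Hypothetical model of the intended policy; labels are illustrative, not Nagios objects.


def intended_recipients(notification_number: int, in_workhours: bool) -> set[str]:
    """Return who *should* be told about a service problem for one notification."""
    if notification_number < 1:
        raise ValueError("notification numbers start at 1")

    recipients: set[str] = set()

    # Requirement 2: during work hours the web team gets every email notification.
    if in_workhours:
        recipients.add("webdev (email)")

    # Requirement 3a: from notification 2 onwards, 24x7, SMS the on-call person.
    if notification_number >= 2:
        recipients.add("webdev-oncall (SMS)")

    # Requirement 3b: on notification 2 only, email the main client and the service desk.
    if notification_number == 2:
        recipients.update({"main client (email)", "servicedesk (email)"})

    return recipients


for n in (1, 2, 3):
    print(n, sorted(intended_recipients(n, in_workhours=False)))
```

Comparing this table of "who should hear about notification n" with what actually arrives is one way to narrow down which escalation is misbehaving.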

Service and host notifications by email work fine, but the escalations don't seem to work: a server died on the weekend and I got no SMS. Below are what I believe are the relevant parts of my config files. I'd like to get the webdev and webdev-oncall parts working first; after that, I should be able to add in things like management.

I'd greatly appreciate any advice on where I've gone wrong (as I surely have). I know notifications depend heavily on the config; what I need help with is working out where I've cruelled it so that they get mucked up. Initially, I'd like to set everything up with a very tight set of notification periods so I can test it all, and then push the periods back out to realistic values.
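For the testing plan, one way to get a tight set of timings is to divide every interval by the same factor so the ratios between them survive. This is a hypothetical Python sketch, assuming "tight" means shorter notification intervals, that the values are in Nagios time units (minutes when interval_length is 60 seconds, as in the sample nagios.cfg), and that a value of 0 ("notify only once") must stay 0. The dictionary keys are illustrative labels for the intervals in the config below.

```python
# Hypothetical labels for the intervals used in this config (values in time units).
PRODUCTION = {
    "service notification_interval": 30,
    "oncall escalation notification_interval": 360,
    "client/servicedesk escalation notification_interval": 60,
}


def compress(intervals: dict[str, int], factor: int) -> dict[str, int]:
    """Divide each interval by factor, rounding down to a minimum of 1; 0 stays 0."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    compressed = {}
    for name, value in intervals.items():
        # 0 means "notify only once", so it must not turn into a repeat interval.
        compressed[name] = 0 if value == 0 else max(1, value // factor)
    return compressed


TEST = compress(PRODUCTION, 30)
for name, value in TEST.items():
    print(f"{name}: {value}")
```

With a factor of 30, a full cycle (first notification, escalation at notification 2, the on-call repeat) plays out in roughly a quarter of an hour instead of most of a day.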

I'm using Nagios 1.2 and the latest NagMIN for config editing.

Host.cfg (typical entry)
~~~~~~~~~~~~~~~~~~~~~~~~
define host {
    use    generic-host
    host_name    DONKEY
    alias    DONKEY Server
    address    172.16.2.30
    parents    MacquarieRack
    check_command    check-host-alive
    max_check_attempts    3
    notification_interval    60
    notification_period    24x7
    notification_options    d,u,r
}

HostGroup.cfg
~~~~~~~~~~~~~
define hostgroup {
    hostgroup_name    Macquarie_internet_zone
    alias    MCT Internet Zone
    contact_groups    online,webdev
    members    DONKEY (plus several others)
}

Contact.cfg
~~~~~~~~~~~
define contact {
    use    generic-contact
    contact_name    scollins
    alias    Stephen Collins
    email    steve.collins at industry.gov.au
    service_notification_period    24x7
    host_notification_period    24x7
    service_notification_options    w,u,c,r
    host_notification_options    d,u,r
    service_notification_commands    notify-by-email
    host_notification_commands    host-notify-by-email
}
define contact {
    use    generic-contact
    contact_name    servicedesk
    alias    Service Desk
    email    servicedesk at industry.gov.au
    service_notification_period    24x7
    host_notification_period    24x7
    service_notification_options    w,u,c,r
    host_notification_options    d,u,r
    service_notification_commands    notify-by-email
    host_notification_commands    host-notify-by-email
}
define contact {
    use    generic-contact
    contact_name    webdev-oncall
    alias    Web Development Team Oncall Member
    pager    0421054024 at streetdata.com.au
    service_notification_period    nonworkhours
    host_notification_period    nonworkhours
    service_notification_options    w,u,c,r
    host_notification_options    d,u,r
    service_notification_commands    notify-by-epager
    host_notification_commands    host-notify-by-epager
}

Contactgroup.cfg
~~~~~~~~~~~~~~~~
define contactgroup {
    contactgroup_name    webdev
    alias    Web Development Team Staff
    members    brobinson,imacintosh,mwalsh,rbuerckner,scollins,sjanssens
}
define contactgroup {
    contactgroup_name    webdevoncall
    alias    Web Development Team Staff - Oncall
    members    imacintosh-sms,webdev-oncall
}

Service.cfg
~~~~~~~~~~~
define service {
    use    NM-HTTP
    hostgroup_name    Macquarie_internet_zone
    service_description    Check HTTP [MCT]
    contact_groups    webdev
    check_period    24x7
    notification_interval    30
    notification_options    w,u,c,r
    notification_period    24x7
    check_command    check_http_mct
    max_check_attempts    3
    normal_check_interval    5
    retry_check_interval    1
}

ServiceEscalation.cfg
~~~~~~~~~~~~~~~~~~~~~

define serviceescalation {
    hostgroup_name    Macquarie_internet_zone
    service_description    Check HTTP [MCT]
    first_notification    2
    last_notification    0
    notification_interval    360
    contact_groups    webdevoncall
}
define serviceescalation {
    hostgroup_name    Macquarie_internet_zone
    service_description    Check HTTP [MCT]
    first_notification    2
    last_notification    2
    notification_interval    60
    contact_groups    servicedesk,webpub
}
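The two escalation entries above can be evaluated by hand for each notification number. This is a hypothetical Python sketch, not Nagios source, resting on these assumptions taken from my reading of the Nagios escalation documentation: an entry applies to notification n when first_notification <= n and (last_notification == 0 or n <= last_notification); when any entry applies, its contact_groups are notified instead of the service's own contact_groups; and when several entries apply, the smallest notification_interval is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Escalation:
    first_notification: int
    last_notification: int  # 0 is assumed to mean "no upper limit"
    notification_interval: int  # in time units (usually minutes)
    contact_groups: tuple[str, ...]

    def applies_to(self, n: int) -> bool:
        """Assumed rule: n falls inside [first, last], with last == 0 open-ended."""
        upper_ok = self.last_notification == 0 or n <= self.last_notification
        return self.first_notification <= n and upper_ok


# The two entries from ServiceEscalation.cfg, transcribed as data.
ESCALATIONS = [
    Escalation(2, 0, 360, ("webdevoncall",)),
    Escalation(2, 2, 60, ("servicedesk", "webpub")),
]

# The service's own settings from Service.cfg.
SERVICE_GROUPS = ("webdev",)
SERVICE_INTERVAL = 30


def evaluate(n: int) -> tuple[list[str], int]:
    """Return (contact groups notified, interval used) for notification n."""
    valid = [e for e in ESCALATIONS if e.applies_to(n)]
    if not valid:
        # No escalation applies: fall back to the service definition.
        return sorted(SERVICE_GROUPS), SERVICE_INTERVAL
    # Escalated: union of the applicable entries' groups, smallest interval wins.
    groups = sorted({g for e in valid for g in e.contact_groups})
    interval = min(e.notification_interval for e in valid)
    return groups, interval


for n in range(1, 5):
    groups, interval = evaluate(n)
    print(f"notification {n}: groups={groups} interval={interval}")
```

Under those assumptions, notification 1 goes to webdev at the 30-minute interval, notification 2 goes to servicedesk, webdevoncall and webpub at 60 minutes (the smaller of 360 and 60), and from notification 3 on only webdevoncall is notified, every 360 minutes. Whether Nagios 1.2 follows exactly these rules is worth confirming against its own documentation.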

Thanks!

Steve
--
Stephen Collins
Web Development Section
eBusiness Division
__________________________________________________
Department of Industry, Tourism and Resources 
Level 12, 20 Allara Street, Canberra City ACT 2600
GPO Box 9839, Canberra ACT 2601

E steve.collins at industry.gov.au
P +61 2 62137193
C +61 410 680722
F +61 2 62136227

