Tuso Service Desk SOP v1.9

SmartCare Service Desk and Incident Management

Standard Operating Procedure Version 1.9

9 July 2024

Overview

Zambia’s SmartCare (SC) Health Information System operates in over 1,500 health facilities situated throughout the nation, representing nearly 100% coverage of current HIV treatment clients by the electronic system. Four SC types have been deployed: eLast, eFirst, SmartCare Plus (SC+), and SmartCare Pro. In its current state, over 1,300 SC facilities operate eLast, a retrospective system supported by data entry clerks. By 2018, a prospective SC electronic medical record system (eFirst) was introduced at the health facility point of care, with coverage in approximately 200 high-volume sites. By 2020, the SC legacy system was re-engineered into SC+, an electronic medical record package characterized by enhanced features for improved decision support at the clinical point of care; interoperability of information systems to improve health information exchange; and a connectivity and alternative power package to sustain reliable operations. SC+ replaced eFirst operations and, over time, it will also replace a proportion of eLast sites. In 2022, an improved version of SmartCare with a centralised server was developed; it was first deployed in 2023 and is called SmartCare Pro. Each SC type is guided by specific standard operating procedures to assure optimal daily use for clinical management at the point of care and routine national decision-making.

To assure the SC system is performing as expected, the SmartCare Service Desk was introduced to monitor its operations in real time. The SmartCare Service Desk is an incident management system used to detect and manage interruptions to operations that may require a response, so that expected performance is restored. The Service Desk is a single, reliable point of contact between the various users of the SC system and service providers. Its primary objective is to facilitate the resolution of end user/requestor incidents as quickly as possible. To assure active management, SC users can create incident tickets to report interruptions to routine operations that may require further investigation, so that normal operations resume promptly.

The Service Desk’s effectiveness lies in providing rapid responses to incidents reported by SC users. It is not designed to be an avenue for detailed SC training; such queries are redirected to the relevant Implementing Partner (IP), Provincial Health Office, or IHM Capacity Building and Adoption team for follow-up. Reported incidents that cannot be immediately resolved by the Service Desk team are escalated to personnel experienced in the specific applications. The Service Desk provides SC support in facilities and will assist in troubleshooting system and connectivity incidents.

Any downtime of the SC system, technical difficulty, or needed enhancement that cannot be solved at the local level may represent an SC security risk, a patient safety risk, or both. Incidents should be reported to the Service Desk immediately by the person who discovers them.

Objectives Of The Service Desk

The following are the objectives of the Service Desk system:

Facilitate the efficient resolution of end user/requestor problems related to the SC system

Maintain a historical log of IT support provided to end user/requestors of the system

Act as a record of software and hardware incidents encountered by systems users

Maintain a record of equipment needs

Ensure incidents do not fall through the cracks as would be the case if reported in non-standard, ad hoc ways, or outside the guidance of this document

Maintain a record of areas with training gaps to improve SC operations

Generate statistics intended for active systems monitoring and reporting to stakeholders

Hours Of Operation

The Service Desk hours of operation are 8:00 AM to 5:00 PM CAT, Monday to Friday, excluding all Zambia statutory holidays.
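
As an illustration of the hours stated above, the following minimal Python sketch checks whether a given moment falls within Service Desk hours. It assumes CAT is a fixed UTC+2 offset, treats 5:00 PM as the closing instant, and takes the Zambia statutory holidays as a caller-supplied set of dates. The function name and parameters are illustrative only and are not part of the Tuso system.

```python
from datetime import date, datetime, time, timedelta, timezone

# Central Africa Time is UTC+2 all year round (no daylight saving time).
CAT = timezone(timedelta(hours=2), name="CAT")
OPEN_TIME = time(8, 0)    # 8:00 AM
CLOSE_TIME = time(17, 0)  # 5:00 PM (assumed to be exclusive)


def is_service_desk_open(moment: datetime,
                         holidays: frozenset = frozenset()) -> bool:
    """Return True if `moment` falls inside Service Desk hours.

    `moment` must be timezone-aware. `holidays` is a hypothetical,
    caller-supplied set of `date` objects for Zambia statutory holidays.
    """
    local = moment.astimezone(CAT)
    if local.weekday() >= 5:          # 5 = Saturday, 6 = Sunday
        return False
    if local.date() in holidays:
        return False
    return OPEN_TIME <= local.time() < CLOSE_TIME
```

A caller could use such a check to decide whether to phone the Service Desk or to log the incident through the self-service portal instead.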

Service Desk Service Operation

The Service Desk will endeavour to do the following to maintain a high level of quality customer service:

Seek end user feedback and act on the results.

Fulfill this service level agreement (SLA) with the MoH, with a minimum incident management success rate of 96%.

Decrease customer downtime and incidents

Apply current industry best practices

The Service Desk will achieve the above using the following management protocol:

Service Desk staff receive communication from end users/requestors through calls, emails, and WhatsApp, and log these into the Service Desk system

Trained Service Desk users, such as Implementing Partners or Ministry of Health staff, log into the system and submit issues

Service Desk staff allocate priority status to all reported incidents and respond based on assigned priority status.

An incident is assigned a priority code as described in Table 3, below

Service Desk staff solve the problem or escalate the incident to the relevant IP, Ministry of Health Office, or IHM personnel

Service Desk staff solve the problem, document the solution, notify the end user/requestor, and support the end user/requestor to implement the solution for the facility

The end user/requestor closes out the incident to complete the incident management cycle

Benefits Of A Service Desk

The following benefits derive from implementing the current Service Desk system:

Efficient and tracked incident management

Higher accountability from Implementing Partners, supporting staff, and users

Aids the Service Desk unit, and supporting staff, to provide guidance to end users on possible solutions to ensure continued service provision

Ensures that incidents are escalated appropriately to the next level of service provision if the incident cannot be resolved at a lower level or the service level agreed time lapses

Follow up to ensure the incident or request has been resolved and the ticket is closed

Gives the Service Desk a way to provide updates to users on the status of logged incidents

Assists in determining if reported SC incidents are systematic, or non-systematic, for the routine assessment of overall systems performance

When To Contact The Service Desk Or Submit An Incident Ticket

As a rule, if the functionality of the SC system is not providing the results expected, then contact the Service Desk.

Contact the Service Desk if:

SC is down and you cannot solve the problem easily

Your facility champion cannot solve the SC problem and you have investigated as much as possible but have not found the solution

You identified an SC problem that you can work around, but it creates difficulties in using the system

You have training requests, or have identified training gaps, in using the SC system

You need information regarding SC to improve local use

There is a problem with using SC and you are not sure what it is

You found a non-critical incident or problem, which does not affect system use, but should be fixed

You have a suggestion for including new functionality in the system in line with National HIV care and reporting guidelines

Your facility is participating in testing a new SC patch, upgrade, or feature and there are systems errors

You have queries about the latest version of SC you should be operating, or about the generation and submission of TDBs

Roles And Responsibilities For Incident Management

Table 1 below articulates the different roles within incident management and the responsibilities associated with each role.

Table 1: Roles and Responsibilities

Role	Responsibilities	Who is accountable
End User/Requester	Contact the Service Desk to raise a new incident request. Log incidents through the Tuso Self Service Portal. Follow up on an existing request. Clearly communicate all the required information to technicians or the Service Desk team. Acknowledge the restoration of service and completion of the ticket. Respond to follow-up surveys after ticket resolution, completing the feedback loop. Close out the incident.	Healthcare provider or any other system user
Service Desk Officer	Log all incoming incident requests with appropriate parameters like category, urgency, and priority. Assign tickets to technicians. Analyze and resolve an incident to restore service. Escalate unresolved incidents to the relevant technicians. Gather all required information from the requesters and send them regular updates on the status of their request. Act as a point of contact for requesters, and, if needed, coordinate between the Service Desk and Requesters. Verify the resolution with the end user and collect feedback.	Service Desk Officer
Service Desk Manager	Take accountability for the overall process of issue and incident management. Define key performance indicators (KPIs) and align them with critical success factors (CSFs). Review KPIs and ensure that they meet business goals and CSFs. Design, document, review, and improve processes. Establish continuous service improvement (CSI) wherein the procedures, policies, roles, technology, and other aspects of the incident management process are reviewed and improved upon. Stay informed about industry best practices and incorporate them into the incident management process.	IT Services Manager
Training Coordinator	Responsible for training needs, knowledge gaps, and competency improvement among users of LSC, SC+, and SC Pro. Train technical users, super users/champions, and end users/frontline healthcare providers. Perform training needs assessments. Conduct training mop-ups. Conduct provider shadowing and follow-up competency assessments.	Capacity Building and Adoption Manager
Hardware IT Technician	Provide technical support for computer hardware. Troubleshoot and resolve hardware incidents.	IT Officers
Software Engineer	Design and develop software applications. Resolve software-related incidents. Enhance applications by identifying opportunities for improvement, making recommendations, and designing and implementing systems. Maintain and improve existing codebases and peer review code changes. Liaise with colleagues to implement technical designs. Investigate and use new technologies where relevant. Provide written knowledge transfer material.	Software Engineer
Senior Software Engineer	Lead and manage SC software development projects. Research and provide necessary technical guidance. Provide LOE for software application development. Account for software application incidents and their resolution.	Principal Software Engineer
Database Administrator	Troubleshoot SC+ and LSC database-related issues and incidents. Ensure that the back end is operational and highly available, and troubleshoot any database problems, including breakage and corruption of base tables and records. Proactively monitor database systems to ensure minimum downtime, provide trend analysis and reporting, ensure database integrity, maintain database backup procedures, and create and maintain process and procedure documentation.	Database Administrator
System Administrator	Troubleshoot and resolve issues and incidents related to system downtime. Actively resolve problems and issues with server systems to limit work disruptions at facility level. Maintain, configure, and ensure reliable operation of computer systems and servers for the SC system. Participate in research and development to continuously improve and keep up with the IT business needs of system operation.	Infratel Sysadmin
Network Administrator	Maintain computer networks and solve any incidents and problems that may occur with them. Provide network administration and support. Monitor computer networks and systems to identify how performance can be improved.	IT Officers
Cyber Security	Build in security during the development stages of software systems, networks, and data centers. Look for vulnerabilities and risks in hardware and software and, when vulnerabilities and breaches are found, close them off.	Cybersecurity Specialist

Blanket Service Level Agreement

Based on the specific SLA with the MoH, Table 2 presents an incident priority matrix, which defines the priority of incidents based on their impact on the SmartCare system and the disruption of healthcare services. Table 3 outlines the definition of levels of incidents, their prioritization, response times, and feedback protocols. The Service Desk will strive to consistently meet these timelines when logging all incidents and providing feedback.

Incident Prioritization Matrix

Table 2 below illustrates the different definitions of prioritization of the incidents as they are received.

Table 2: Incident priority matrix

Impact	System wide and all live facilities	Selected health facilities affected	Single facility disrupted	Department within a facility	Individual user/HCP
Clinical services disrupted	Critical	Critical	High	High	Moderate
Degraded services	High	High	Moderate	Moderate	Normal
Clinical work not affected	Moderate	Moderate	Normal	Normal	Normal
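
Table 2 can be read as a lookup from two inputs, the impact on clinical work and how widely the incident is felt, to a priority. The following Python sketch encodes the matrix exactly as tabulated; the enum and function names are illustrative assumptions, not identifiers used by the Tuso system.

```python
from enum import Enum


class Impact(Enum):
    """Rows of Table 2."""
    CLINICAL_SERVICES_DISRUPTED = "Clinical services disrupted"
    DEGRADED_SERVICES = "Degraded services"
    CLINICAL_WORK_NOT_AFFECTED = "Clinical work not affected"


class Scope(Enum):
    """Columns of Table 2, in order; the value is the column index."""
    SYSTEM_WIDE = 0
    SELECTED_FACILITIES = 1
    SINGLE_FACILITY = 2
    DEPARTMENT = 3
    INDIVIDUAL_USER = 4


# One tuple per row, ordered by the Scope column index.
PRIORITY_MATRIX = {
    Impact.CLINICAL_SERVICES_DISRUPTED: ("Critical", "Critical", "High", "High", "Moderate"),
    Impact.DEGRADED_SERVICES: ("High", "High", "Moderate", "Moderate", "Normal"),
    Impact.CLINICAL_WORK_NOT_AFFECTED: ("Moderate", "Moderate", "Normal", "Normal", "Normal"),
}


def priority_for(impact: Impact, scope: Scope) -> str:
    """Return the Table 2 priority for an impact/scope combination."""
    return PRIORITY_MATRIX[impact][scope.value]
```

For example, `priority_for(Impact.DEGRADED_SERVICES, Scope.SINGLE_FACILITY)` returns `"Moderate"`, matching the third column of the second row.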

Severity /Priority Level

Table 3: Service Desk Priority Status and Feedback Protocol

Priority level	Priority	Definition	Response Time	Resolution Time	Feedback Protocol
P1**	Critical	System down. System/EHR, server, or LAN not usable. Production stopped. No work-around at local level.	15 minutes	4 hours	First update to end user/requestor within 1 hour, whenever possible. Further updates at 1-hour intervals. Escalate to IT Officer and Training Officer at end of the day if the problem is not resolved.
P2*	High	Critical. Some service areas not working. Production is affected. Work-around at the local level available.	Within 4 hours	8 hours	First update to end user/requestor within 2 hours. Further updates at 2-hour intervals until the incident or issue has been resolved.
P3	Moderate	Impaired system, such as poor response times, but not causing work stoppage	Within 8 hours	7 days	First update to end user/requestor within 4 hours. Further updates at 8-hour intervals or on a daily basis.
P4	Normal	Identified gaps in training at facility level. Request for information or help on how functions work.	The request is forwarded to the relevant IP/Provincial Office/IHM within 1 business day	14 days	First update to end user/requestor within 3 business days. Further updates as required. First update to end user/requestor within 1 business day on required information. If information is not readily available, further updates to be provided as required.
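
To show how the response and resolution times in Table 3 translate into deadlines for a logged ticket, here is a minimal Python sketch. It makes two simplifying assumptions that the SOP does not state: deadlines are counted in elapsed clock time without pausing outside the hours of operation, and the "1 business day" forwarding window for P4 is approximated as one calendar day. All names are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SlaTarget:
    priority: str
    response: timedelta    # time allowed before the first response
    resolution: timedelta  # time allowed before resolution


# Values copied from Table 3.
SLA_TARGETS = {
    "P1": SlaTarget("Critical", timedelta(minutes=15), timedelta(hours=4)),
    "P2": SlaTarget("High", timedelta(hours=4), timedelta(hours=8)),
    "P3": SlaTarget("Moderate", timedelta(hours=8), timedelta(days=7)),
    # Assumption: "1 business day" is simplified to 1 calendar day here.
    "P4": SlaTarget("Normal", timedelta(days=1), timedelta(days=14)),
}


def sla_deadlines(level: str, logged_at: datetime) -> tuple:
    """Return (response_due, resolution_due) for a ticket logged at `logged_at`."""
    target = SLA_TARGETS[level]
    return logged_at + target.response, logged_at + target.resolution
```

Under these assumptions, a P1 incident logged at 09:00 would need a response by 09:15 and a resolution by 13:00 the same day.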

System Availability Calculation Matrix

Table 4 below depicts how the metrics and standards for system availability are calculated.

Table 4: System Availability Calculation Matrix

		Service Level Target
		Downtime per Week in minutes
Hours per Day	Days per Week	95%		98%		99%		99.4%		99.94%
8	5	120		48		24		14		1.44
12	5	180		72		36		22		2.16
18	5	270		108		54		32		3.24
24	5	360		144		72		43		4.32
24	6	432		173		86		52		5.184
24	7	504		202		101		60		6.048

Source: ITIL Release Management: A hands-on Guide, 2010 1
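
The downtime allowances in Table 4 follow from a simple proportion: the minutes of service per week multiplied by the fraction of time the target allows the system to be unavailable. The following minimal Python sketch, with a hypothetical function name, reproduces the 95%, 98%, and 99% columns.

```python
def allowed_downtime_minutes(hours_per_day: float,
                             days_per_week: float,
                             target_percent: float) -> float:
    """Weekly downtime allowance, in minutes, for a given availability target.

    Service minutes per week = hours per day x days per week x 60.
    Allowed downtime = service minutes x (100 - target) / 100.
    """
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    service_minutes = hours_per_day * days_per_week * 60
    return service_minutes * (100 - target_percent) / 100
```

For example, `allowed_downtime_minutes(8, 5, 95)` gives 120 minutes per week, matching the first row of Table 4, and `allowed_downtime_minutes(24, 7, 98)` gives 201.6 minutes, which the table rounds to 202.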

Reporting Incidents Through The Service Desk

There are several ways to report incidents through the Service Desk. These have been put in place to make the process as simple, and as seamless, as possible.

Tuso Self Service Portal

The Tuso Self Service Portal lets users submit incidents, request services, view announcements, chat with support staff, consult its Knowledge Base for self-help, reset domain passwords or unlock accounts, and more. The Portal provides end users with 24/7 self-service and self-help capabilities accessible from computers and mobile devices.

The Service Desk portal can be accessed via the web: https://tuso.ihmafrica.com

To access the portal, credentials must be created for the end user/requestor by the Service Desk unit. IHM will provide orientation to IPs on use of the portal. Additional requests for orientation can be made via the Service Desk team.

Phone Call

Service Desk incidents can be logged between 8:00 AM and 5:00 PM using the following numbers:

Toll Free

8080

Chargeable numbers

+260979655211

+260762436771

To log an incident via WhatsApp, use the following numbers:

+260979655211

+260762436771

All WhatsApp messages must include name of end user/requestor, email address of end user/requestor, province, district, and name of facility.

Service Desk incidents can be logged by emailing: support@ihmafrica.org

Incidents will automatically be logged, and the end user/requestor will receive an automated response with an assigned ticket number.

Information To Have Ready When Contacting The Service Desk

For a productive and hassle-free experience with the Service Desk, end users/requestors must have the following information at hand when reporting an incident. This information helps to ensure the incident is logged properly, allowing for efficient resolution and feedback.

Name of person logging the incident/request.

Province, district, facility name, and facility HMIS code that the incident/request is being reported for.

Specify if facility is a parent site or hub site.

Version of SC being used and if reporting for eFirst, eLast, SC+ or SmartCare Pro operations.

Date of submission of last TDB for eFirst and eLast sites

Detailed description of the incident/request (Example: “I am unable to run MER report” or “An end user/requestor record was corrupted when I was trying to save a clinical interaction”)

If the incident causes any error messages to appear, the end user/requestor must provide the exact text that is displayed and the module or service that was in use when the error occurred.

How often the problem occurs and any pattern noted that leads up to the problem occurring.

Include any supporting information such as screenshots, reports, or NUPIN of affected end user/requestor(s).

Provide your organization’s name and your contact information (phone and email) to enable the Service Desk team to communicate and provide feedback.
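
The checklist above can be expressed as a simple completeness check before a report is submitted. The following Python sketch is illustrative only: the field names are assumptions and do not reflect the actual Tuso portal form, and conditional items (error message text, last TDB submission date, supporting attachments) are left out because they do not apply to every incident.

```python
from dataclasses import dataclass, fields


@dataclass
class IncidentReport:
    """Always-required details from the checklist (hypothetical field names)."""
    reporter_name: str = ""
    organization: str = ""
    phone: str = ""
    email: str = ""
    province: str = ""
    district: str = ""
    facility_name: str = ""
    facility_hmis_code: str = ""
    site_type: str = ""     # "parent" or "hub"
    sc_version: str = ""    # version in use and SC type (eFirst, eLast, SC+, SmartCare Pro)
    description: str = ""   # detailed description of the incident/request


def missing_fields(report: IncidentReport) -> list:
    """Return the names of required fields that are empty or whitespace only."""
    return [f.name for f in fields(report)
            if not getattr(report, f.name).strip()]
```

An empty list from `missing_fields` would indicate that the report carries all of the always-required details listed above.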

Feedback And Follow-Up In Service Desk

Aside from the regular feedback protocol, the Service Desk team will endeavor to provide stakeholders with weekly pending status updates on reported incidents using the following channels:

Phone

The SC Self-Service portal

The Service Desk will also provide weekly emails to partners as a routine status update and for outstanding tickets pending Implementing Partner intervention. All reported incidents will be assigned a ticket number. To follow up, users must reference the ticket number allocated when the incident was first logged in the Service Desk system.

Procedural Steps For Incident Management

Step 01: Incident Logging

Objective: This procedure describes the set of operations required to log an incident.

Responsible: Service Desk staff and SC Pro system end-user or requestor

Log the incident or request through the Tuso self-service portal immediately using https://tuso.ihmafrica.com

Where the self-service portal is not available, log the incident or request by calling the Service Desk toll-free number, 8080.

If the toll-free number is not going through, log the incident or request through SMS or via WhatsApp live chat.

Step 02: Ticket Creation

Objective: This procedure describes the set of operations required to create a ticket in the Tuso Self Service Portal.

Responsible: SC Pro system end-user or requestor

Log the incident or request through the Tuso self-service portal immediately.

Where the self-service portal is not available, the SmartCare end user or requester should send an email to support@ihmafrica.org (all emails must include the name of the end user/requestor, province, district, and name of facility).

If there are any limitations preventing the SmartCare end user or requester from sending an email, they should log the incident or request by calling the Service Desk toll-free number, 8080.

If the toll-free number is not going through, log the incident or request through SMS or via WhatsApp live chat.

Step 03: Incident Categorization

Responsible: SmartCare Pro end-user or requestor

Tickets are categorized using three levels: 1st Level category, 2nd level category, and 3rd level category.

Aside from this tier categorization, during ticket creation the user also selects a category based on the type of incident: Software, Hardware, Network, Supplies, or Human Interference.

Step 04: Incident Prioritization

Responsible: SmartCare Pro end-user or requestor

During ticket creation, the end user/requestor picks the priority level according to the impact of the incident at hand.

The priorities available in Tuso are:

High Priority

Normal Priority

Medium Priority

Low Priority

Not Prioritised

Refer to Tables 2 and 3 for guidance on how each priority is determined and managed.

Step 05: Incident Assignment

Responsible: The Service Desk team

Incidents are assigned to:

IT Officers – Issues that may need Hub input to resolve, such as an unavailable database.

Software Developers – All SmartCare software incidents that require development, and feature requests.

Capacity Building and Adoption – Incidents that require an understanding of SmartCare gained through training.

Implementing Partners – Primarily issues that require equipment replacements.

Incidents are assigned after the Service Desk team has analysed the nature of the incident.
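
The assignment rules in this step amount to a mapping from the analysed nature of an incident to the team that handles it. The sketch below is a minimal illustration in Python; the keys on the left are hypothetical labels invented for this example, while the team names on the right come from the list above.

```python
# Hypothetical "nature of incident" labels mapped to the assignee groups
# named in Step 05. The labels are illustrative, not Tuso categories.
ASSIGNMENT_RULES = {
    "hub_input_required": "IT Officers",
    "software_development": "Software Developers",
    "feature_request": "Software Developers",
    "training_need": "Capacity Building and Adoption",
    "equipment_replacement": "Implementing Partners",
}


def assign_incident(nature: str) -> str:
    """Return the team responsible for an incident of the given nature."""
    try:
        return ASSIGNMENT_RULES[nature]
    except KeyError:
        # Unknown cases go back to the Service Desk team for analysis.
        raise ValueError(f"No assignment rule for incident nature: {nature!r}")
```

Raising an error for an unrecognised label, rather than guessing a team, keeps the decision with the Service Desk team, which analyses each incident before assigning it.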

Step 06: Task Creation And Management

Responsible: Support Provider assigned an incident

The provider will manage tasks and provide feedback within the System.

Refer to Table 2 for further guidance and information.

Step 07: SLA Management And Escalation

Responsible: The Service Desk team

Monitoring: Regular tracking and monitoring of SLAs to ensure compliance.

Manual Escalation: Support staff can manually escalate tickets if they require urgent attention or higher-level intervention.

Communication: Clear communication channels and protocols to ensure all parties involved are aware of the escalation and their roles in resolving the issue.

Resolution Tracking: Continuous follow-up on escalated tickets to ensure they are resolved within the stipulated timeframes.

Step 08: Incident Resolution

Responsible: Support Provider assigned an incident

The Service Desk team follows up with the end user/requestor to get full details of the incident.

The team then communicates with the support providers for solutions and feedback.

The solution is then shared with the respective end user(s)/requestor(s).

An incident will be resolved by support providers as per the escalation procedure.

Refer to Table 1 for further guidance and information.

Step 09: Incident Review And Corrective Measure

Responsible: Service Desk Team and Support Provider assigned an incident

The Service Desk team reviews the incidents and follows up with the support providers.

Weekly meetings are held with the developers.

The provided resolutions are shared with the affected facilities for action on the incidents.

The Service Desk team shares resolutions, and facilities provide feedback indicating the outcome of each resolution.

Step 10: Incident Closure

Responsible: SmartCare Pro end-user or requestor

Once a resolution is provided, the user should close the incident.

The Service Desk team should get in touch with the end user/requestor to confirm that the incident is no longer occurring and that the fix provided is working.

The user should change the ticket status from its old status (i.e., open, or resolved and open) to closed.

The user clicks Save to update the incident.

Incident Management Metrics And KPIs

Table 5 below shows the different metrics used in ensuring an efficient and effective incident management system.

Table 5: Metrics and KPIs for Incident Management

No.	KPI	Formula	Target
1	SC Pro system availability	SC Pro system availability reporting will include both scheduled and unscheduled outages or downtime incidents [Availability of system in percentage for the period of interest (per week)]. The SC application is defined as a critical application.	≥ 99.99%
2	Incident resolution rate (%)	Proportion of incidents that have been resolved among all incidents logged in the system IRR=[resolved incidents/Total logged incidents]	≥95%
3	Average resolution time or Average turn-around time (TAT)	The average time taken to resolve an incident [Sum of the duration of all incidents resolved during the month] /[Total number of incidents resolved during the same month]	C ≤ 4 hrs. H≤ 8 hrs. M≤ 7 days L ≤ 14 days
4	SLA compliance rate	The percentage of incidents resolved within an SLA. [Number of incidents with met SLAs resolved during the month] /[Total number of incidents resolved during the same month]	C ≥ 99.99% H≥ 95% M≥ 90% L ≥ 80%
5	First call resolution rate (%)	Percentage of incidents resolved in the first call	≥95%
6	First Level resolution Rate (%)	[Number of incidents resolved at first level in the week] /[Total number of incidents resolved during the same week]	≥95%
7	Reopen rates	[Number of incidents resolved during the month that were reopened] /[Total number of incidents resolved during the same month] Reopened is when the Reopen Count field is greater than 0 for the incident.	C ≤ 1% H≤ 5% M≤ 10% L ≤ 20%
8	Incident backlog	The number of incidents that are pending in the queue without a resolution.	There are always fewer than <maximum> unresolved problems.
9	Percentage of major incidents	The number of major incidents compared to the total number of incidents.	≤ 10%
10	Mean time to acknowledge (MTTA)	The average amount of time between a system alert and a team member acknowledging the issue. For example: If you had 10 incidents and there was a total of 40 minutes of time between alert and acknowledgement for all 10, you divide 40 by 10 and come up with an average of four minutes. MTTA=40 minutes/10 incidents=4 minutes MTTA is useful in tracking responsiveness. Is your team suffering from alert fatigue and taking too long to respond? This metric will help you flag the issue.	C ≤ 15 minutes H≤ 30 minutes M≤ 1 hr. L ≤ 1 hr.
11	Mean time to resolution (MTTR)	What it means: The average amount of time it takes to respond to or resolve an incident. What it can show: MTTR can show how quickly your team is able to respond to or resolve issues as they arise. Calculation of MTTR: Mean time to recovery is calculated by adding up all the downtime in a specific period and dividing it by the number of incidents. So, let’s say our systems were down for 30 minutes in two separate incidents in a 14-hour period. 30 divided by two is 15, so our MTTR is 15 minutes. MTTR= [Total unplanned repair or resolution time/Total number of incidents] MTTR=30minutes/ 2 incidents=15 minutes	C ≤ 4 hrs. H≤ 8 hrs. M≤ 7 days L ≤ 14 days
12	Mean time between failure (MTBF)	The average time between failures. It is calculated by dividing the total uptime by the total number of failures. MTBF = Total uptime / number of incidents. For example, with 2 hrs of incident downtime in a 24-hour period, total uptime = (24 hrs - 2 hrs) = 22 hrs. If 2 incidents happened within those 24 hrs, then MTBF = (24 - 2) / 2 = 11 hrs.	Increase the time between failures
13	Mean time to detect (MTTD) incidents	The average time taken to detect incidents or anomalies.	C ≤ 15 minutes H≤ 30 minutes M≤ 1 hr. L ≤ 1 hr.
14	End user satisfaction on incident resolution	Percentage of completed problem/incident resolution satisfaction surveys with a rating of satisfied or very satisfied.	≥95%
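
Most of the formulas in Table 5 reduce to two patterns: a percentage (one count divided by another) and a mean time (total time divided by the number of incidents). The Python sketch below implements those patterns and replays the worked MTTA, MTTR, and MTBF examples from the table. The function names are illustrative, and the figures used for the resolution rate are hypothetical.

```python
def rate_percent(part: float, whole: float) -> float:
    """Percentage KPI, e.g. incident resolution rate, SLA compliance, reopen rate."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100 * part / whole


def mean_time(total_time: float, incident_count: int) -> float:
    """Mean-time KPI, e.g. MTTA, MTTR, average resolution time."""
    if incident_count <= 0:
        raise ValueError("incident_count must be positive")
    return total_time / incident_count


def mtbf(period: float, downtime: float, failure_count: int) -> float:
    """Mean time between failures: total uptime divided by number of failures."""
    if failure_count <= 0:
        raise ValueError("failure_count must be positive")
    return (period - downtime) / failure_count


# Worked examples taken from Table 5.
mtta_minutes = mean_time(40, 10)   # 40 minutes across 10 incidents
mttr_minutes = mean_time(30, 2)    # 30 minutes of downtime across 2 incidents
mtbf_hours = mtbf(24, 2, 2)        # 24 hrs, 2 hrs down, 2 incidents

# Hypothetical figures: 96 of 100 logged incidents resolved.
resolution_rate = rate_percent(96, 100)
```

With the figures from the table, this yields an MTTA of 4 minutes, an MTTR of 15 minutes, and an MTBF of 11 hours.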

After Action Review/Post-mortem/Post-Incident Reviews

After Action Review (AAR), or Post-Incident Review, is a critical incident review process conducted to pinpoint weaknesses among the people responsible for designing, developing, deploying, implementing, and maintaining the SC Pro system and its interfaces. These reviews also help identify weaknesses in the processes and tooling used to operate the system, with the aim of continuously improving the incident management system (IMS) and optimizing the performance of the SC Pro system and all other interoperable sub-systems.

“Don’t let a good crisis go to waste! Learn from it to be better next time. It’s all about getting better -not finding blame. Establishing a positive, blameless culture of post-incident evaluation is based on an honest and in-depth evaluation of the incident response. It signals to the organization that technology failure is the perfect opportunity to learn about your operating environment and make improvements to minimize future IT downtime”. 2

Any organisation that values IT should aim to continuously reduce Mean Time to Resolution/Repair (MTTR). 3 As the SCHISS project, we propose an After Action Review (AAR) that should be sustainable in the normative programmatic environment for MoH.

Justification For AAR

The AAR platform provides an opportunity to understand what went wrong and to capture lessons learnt. Lessons learned may vary as much as the incidents themselves and may include, among other things:

Certain incidents may help the project team and MoH see a blind spot in the system architecture and/or service delivery.

Perhaps some mistakes were made in detecting the problem and thus somebody learned a cool new trick with the monitoring or alerting tools.

Perhaps a junior member of the Incident Response Team (IRT) was covering the shift for a senior person, handled a tough event, and gained valuable experience and confidence as an Incident Manager.

Perhaps a person or team chronically misses established incident response service level agreements (SLAs), thereby slowing the MTTA.

AAR Process

The following are processes that should be followed in conducting an AAR or post-mortem.

Gather Relevant Information

Gather all incident records for the incidents under review in the period of interest (weekly review). The required data include the following:

Incident logs

Communication recordings

Scribe information from the incidents

Incident timelines in hours, minutes, and seconds (all events by timestamps)

Roster of accountable incident responders

Discussion of possible incident resolutions

Develop An Incident Timeline

Capturing timelines for all events that took place during incident response is also critical. Capturing timestamps, a summary of key events, and the discussions that took place to support the decisions made during an incident provides valuable insight into how people responded to the incident. This process also provides critical data that serves as input for improving how incident responders perform during an incident.
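
A timeline of this kind can be assembled from timestamped notes. The following minimal Python sketch sorts events chronologically and reports the elapsed time since the first event in hours, minutes, and seconds. The event structure (a timestamp paired with a short summary) and the function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def build_timeline(events: list) -> list:
    """Sort (timestamp, summary) pairs and add elapsed time since the first event.

    Returns a list of (timestamp, elapsed, summary) tuples, where `elapsed`
    is a timedelta measured from the earliest event.
    """
    ordered = sorted(events, key=lambda event: event[0])
    if not ordered:
        return []
    start = ordered[0][0]
    return [(stamp, stamp - start, summary) for stamp, summary in ordered]


def format_timeline(timeline: list) -> list:
    """Render each entry as 'HH:MM:SS (+H:MM:SS) summary'."""
    return [f"{stamp:%H:%M:%S} (+{elapsed}) {summary}"
            for stamp, elapsed, summary in timeline]
```

The elapsed column makes it straightforward to read off intervals such as the time between the incident being logged and a responder acknowledging it.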

Ensure Relevant Participation

Ensure relevant participation in the AAR and encourage active participation and discussion among the participants.

AAR/RCA Generic 5 Questions

The AAR will be combined with RCA. To start these processes the AAR team will ask the following 5 questions.

Table 6: Questions for AAR

No.	Main questions	Probe questions
1	A description of the problem (symptoms)	What happened?
2	A brief description of the cause of the incident	What caused/contributed to the problem? This is somewhat subjective and may be quite complex based on your environment. The intent is to capture what caused a change from uptime to downtime.
3	Who responded to solve it, and what are the time stamps for their dispatch and arrival on the incident? Were the initial responders the right ones for the incident or was it necessary to escalate to more or different SMEs?	Were the right people assembled in the right spot to make the right decisions at the right time?
4	What solution was implemented?	Did the incident responders choose the right solution?
5	What was the MTTA for the right team of responders and what was the MTTR?	How long did it take to assemble and solve the problem?

Structured AAR Sheet

In addition to the aforementioned questions, the project team may use the following standardized template for carrying out an incident AAR or post-mortem.

IMS AAR Sheet
Goal: To improve incident response, determine what broke or went wrong and how people responded to the thing that broke, and determine what steps need to be taken to prevent a similar situation from happening again.
List IRT	Speciality of IRT

Incident commander (IC) identified and announced:
IC transferred/changed and list the reason for transfer or change:
Coded responses: N = not completed, Y = completed, P = partially completed or completed later
#	Review questions	Weight (N)	Weight (Y)	Weight (P)	Rating
1	Was sizing up of the incident/problem completed, accurate and well-articulated?	0	10	5
2	Were appropriate SMEs requested? Was the SME response time acceptable? If not, please list the reasons why.	0	10	5
3	Was the SEV level for the incident identified and announced?	0	5	1.5
4	Did the IC (Business Analyst) control the flow of the discussion and drive the incident towards resolution in an effective and timely manner? (Validate this against the targeted resolution time as per the SLA.)	0	35	17.5
5	Did the IC adhere to acceptable span of control numbers? (If exceeded, was this acceptable?) Did the IC control the extra numbers effectively?	0	5	1.5
6	Did the IC establish effective communications?	0	10	5
7	Was there an incident timeline and an estimated time to resolution?	0	5	1.5
8	Were briefings, notifications and postings made at the appropriate time (i.e., every hour)?	0	10	5
9	Did the IC develop a backup plan and/or consider second/third tier alternatives?	0	5	1.5
10	Did the IC discuss notifying DR at 30 minutes? Did they make appropriate notifications at the 1-hour mark for potential DR?	0	5	1.5
	Total		100%
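The rating on the sheet above can be totalled mechanically from the coded responses. The following is a minimal Python sketch, assuming one response code per review question in sheet order; the weights are copied as printed on the sheet, while the names WEIGHTS and aar_score are hypothetical.

```python
# (Y weight, P weight) for each of the ten review questions, in sheet order.
# An N response always scores 0.
WEIGHTS = [
    (10, 5), (10, 5), (5, 1.5), (35, 17.5), (5, 1.5),
    (10, 5), (5, 1.5), (10, 5), (5, 1.5), (5, 1.5),
]


def aar_score(responses):
    """Sum the weights for a list of coded responses ('N', 'Y' or 'P')."""
    if len(responses) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} responses, got {len(responses)}")
    total = 0.0
    for code, (full, partial) in zip(responses, WEIGHTS):
        if code == "Y":
            total += full
        elif code == "P":
            total += partial
        elif code != "N":
            raise ValueError(f"unknown response code: {code!r}")
    return total
```

A sheet with every question marked Y totals 100, which matches the Total row; any lower score shows how much weight was lost to N or P responses.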

Deduce Relevant Recommendations For Improvement

The main purpose of the AAR is to pinpoint weaknesses, learn from the process, and identify relevant solutions or continuous improvements to the people part of the response as well as to the technology problems encountered. Schnepp, Vidal & Hawley (2017) suggest that, for recommendations and improvements to the nontechnical aspects of the IMS and the system being serviced, the acronym TALENT (Training, Accountability, Leadership, Empowerment, and Notification) should be applied as a framework. Based on the relevant recommendations, data-informed actions or activities should be operationalized into work schedules for the responsible individuals.

Table 7: AAR Recommendation and accountability matrix

Action	Timeline	Responsible	Assigned Department	Notes

Definition Of Terms/Glossary For Incident Management

Table 8 below is a glossary of the key terms used in this document.

Table 8: Glossary of Key Terms

No.	Term	Definition
1	Incident	An unplanned interruption to an IT service or reduction in the quality of an IT service. Failure of a configuration item, even if it has not yet affected a service, is also an incident (e.g., failure of one disk from a mirror set).
2	Incident identification	The process of discovering an incident.
3	Incident logging	Creating and maintaining a record of an incident in the form of a ticket.
4	Incident categorization	Recording an incident with due diligence so that it is placed under the appropriate category.
5	Incident closure	Closing an open incident ticket once the incident has been resolved.
6	Incident escalation rules	A set of rules defining the hierarchy for escalating incidents, including triggers that lead to escalations. Triggers are usually based on incident severity and resolution time.
7	Incident management	Managing the life cycle of all incidents to restore normal service operation as quickly as possible and minimize business impact.
8	Incident management report	A series of reports produced by the incident manager for various target groups (e.g., teams responsible for IT management, service level management, other service management processes, or incident management itself).
9	Incident manager	The person responsible for the effective implementation of the incident management process and carrying out reporting. Also represents the first stage of escalation if an incident cannot be resolved within the agreed service level.
10	Incident model	Contains the predefined steps that should be taken to deal with a particular type of incident.
11	Incident monitoring	Tracking the processing status of outstanding incidents so that countermeasures may be introduced as soon as possible if service levels are likely to be breached.
12	Incident prioritization	Assigning priorities to incidents and defining what constitutes a major incident.
13	Incident record	A collection of data with all details of an incident, documenting the history of the incident from registration to closure.
14	Incident report	A report that includes information about incidents, how they were handled, and other data that can help measure the performance of the incident management process.
15	Incident resolution	The workaround or correction that fixes the incident and restores service to its best quality.
16	Incident status	How far along an incident is in the incident management process. Common statuses include: New (an incident that has been logged but not yet worked on); Assigned (an incident that has been received in the Tuso Service Desk and assigned to a technician); In progress (an incident that has been assigned to a technician and is in the process of receiving a resolution); On hold or pending (an incident that has been temporarily put on hold); Resolved (an incident that was worked on by a technician and has received a resolution); Closed (an incident that was closed once the resolution was acknowledged by the end user).
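The incident statuses listed in the glossary can be modelled as a small state machine. The sketch below is a hedged Python illustration: the status names come from the glossary, but the ALLOWED transition map and the advance function are assumptions made for this example and are not a description of how the Tuso Service Desk enforces status changes.

```python
from enum import Enum


class Status(Enum):
    NEW = "New"
    ASSIGNED = "Assigned"
    IN_PROGRESS = "In progress"
    ON_HOLD = "On hold or pending"
    RESOLVED = "Resolved"
    CLOSED = "Closed"


# Assumed transitions, inferred from the glossary wording only.
ALLOWED = {
    Status.NEW: {Status.ASSIGNED},
    Status.ASSIGNED: {Status.IN_PROGRESS, Status.ON_HOLD},
    Status.IN_PROGRESS: {Status.ON_HOLD, Status.RESOLVED},
    Status.ON_HOLD: {Status.IN_PROGRESS},
    Status.RESOLVED: {Status.CLOSED, Status.IN_PROGRESS},
    Status.CLOSED: set(),
}


def advance(current, target):
    """Return the target status if the move is allowed, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping the transitions in one table makes it easy to adjust them, for example to allow a Resolved ticket to be reopened only when the end user rejects the resolution.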
