SLAs for Disaster Recovery: How much do we really understand?

What is the difference between RTO (Recovery Time Objective) and RPO (Recovery Point Objective)?

More often than not, these terms are either used interchangeably or confused with each other, resulting in faulty designs. Here is a layperson’s attempt to explain the difference.

Recovery Point Objective (RPO) is the ‘maximum amount of acceptable data loss’ in case of an outage, disaster or similar event. For example, if your defined RPO is 30 minutes and the disaster happens at 12 Noon, then whenever the systems/services are restored (not necessarily within 30 minutes), they should hold data up to at least 11:30 AM of the same day. RPO is measured as data loss expressed in time (30 minutes of data loss, 2 hours of data loss, etc.).

Recovery Time Objective (RTO), on the other hand, is the ‘maximum amount of acceptable service outage’ in case of an outage, disaster or similar event. For example, if your defined RTO is 4 hours and the disaster happens at 12 Noon, the services should be restored by 4:00 PM on the same day at the latest. RTO is measured as service outage expressed in time (2 hours of outage, 4 hours of outage, etc.).
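The two definitions above boil down to two simple time differences, which can be sketched in a few lines of Python. This is only an illustrative sketch: the function name check_recovery, its parameters and the incident timestamps (including the 3:30 PM restore time and the calendar date) are made up for this example and do not come from any particular tool or standard.

```python
from datetime import datetime, timedelta

def check_recovery(disaster_at, restored_at, data_current_as_of, rpo, rto):
    """Compare an actual recovery against its RPO and RTO targets.

    Hypothetical helper for illustration only.
    """
    # RPO looks backwards from the disaster: how much data was lost?
    data_loss = disaster_at - data_current_as_of
    # RTO looks forwards from the disaster: how long was the service down?
    outage = restored_at - disaster_at
    return {
        "data_loss": data_loss,
        "outage": outage,
        "rpo_met": data_loss <= rpo,
        "rto_met": outage <= rto,
    }

# Assumed incident: disaster at 12 Noon, service back at 3:30 PM,
# restored data is current up to 11:30 AM. Targets: RPO 30 min, RTO 4 h.
result = check_recovery(
    disaster_at=datetime(2024, 1, 1, 12, 0),
    restored_at=datetime(2024, 1, 1, 15, 30),
    data_current_as_of=datetime(2024, 1, 1, 11, 30),
    rpo=timedelta(minutes=30),
    rto=timedelta(hours=4),
)
print("Data loss:", result["data_loss"], "| RPO met:", result["rpo_met"])
print("Outage:   ", result["outage"], "| RTO met:", result["rto_met"])
```

Note that the two checks are independent: the restore time never enters the RPO calculation, and the age of the restored data never enters the RTO calculation.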

Here is a quick case study to ponder:

For a given application, the RPO is 30 minutes and the RTO is 5 hours. On a given day, an incident led to a service outage that started at 12 Noon. All services were restored at 6:00 PM on the same day, with data as it stood at 11:30 AM that day.

Was the SLA breached? If yes, for which parameter?

8.5 years of experience in Infrastructure Management; Cloud Evangelist with experience in Solution Design, Pre-Sales, Cloud Adaptation and Cost Optimization