Delivery status updates and transcoding delayed
Incident Report for PRX
Postmortem

Our hosting provider was able to make the upgrades and restart the servers within the first 30 minutes of the delivery window.

We then reviewed all the applications that run on these servers, found a couple of services that had not started correctly on their own, and brought them up. We will file tickets to ensure these services are correctly managed.

We also discovered an issue with one server's RAID battery that will also require downtime. We will schedule that work and report on it in a new incident.

Posted May 27, 2015 - 13:57 EDT

Resolved
All servers have caught up. To be on the safe side, all series have been reprocessed to ensure no deliveries were missed; this will result in some retries of previously successful deliveries.
Posted Apr 26, 2015 - 15:13 EDT
Monitoring
A server handling delivery status updates was not working correctly, which delayed notifications of completed deliveries. This also affected updates for completed validations and further processing of file uploads. The system is working correctly now, and we are monitoring it to make sure it catches up and completes all processing.
Posted Apr 26, 2015 - 13:27 EDT