Unplanned Production Outage - 12-SEP-2018

Summary

On August 30, 2018, the SIS data center reported two bad disks. These two disks were scheduled for replacement on September 06, 2018. After the RAID rebuilds, performance did not improve, and the degradation became more apparent after several failures over the following days. Today, Wednesday, September 12, 2018, at approximately 4:34 P.M. CST, the Education Partners Technology Management team was alerted to a critical server and network issue.


The Technology Management (TM) team has investigated extensively and found that today's unplanned outage and last week's attempt to replace the two disks are related. Our TM team is working with engineers at the SIS data center to repair the issue. Updates will be provided as soon as they become available. We apologize for any inconvenience this may have caused.


Education Partners - Product Management Team


================
*** UPDATE 5:08 AM CST / 15-SEP-2018 ***
================


The report server is fully configured for all SIS users. All databases used for TopView reports are up and running, along with the automated jobs for scheduled reports. We have tested a few TopView reports across all customer environments, and all appear to be working as expected. We have also successfully run a few Quick Query reports that use custom field entities from the report server, and all appear to be working normally. All subscriptions for scheduled reports are working as well. The Dashboards that appear on the Home screen are also functioning again and should be visible upon logging in. Please be advised that some functionality may still not be working. If you find an issue, please report it in a support ticket and describe in detail what is not working as expected.


================
*** UPDATE 12:55 PM CST / 14-SEP-2018 ***
================


The Data Center has finished recovering the reporting cluster and has restored the databases (including user data) from backups on the Avamar array. This work was started early this morning. The Data Center team has tested fail-over functionality and all seems to be working properly. The Technology Management & Engineering team will now take over and work towards bringing the reporting databases online. Updates will be provided as they become available.


================
*** UPDATE 7:01 AM CST / 14-SEP-2018 ***
================


All LIVE production databases are back online as of 3 AM MT. We are continuing to monitor the environments. However, we may not be able to detect every potential issue, so we rely on our Student Solution Community to contact us if you experience a connectivity problem or other issue.


--------------------------------------------------
ADDITIONAL INFORMATION

IMPACTED INSTANCES

LIVE - SIS - environments have been restored.

PROD - TopView Reporting Servers are still offline.

IMPACTED SERVICES
Reports Module - The TopView reports module, Ad hoc reporting tool, and Quick Query (custom fields) will not be operational until the reporting cluster is brought online by our data center. That work is scheduled to start at 8 AM MT.

SIS Home screen - Please note that users may experience issues with the Home screen (e.g. the dashboards screen). Users will see a message indicating "The destination in the SIS you are trying to access cannot be reached". This occurs because the dashboards run off the report server, which is still offline.

AVAILABILITY
PROD - TopView reports module, Ad hoc reporting tool, and Quick Query (custom fields) services will not be available. Updates will be provided as they become available.

ADDITIONAL INFORMATION
We were only able to recover data as of 11-SEP-2018 ~8 PM MT. We were unable to recover transaction logs after that time; therefore, any data entered between then and the outage on 12-SEP-2018 ~4:34 PM CST is lost.
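For administrators who need to identify records to re-enter, the length of the data-loss window can be worked out from the two timestamps above. The following is only a minimal sketch: both times are approximate, and the UTC offsets are assumptions not stated in this announcement ("MT" is taken as UTC-6 and "CST" as Central local time, UTC-5, since September falls in daylight time). The function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets (not stated in the announcement): September falls in
# daylight time, so "MT" is taken as UTC-6 and "CST" as Central local time, UTC-5.
MOUNTAIN = timezone(timedelta(hours=-6), "MT")
CENTRAL = timezone(timedelta(hours=-5), "CT")


def data_loss_window(last_backup: datetime, outage: datetime) -> timedelta:
    """Return the span between the last recoverable backup and the outage."""
    return outage - last_backup


# Approximate timestamps taken from the announcement.
last_backup = datetime(2018, 9, 11, 20, 0, tzinfo=MOUNTAIN)
outage = datetime(2018, 9, 12, 16, 34, tzinfo=CENTRAL)

window = data_loss_window(last_backup, outage)
hours, remainder = divmod(int(window.total_seconds()), 3600)
print(f"Approximate data-loss window: {hours} h {remainder // 60} min")
# prints "Approximate data-loss window: 19 h 34 min"
```

The result is the same if both zones are read as standard time instead, because the two zones stay one hour apart either way.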

SYSTEM LIVE PRODUCTION ONLINE
3 AM MT, September 14, 2018

================
*** UPDATE 10:18 PM CST / 13-SEP-2018 ***
================

The targeted completion time has passed due to setbacks in the restoration process related to the reset and restart of the SQL cluster. While our Technology Management and Data Center teams have made significant progress, system restoration is taking longer than anticipated. At present, the SQL Server database is online; however, our database administrators are still working through essential shared cluster components that remain offline. Please be advised that there is no estimated time for having all systems back online. Updates will be provided as they become available.


================
*** UPDATE 2:20 PM CST / 13-SEP-2018 ***
================

The initial phase of the database restoration is progressing. The data center team has indicated a 25% completion status for the data file system and expects to complete the remainder in the next 90 minutes. Once the file system is complete, the next steps are to import the data back into the restored master databases and to begin the final stages of the full database restore. Please be advised that there is still no guaranteed time frame for having all systems back online; however, our target for completing all of these processes is 8:00 PM CST. Please note that setbacks may occur that could change our target completion time. Updates will be provided as they become available.


================
*** UPDATE 10:37 AM CST / 13-SEP-2018 ***
================

Please be advised that the SIS data center and our TM team are working on the restoration of the production servers. While this restoration is taking place, we have added a temporary redirect to our EP website for users who attempt to navigate to the TopSchool URL to log in. Once we receive word on the status of the backup restore, we will update this announcement and email the Student Solution Community administrators with next steps.


================
*** UPDATE 3:45 AM CST / 13-SEP-2018 ***
================


The data center is working to secure additional storage space that will then be added to the existing database servers. Once this storage is made available, they will begin copying over the database backups taken on 11-SEP-2018 at approximately 8:00 PM MT. The restoration process will begin shortly thereafter. In parallel, the Technology Management team is still actively working with the data center to recover the failed storage device. Updates will be provided as they become available.


================


Service Impact Overview

ALL access to all LIVE - SIS - environments has ceased, and the environments are currently unavailable. The Technology Management team is aware and is actively pursuing an immediate resolution. Updates will be provided as soon as they become available.


Environments Affected

Production 


Priority

Critical 


Notification Type

Unplanned


Users Affected

ALL - Student Solution Users & Students utilizing the SIS


Services Affected

ALL production operations in the Student Solution system, including other ancillary services (e.g. SharePoint sites, self-service portals, online admission applications, import utility tool, etc.).


================


Reproduce

User attempts to log in to any LIVE environment, self-service portal, portal admission application, SharePoint site, import utility tool, etc.


Expected Results

User should be able to log in and utilize all available functions within the SIS, including other ancillary services (e.g. SharePoint sites, self-service portals, online admission applications, import utility tool, etc.).


Actual Results

The web browser attempts to connect to the server, and the user sees a "Server Error in '/' Application." message indicating that an unhandled exception occurred during the execution of the current web request.

 

Workaround

None 


================

For more information please see: 


Product & Service Notifications>>Unplanned Production Outage - 12-SEP-2018


Announcements>>TopView Server Maintenance - 06-SEP-2018


Known Issue>>SS-2944 | TopView SQL Report Server - Database Integrity Issues