Due to growth in the number of users and in the volume of information recorded on our server (particularly device logs accessible from the Share and Deploy Console), our database reached its storage limit.
On Feb 15, 2019, at 1:20 PM CET, our database detected that it was approaching its storage limit and started an automatic emergency backup. From that point, our Web servers could no longer write new data to the database. Three minutes later, our IT Team was notified that the servers were no longer operational, and quickly determined that the underlying issue was with the database. The automatic emergency backup took around 20 minutes to finish, and the Team needed another 20 minutes to delete old records and free sufficient storage. Meanwhile, we stopped our Web servers until the database was fully functional again. Finally, at 1:51 PM CET (after 41 minutes), the system was fully restored, with enough free storage to cover several weeks.
Manually deleting device logs older than 12 months freed enough storage to restart our Web servers.
On Feb 18, 2019, we doubled our database storage limit.
In addition, we implemented an automatic process that deletes device log records older than 12 months.
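As an illustration only, a retention job of this kind could look roughly like the Python sketch below. It is not our production code: the table name device_logs, the column recorded_at, the 365-day approximation of "12 months", and the batch size are all assumptions made for the example, and SQLite stands in for the real database. Deleting in small batches is one common way to avoid holding long locks while old rows are removed.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 365  # assumption: "12 months" approximated as 365 days


def purge_old_logs(conn, now, retention_days=RETENTION_DAYS, batch_size=1000):
    """Delete rows of the hypothetical device_logs table that are older
    than the retention window, a batch at a time.

    Timestamps are assumed to be stored as ISO 8601 strings in a single
    timezone, so that string comparison matches chronological order.
    Returns the total number of rows deleted.
    """
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM device_logs WHERE id IN ("
            "SELECT id FROM device_logs WHERE recorded_at < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # commit each batch so locks are released early
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

A scheduler (for example a daily cron job) would call purge_old_logs(conn, datetime.now(timezone.utc)) against the live connection.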
In early March 2019, we will add new automatic alerts that trigger at predefined database storage thresholds. These alerts will be reviewed every few months to ensure they remain active and sufficient for our needs.
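To make the idea of storage thresholds concrete, here is a minimal Python sketch of how a usage figure could be mapped to an alert level. The threshold values (70%, 85%, 95%), the level names, and the function name are hypothetical and chosen only for illustration; the actual thresholds have not been published here.

```python
# Hypothetical thresholds: fraction of the storage limit -> alert level.
THRESHOLDS = [(0.95, "emergency"), (0.85, "critical"), (0.70, "warning")]


def storage_alert_level(used_bytes, limit_bytes, thresholds=THRESHOLDS):
    """Return the most severe alert level whose threshold has been reached,
    or None if usage is below every threshold."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    usage = used_bytes / limit_bytes
    # Check the highest threshold first so the most severe level wins.
    for fraction, level in sorted(thresholds, reverse=True):
        if usage >= fraction:
            return level
    return None
```

A monitoring job could call this function periodically with the current database size and notify the on-call team whenever the result is not None, well before the limit itself is reached.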