[2024-03-09 22:17 UTC]
That was exciting.
Quick recap for posterity, all times UTC:
- At 18:00 I disabled the boards at our old hosting solution. I then thought, "Wouldn't it be nice if the frontend were set to serve a nice, clear '503 unavailable' message explaining the downtime?"
- It took until 19:30 to set this up, because I hadn't really used nginx before and it took way more fiddling with settings than I expected.
- By 20:20 I'd done most of the actual work of uploading/migrating, but the connection between the frontend reverse-proxy and the backend webserver turned out to require more setup than expected, so I formally prolonged the downtime to 21:00.
- At 20:40 the boards were essentially up and running, but while I was at it I wanted to set up new HTTPS certificates using the script I'd created for this.
- At 20:55, as I was restarting with the new certificates, the server melted. I still have no real idea what happened there, and I'll have to dig through the logs for clues when I have the time, but CPU use shot to 100% and the server became unresponsive. I had to cycle power for the 'ocean droplet to wake it up, but there have been no signs of trouble after that.
Then it was just a matter of re-doing the file/database migration (just to be safe), which went fine, and re-acquiring the certs, which went mostly fine save for the catch-22 that the frontend needs the certificates to start while certbot needs the frontend to pass letsencrypt's automated challenges. This could be circumvented by first getting the < *.pastelland.net > certificate (wildcard, checked by dns instead of http), starting the server with that, then running the rest of the checks (which use http to avoid collision with the dns challenges).
- Finally, by 21:55 I felt confident re-declaring the board re-opened.
There should be no further disruptions, unless the mysterious CPU spike comes back to haunt us. I've also checked that email is, in fact, functional again -- or at least, gmail accepts mail from the new transmission system, and presumably other major providers will too -- so password resets and new signups should be possible for the first time in however long it was broken.
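For anyone hitting the same certificate catch-22, here is a rough sketch of the ordering described above, written as a Python helper that only builds the commands and does not run them. It is not the actual script used for the boards: the `certbot` flags shown (`certonly`, `--manual`, `--preferred-challenges dns`, `--webroot`, `-w`, `-d`) are standard ones, but the choice of the webroot plugin for the http step, the `systemctl start nginx` line, and all function and parameter names are assumptions made for illustration.

```python
import shlex


def wildcard_dns_command(domain):
    """certbot call for a wildcard certificate, validated by a DNS challenge.

    The DNS challenge does not need the frontend to be running, which is
    what breaks the circular dependency.
    """
    return ["certbot", "certonly", "--manual",
            "--preferred-challenges", "dns", "-d", f"*.{domain}"]


def http_command(hostnames, webroot):
    """certbot call for the remaining names, validated over http.

    Assumes the webroot plugin; the original setup may use something else.
    """
    cmd = ["certbot", "certonly", "--webroot", "-w", webroot]
    for name in hostnames:
        cmd += ["-d", name]
    return cmd


def bootstrap_plan(domain, hostnames, webroot):
    """Return the steps in the order that avoids the catch-22."""
    return [
        ("wildcard certificate via dns", wildcard_dns_command(domain)),
        # Assumption: the frontend is nginx managed by systemd.
        ("start frontend with the wildcard certificate",
         ["systemctl", "start", "nginx"]),
        ("remaining certificates via http", http_command(hostnames, webroot)),
    ]


def render(plan):
    """Format the plan as shell-quoted lines for review before running."""
    return [f"# {label}\n{shlex.join(cmd)}" for label, cmd in plan]
```

Calling `render(bootstrap_plan("pastelland.net", ["www.pastelland.net"], "/var/www/html"))` gives three commented command lines to review by hand; the `www` hostname and the webroot path there are placeholders, not the real configuration.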
[2024-03-09 22:28 UTC]
Naturally, the first thing that happens after I post that is that the database dies spontaneously, with no apparent error message.
That's ... pretty hard to debug? It's not a nice crash either: some tables in the database are reported as "crashed" when I start it back up. Had to re-post the above message because I re-uploaded the pre-migration DB contents, to be safe.
Not sure how I'll deal with this at the moment. The forum probably shouldn't be run while this problem is potentially in effect, but I don't want to just take it down indefinitely. Just ... don't post anything while I make up my mind.
The last one was, of course, preceded by two other versions of the same message, all claiming to have identified potential solutions and being immediately proven wrong as they were mercifully lost in the database rollback after the next crash. The main highlight was probably the promise of (more frequent) backups (than usual) to ameliorate any future crashes, followed immediately by a backup attempt causing the next crash.
[2024-03-10 03:29 UTC]
Alright, third iteration of this message.
Seems we've basically been running into RAM limits, which -- I really should have seen coming, but oh well. I've added some swap space, we'll see tomorrow if that helps.
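As a side note, a quick way to see whether the box is near its RAM limit and whether swap is actually active is to read `/proc/meminfo`. The sketch below is only an illustration, assuming a Linux host; the function names are made up for this example and it is not what was used to diagnose the crashes.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most entries are reported in kB; lines without a numeric value are skipped.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_headroom(info):
    """Return (MemAvailable, SwapTotal, SwapFree) in kB, 0 if a key is missing."""
    return (info.get("MemAvailable", 0),
            info.get("SwapTotal", 0),
            info.get("SwapFree", 0))


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory report (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

On the server, `memory_headroom(read_meminfo())` returns the three numbers in kB; a `SwapTotal` of 0 means no swap is configured, and a small `MemAvailable` with little `SwapFree` left suggests the machine is close to running out of memory again.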
Statistics: Posted by admin3 — 10 Mar 2024 19:34
Statistics: Posted by admin3 — 06 Mar 2024 23:05
Statistics: Posted by admin3 — 03 Mar 2024 14:54
Statistics: Posted by admin3 — 05 Feb 2024 15:29
Statistics: Posted by Jatsko — 04 Feb 2024 07:07
Statistics: Posted by Jatsko — 24 Dec 2023 16:24
Statistics: Posted by Sublevel 113 — 18 Nov 2023 12:11
Statistics: Posted by Sublevel 113 — 01 Nov 2023 14:13
Statistics: Posted by Sublevel 113 — 25 Oct 2023 22:26
Statistics: Posted by WorldisQuiet5256 — 18 Oct 2023 12:25
Statistics: Posted by Sublevel 113 — 14 Oct 2023 19:23
Statistics: Posted by Sublevel 113 — 13 Oct 2023 23:14