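My home server suddenly lost power. Afterwards GitLab would not start and PostgreSQL had crashed, which took down Metabase, Confluence and everything else that depends on it. Because the Docker containers were configured to restart automatically, they kept restarting in a loop and pushed the CPU temperature too high.
At first I assumed the cause was the upgrade to VMware 16 or the recent Docker 19.03.13 update, so I rolled back to the backup point taken before the upgrade, only to find I had blamed them wrongly.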
Detailed logs
GitLab failed to start, with an error pointing at the socket "/var/opt/gitlab/postgresql/.s.PGSQL.5432":
---- Begin output of "bash" "/tmp/chef-script20200921-27-19aio3b" ----
STDOUT: rake aborted!
PG::ConnectionBad: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/opt/gitlab/postgresql/.s.PGSQL.5432"?
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/db.rake:53:in block (3 levels) in '
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `'
Tasks: TOP => gitlab:db:configure
(See full trace by running task with --trace)
STDERR:
---- End output of "bash" "/tmp/chef-script20200921-27-19aio3b" ----
Ran "bash" "/tmp/chef-script20200921-27-19aio3b" returned 1

Chef Infra Client failed. 9 resources updated in 23 seconds
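Checking the PostgreSQL logs
Fix
Enter the container as the gitlab-psql user (note that doing this as root does not work).
Run: pg_resetwal -f /var/opt/gitlab/postgresql/data
After that, GitLab started successfully.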
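The two steps above can be sketched as a small script run from the Docker host. This is only a sketch under assumptions: the container name "gitlab" is hypothetical, and pg_resetwal is assumed to be on the PATH inside the container, as it was for me. It prints the commands by default and first calls pg_resetwal with -n, which only shows the values it would write, before the forced reset. Note that pg_resetwal is a last-resort tool and can lose recent transactions, so copying the data directory first is a sensible precaution.

```shell
#!/usr/bin/env bash
# Sketch only. CONTAINER is a hypothetical container name; PGDATA is the path from the post.
CONTAINER="${CONTAINER:-gitlab}"
PGDATA="${PGDATA:-/var/opt/gitlab/postgresql/data}"

# Print commands by default; set DRY_RUN=0 to actually execute them.
# The printed form is for inspection only (argument quoting is not preserved).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reset_wal() {
  # -n: show the control values pg_resetwal would use, without modifying anything.
  run docker exec -u gitlab-psql "$CONTAINER" pg_resetwal -n "$PGDATA"
  # -f: force the reset even though the server was not shut down cleanly.
  run docker exec -u gitlab-psql "$CONTAINER" pg_resetwal -f "$PGDATA"
}
```

Calling `reset_wal` prints the two docker commands; `DRY_RUN=0 reset_wal` runs them. The `-u gitlab-psql` part matters because pg_resetwal refuses to run as root.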
PostgreSQL 12.3
After the power loss, this instance reported a different error:
replication checkpoint has wrong magic 0 instead of 307747550
Fix:
Log in as the database user and remove $PGSQL/pg_logical/replorigin_checkpoint (where $PGSQL is the PostgreSQL data directory).
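A minimal sketch of that step, to be run as the database user while the server is stopped. One deliberate substitution: instead of deleting the file it moves it to a backup location, so it can be put back if needed. PGSQL is the data directory variable used above; BACKUP_DIR and the function name are hypothetical names of mine.

```shell
#!/usr/bin/env bash
# Sketch only: move the damaged replication origin checkpoint out of the data directory.
# PGSQL must point at the PostgreSQL data directory; BACKUP_DIR is a hypothetical setting.
remove_replorigin_checkpoint() {
  # Abort with a message if PGSQL is not set.
  : "${PGSQL:?set PGSQL to the PostgreSQL data directory}"
  file="$PGSQL/pg_logical/replorigin_checkpoint"
  # Moving instead of deleting keeps a copy outside the data directory.
  mv "$file" "${BACKUP_DIR:-/tmp}/replorigin_checkpoint.bak"
}
```

For example: `PGSQL=/path/to/data remove_replorigin_checkpoint`, then start PostgreSQL again.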
PostgreSQL index problem
2020-09-22 02:52:40.632 UTC [347] ERROR: index "job_id_idx" contains unexpected zero page at block 37779
This is the error Confluence reported.
Connect to the database with \c confluencedb, then run:
REINDEX DATABASE confluencedb;
Once the indexes had been rebuilt, everything ran normally again.
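The same rebuild can also be done without an interactive session. The sketch below assumes psql is available on the machine and that a superuser named "postgres" exists; both the user name and the function name are assumptions, only the database name comes from the notes above. It prints the command by default so it can be checked first.

```shell
#!/usr/bin/env bash
# Sketch only. DB_NAME comes from the post; DB_USER is a hypothetical superuser name.
DB_NAME="${DB_NAME:-confluencedb}"
DB_USER="${DB_USER:-postgres}"

# Print commands by default; set DRY_RUN=0 to actually execute them.
# The printed form is for inspection only (argument quoting is not preserved).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reindex_db() {
  # In PostgreSQL 12, REINDEX DATABASE must name the database the session is connected to,
  # hence -d and the statement use the same name.
  run psql -U "$DB_USER" -d "$DB_NAME" -c "REINDEX DATABASE $DB_NAME;"
}
```

Run `DRY_RUN=0 reindex_db` to execute it. If only the one index from the error is damaged, `REINDEX INDEX job_id_idx;` would be a narrower alternative, though I rebuilt the whole database.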