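My home server lost power unexpectedly. Afterwards GitLab would not start and PostgreSQL had crashed, which took down Metabase, Confluence, and the other services as well. Because the Docker containers were configured to restart automatically, they kept restarting in a loop and pushed the CPU temperature too high.
At first I suspected the upgrade to VMware 16 or the recent Docker 19.03.13 update, so I rolled back to the backup taken before the upgrade. It turned out I had blamed them wrongly.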
Detailed logs
GitLab fails to start and complains about the socket "/var/opt/gitlab/postgresql/.s.PGSQL.5432":
---- Begin output of "bash" "/tmp/chef-script20200921-27-19aio3b" ----
STDOUT: rake aborted!
PG::ConnectionBad: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/opt/gitlab/postgresql/.s.PGSQL.5432"?
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/db.rake:53:in block (3 levels) in '
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `'
Tasks: TOP => gitlab:db:configure
(See full trace by running task with --trace)
STDERR:
---- End output of "bash" "/tmp/chef-script20200921-27-19aio3b" ----
Ran "bash" "/tmp/chef-script20200921-27-19aio3b" returned 1

Chef Infra Client failed. 9 resources updated in 23 seconds
data:image/s3,"s3://crabby-images/d15fe/d15fe04c6e7aceed95fff039db03eecf2fbdd23f" alt=""
Checking the PostgreSQL log:
data:image/s3,"s3://crabby-images/e7ed0/e7ed04da3ab438b229cf22910a09ef3e6832bcd2" alt=""
Solution
Enter the container as the gitlab-psql user (note that doing this as root does not work).
data:image/s3,"s3://crabby-images/afe36/afe366cea9dead8b711cf7efd3c07d5bb5ee424c" alt=""
Run: pg_resetwal -f /var/opt/gitlab/postgresql/data
GitLab then starts successfully.
data:image/s3,"s3://crabby-images/6fed4/6fed4d5e9ae3967afddfdf13980d4bf2cd0db2dc" alt="blank"
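The two steps above can be sketched as a small script. This is only a sketch under assumptions: the container name `gitlab` is hypothetical, the `run` helper and `DRY_RUN` variable are mine, and I am assuming `pg_resetwal` is on the `PATH` for `docker exec` the same way it is in an interactive shell. Since `pg_resetwal -f` throws away the write-ahead log and can leave the data inconsistent, it is probably worth copying the data directory somewhere first.

```shell
#!/usr/bin/env bash
# Hypothetical container name; replace it with your own.
CONTAINER="${CONTAINER:-gitlab}"
PGDATA_DIR="/var/opt/gitlab/postgresql/data"

# Print the commands by default; set DRY_RUN=0 to actually run them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reset_wal() {
  # -n only reports the values pg_resetwal would write, without changing anything.
  run docker exec -u gitlab-psql "$CONTAINER" pg_resetwal -n "$PGDATA_DIR"
  # -f forces the reset even when pg_resetwal cannot read valid control data.
  run docker exec -u gitlab-psql "$CONTAINER" pg_resetwal -f "$PGDATA_DIR"
}

reset_wal
```

With the default `DRY_RUN=1` the script only prints the two `docker exec` lines, so they can be checked before running them for real with `DRY_RUN=0`.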
PostgreSQL 12.3
After the power loss, this instance reported a different error:
replication checkpoint has wrong magic 0 instead of 307747550
Solution:
Log in as the database user and remove $PGSQL/pg_logical/replorigin_checkpoint.
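A slightly more cautious variant is to move the file out of the data directory rather than delete it, so it can be put back if needed. The sketch below assumes PostgreSQL is stopped and that the script runs as the database user; the function name and the backup directory argument are hypothetical.

```shell
#!/usr/bin/env bash
# Move the damaged checkpoint file out of the data directory.
# $1: PostgreSQL data directory (what this post calls $PGSQL)
# $2: an existing directory outside the data directory to keep the old file in
move_replorigin_checkpoint() {
  local data_dir="$1"
  local backup_dir="$2"
  local file="$data_dir/pg_logical/replorigin_checkpoint"

  if [ ! -f "$file" ]; then
    echo "no replorigin_checkpoint under $data_dir" >&2
    return 1
  fi

  mv "$file" "$backup_dir/replorigin_checkpoint"
  echo "moved $file to $backup_dir/replorigin_checkpoint"
}

# Example (paths are placeholders):
# move_replorigin_checkpoint "$PGSQL" /tmp
```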
PostgreSQL index problem
2020-09-22 02:52:40.632 UTC [347] ERROR: index "job_id_idx" contains unexpected zero page at block 37779
This is the error Confluence reported.
Connect to the database with \c confluencedb, then run:
REINDEX DATABASE confluencedb;
Once the indexes have been rebuilt, everything runs normally again.
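The same rebuild can also be run non-interactively through `psql -c`. The sketch below assumes that `psql` can connect to `confluencedb` with the default connection settings and that `job_id_idx` lives in that database; the `run` helper, `DRY_RUN` variable, and function names are mine. REINDEX rebuilds an index from the table contents, so it only helps when the table data itself survived the crash.

```shell
#!/usr/bin/env bash
# Print the commands by default; set DRY_RUN=0 to actually run them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Rebuild only the index named in the error message.
reindex_one() {
  local db="$1" index="$2"
  run psql -d "$db" -c "REINDEX INDEX $index;"
}

# Rebuild every index in the database, as done in this post.
reindex_all() {
  local db="$1"
  run psql -d "$db" -c "REINDEX DATABASE $db;"
}

reindex_one confluencedb job_id_idx
reindex_all confluencedb
```

Rebuilding the single index is quicker, while rebuilding the whole database also catches other indexes that may have been damaged but have not produced an error yet.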