Google Cloud VM: Repairing the Boot File /etc/fstab

Chinese Version

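Issue

After the server's configuration was upgraded, Google Cloud showed the VM as running normally, but SSH connections failed and no ports were reachable.

Troubleshooting SSH Issues

The following steps are run from Cloud Shell with the gcloud CLI.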

https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-ssh-errors?hl=zh-cn#run_the_troubleshooting_tool

# If you have servers in multiple zones, add the VM's --zone flag to be precise
gcloud compute ssh VM_NAME --zone "europe-west1-b" --troubleshoot
gcloud compute ssh --zone "europe-west1-b" "prod" --troubleshoot

Starting ssh troubleshooting for instance 
https://compute.googleapis.com/compute/v1/projects/itrms-77dc/zones/europe-west1-b/instances/prod
in zone europe-west1-b
Start time: 2025-03-05 02:04:04.294375

---- Checking network connectivity ----
The Network Management API is needed to check the VM's network connectivity.

If not already enabled, is it OK to enable it and check the VM's network 
connectivity? (Y/n)? y

Enabling service [networkmanagement.googleapis.com] on project 
[itrms-77dc]...
Operation 
"operations/acat.p2-423416908037-6660e9e1-ef12-4c82-9e9b-e2d0df4724d4" 
finished successfully.
API [networkmanagement.googleapis.com] not enabled on project [77dc]. 
Would you like to enable and retry (this will take a few minutes)? (y/N)? y

Enabling service [networkmanagement.googleapis.com] on project 
[77dc]...
Your source IP address is 35.221.195.0

Network Connectivity Test Result: REACHABLE

To view complete details of this test, see 
https://console.cloud.google.com/net-intelligence/connectivity/tests/details/ssh-troubleshoot-1n8rm?project=77dc

Help for connectivity tests:
https://cloud.google.com/network-intelligence-center/docs/connectivity-tests/concepts/overview

---- Checking user permissions ----
User permissions: 0 issue(s) found.

---- Checking VPC settings ----
VPC settings: 0 issue(s) found.

---- Checking VM status ----
The Monitoring API is needed to check the VM's Status.

If not already enabled, is it OK to enable it and check the VM's Status? 
(Y/n)? y

Enabling service [monitoring.googleapis.com] on project [77dc]...
Operation 
"operations/acat.p2-423416908037-a91d8d7d-810e-4768-bffa-205f01a7c408" 
finished successfully.
VM status: 0 issue(s) found.

---- Checking VM boot status ----
VM boot: 1 issue(s) found.

The VM may not be running. The serial console logs show the VM has been 
unable to complete the boot process. Check your serial console logs to see 
if the VM has been dropped into an "emergency shell" or has reached 
"Emergency Mode". If that is the case, try restarting the VM to see if the 
problem is reproducible.

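The VM boot status is abnormal.

Analysis

https://cloud.google.com/compute/docs/troubleshooting/fstab-errors?hl=zh-cn#identify_fstab_issues
Previously, a Google Cloud disk was mounted at boot through /etc/fstab.
An /etc/fstab entry that can no longer be mounted is probably causing the boot failure.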

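Solution

Detach the system disk.
Create a new rescue VM and attach the system disk to it.
Edit /etc/fstab and save.
Detach the disk and reattach it to the original VM.
Start the original VM.

1. Detach the Disk

https://cloud.google.com/compute/docs/disks/detach-reattach-boot-disk?hl=zh-cn#detach_disk
Stop the VM, then detach the disk.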

gcloud compute instances detach-disk VM_NAME --disk=disk1  --zone europe-west1-b 

2. Create a New VM and Attach the Old Disk disk1

Create a new rescue VM.
Stop the new VM.
Attach the old disk disk1.

gcloud compute instances attach-disk new_VM_NAME --disk=disk1  --zone europe-west1-b  

Start the new VM.
The old disk's root partition shows up as sdb2.

lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda       8:0    0   10G  0 disk
├─sda1    8:1    0  9.9G  0 part /
├─sda14   8:14   0    3M  0 part
└─sda15   8:15   0  124M  0 part /boot/efi
sdb       8:16   0   50G  0 disk
├─sdb1    8:17   0  200M  0 part
└─sdb2    8:18   0 49.8G  0 part 
mkdir -p /test
mount /dev/sdb2 /test

3. Edit and Fix /etc/fstab

Keep the first few lines and delete the rest.
vi /test/etc/fstab

UUID=61afc323-af4e-4752-9b86-25e0ad7f126e /                       xfs     defaults        0 0
UUID=D57A-D61A          /boot/efi               vfat    defaults,uid=0,gid=0,umask=0077,shortname=winnt 0 0

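Then shut down the new VM.

4. Detach the Disk and Reattach It to the Original VM

Because this is the boot disk, remember to add the --boot flag when reattaching it.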

gcloud compute instances detach-disk new_VM_NAME --disk=disk1  --zone europe-west1-b 
gcloud compute instances attach-disk VM_NAME --disk=disk1 --zone europe-west1-b  --boot

5. Start the VM and Delete the Rescue VM

Once the original VM starts and works normally, delete the rescue VM to avoid further charges.

English Version

Issue

After the server was upgraded, Google Cloud showed the VM as running normally, but SSH connections failed and all ports were unreachable.

Troubleshooting SSH Issues

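The following steps are run from Cloud Shell with the gcloud CLI.

Refer to the troubleshooting guide:
https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-ssh-errors?hl=zh-cn#run_the_troubleshooting_tool

Run the following command (if you have VMs in multiple zones, specify --zone so the right instance is targeted):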

gcloud compute ssh VM_NAME --zone "europe-west1-b" --troubleshoot

Example:

gcloud compute ssh --zone "europe-west1-b" "prod" --troubleshoot

Output:

Starting ssh troubleshooting for instance 
https://compute.googleapis.com/compute/v1/projects/itrms-77dc/zones/europe-west1-b/instances/prod
in zone europe-west1-b
Start time: 2025-03-05 02:04:04.294375

---- Checking network connectivity ----
The Network Management API is needed to check the VM's network connectivity.

If not already enabled, is it OK to enable it and check the VM's network 
connectivity? (Y/n)? y

Enabling service [networkmanagement.googleapis.com] on project 
[itrms-77dc]...
Operation 
"operations/acat.p2-423416908037-6660e9e1-ef12-4c82-9e9b-e2d0df4724d4" 
finished successfully.
API [networkmanagement.googleapis.com] not enabled on project [77dc]. 
Would you like to enable and retry (this will take a few minutes)? (y/N)? y

Enabling service [networkmanagement.googleapis.com] on project 
[77dc]...
Your source IP address is 35.221.195.0

Network Connectivity Test Result: REACHABLE

To view complete details of this test, see 
https://console.cloud.google.com/net-intelligence/connectivity/tests/details/ssh-troubleshoot-1n8rm?project=77dc

Help for connectivity tests:
https://cloud.google.com/network-intelligence-center/docs/connectivity-tests/concepts/overview

---- Checking user permissions ----
User permissions: 0 issue(s) found.

---- Checking VPC settings ----
VPC settings: 0 issue(s) found.

---- Checking VM status ----
The Monitoring API is needed to check the VM's Status.

If not already enabled, is it OK to enable it and check the VM's Status? 
(Y/n)? y

Enabling service [monitoring.googleapis.com] on project [77dc]...
Operation 
"operations/acat.p2-423416908037-a91d8d7d-810e-4768-bffa-205f01a7c408" 
finished successfully.
VM status: 0 issue(s) found.

---- Checking VM boot status ----
VM boot: 1 issue(s) found.

The VM may not be running. The serial console logs show the VM has been 
unable to complete the boot process. Check your serial console logs to see 
if the VM has been dropped into an "emergency shell" or has reached 
"Emergency Mode". If that is the case, try restarting the VM to see if the 
problem is reproducible.

Identified Issue

The VM boot status is abnormal.

Analysis

https://cloud.google.com/compute/docs/troubleshooting/fstab-errors?hl=zh-cn#identify_fstab_issues
Previously, a Google Cloud disk was mounted at boot through /etc/fstab.
If that entry can no longer be mounted, it is the likely cause of the boot failure.
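
To confirm the emergency-mode diagnosis, the serial console output can be searched for the phrases quoted in the troubleshooter's message. The sketch below assumes a hypothetical helper named has_emergency_mode (it is not part of gcloud); it only inspects text on standard input, so VM_NAME and the zone in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the text on stdin mentions
# "emergency mode" or "emergency shell", ignoring case.
has_emergency_mode() {
  grep -qiE 'emergency (mode|shell)'
}

# Possible use (VM_NAME and the zone are placeholders):
#   gcloud compute instances get-serial-port-output VM_NAME \
#     --zone=europe-west1-b | has_emergency_mode && echo "VM is in emergency mode"
```

If the helper matches, the serial log is worth reading in full, since it usually names the mount unit that failed.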

Solution

  1. Detach the system disk.
  2. Create a rescue VM and attach the detached disk.
  3. Modify /etc/fstab and save changes.
  4. Detach the disk from the rescue VM and reattach it to the original VM.
  5. Start the original VM.

1. Detach the Disk

Refer to the official guide:
https://cloud.google.com/compute/docs/disks/detach-reattach-boot-disk?hl=zh-cn#detach_disk

Stop the VM and detach the disk:

gcloud compute instances detach-disk VM_NAME --disk=disk1 --zone=europe-west1-b

2. Create a New VM and Attach the Old Disk

  1. Create a new rescue VM.
  2. Stop the rescue VM.
  3. Attach the old disk (disk1) to the rescue VM:
    gcloud compute instances attach-disk new_VM_NAME --disk=disk1 --zone=europe-west1-b
  4. Start the rescue VM.

Check the disk layout:

lsblk

Expected Output:

NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda       8:0    0   10G  0 disk
├─sda1    8:1    0  9.9G  0 part /
├─sda14   8:14   0    3M  0 part
└─sda15   8:15   0  124M  0 part /boot/efi
sdb       8:16   0   50G  0 disk
├─sdb1    8:17   0  200M  0 part
└─sdb2    8:18   0 49.8G  0 part

Mount sdb2, the old disk's root partition:

mkdir -p /test
mount /dev/sdb2 /test
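
One way to spot suspect entries is to compare the UUIDs referenced in the mounted fstab with the UUIDs the rescue VM can actually see. In this sketch, missing_fstab_uuids is a hypothetical helper and /tmp/known-uuids is an assumed scratch file. It presumes the entries use the UUID= form, which may not hold for every fstab, and a UUID it reports only means that device is not attached to the rescue VM.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print UUIDs referenced in an fstab file that are not
# listed in a file of known UUIDs (one UUID per line).
missing_fstab_uuids() {
  fstab="$1"; known="$2"
  # Take the first field of UUID= entries, strip the prefix,
  # then drop every UUID that appears as a whole line in the known list.
  awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$fstab" |
    grep -vxFf "$known"
}

# Possible use on the rescue VM (paths are assumptions):
#   blkid -s UUID -o value > /tmp/known-uuids
#   missing_fstab_uuids /test/etc/fstab /tmp/known-uuids
```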

3. Edit and Fix /etc/fstab

Edit the file, keeping only the first few lines (the system mounts) and removing the extra entries that fail to mount:

vi /test/etc/fstab

Example correct entries:

UUID=61afc323-af4e-4752-9b86-25e0ad7f126e /                       xfs     defaults        0 0
UUID=D57A-D61A          /boot/efi               vfat    defaults,uid=0,gid=0,umask=0077,shortname=winnt 0 0

Save and exit, then shut down the rescue VM.
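
The same edit can be scripted instead of done by hand in vi. The sketch below is one possible approach, not the method used in this post: disable_extra_mounts is a hypothetical helper that backs up the file and comments out every entry whose mount point is not / or /boot/efi, on the assumption that those two are the only mounts the system needs to boot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: back up an fstab file to <file>.bak, then comment out
# every entry whose mount point (field 2) is not / or /boot/efi.
disable_extra_mounts() {
  fstab="$1"
  cp -p "$fstab" "$fstab.bak" || return 1
  awk '
    /^[ \t]*(#|$)/                 { print; next }  # keep comments and blank lines
    $2 == "/" || $2 == "/boot/efi" { print; next }  # keep the system mounts
                                   { print "# disabled: " $0 }
  ' "$fstab.bak" > "$fstab"
}

# Possible use on the rescue VM, with the old disk mounted at /test:
#   disable_extra_mounts /test/etc/fstab
```

Commenting entries out rather than deleting them keeps the original lines available. Once the VM boots again, a data disk entry can be restored with the nofail option so that a missing disk no longer blocks startup.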

4. Detach the Disk and Reattach to the Original VM

Since this is a boot disk, remember to add the --boot flag when reattaching.

Detach from the rescue VM:

gcloud compute instances detach-disk new_VM_NAME --disk=disk1 --zone=europe-west1-b

Attach back to the original VM:

gcloud compute instances attach-disk VM_NAME --disk=disk1 --zone=europe-west1-b --boot

5. Start the Original VM and Remove the Rescue VM

Start the original VM and confirm everything is working.
Then, delete the rescue VM to avoid additional costs.


This should resolve the boot issue caused by an incorrect /etc/fstab entry. 🚀
