Issue
After upgrading the VM's machine type, the Google Cloud console showed the VM as running, but SSH connections failed and none of its ports were reachable.
Troubleshooting SSH Issues
All of the following steps are run with gcloud in Cloud Shell.
Refer to Google Cloud's SSH troubleshooting guide.
Run the troubleshooter. If you have VMs in several zones, pass --zone so the command targets the right instance:
gcloud compute ssh VM_NAME --zone "europe-west1-b" --troubleshoot
Example:
gcloud compute ssh --zone "europe-west1-b" "prod" --troubleshoot
Output:
Starting ssh troubleshooting for instance
https://compute.googleapis.com/compute/v1/projects/itrms-77dc/zones/europe-west1-b/instances/prod
in zone europe-west1-b
Start time: 2025-03-05 02:04:04.294375
---- Checking network connectivity ----
The Network Management API is needed to check the VM's network connectivity.
If not already enabled, is it OK to enable it and check the VM's network
connectivity? (Y/n)? y
Enabling service [networkmanagement.googleapis.com] on project
[itrms-77dc]...
Operation
"operations/acat.p2-423416908037-6660e9e1-ef12-4c82-9e9b-e2d0df4724d4"
finished successfully.
API [networkmanagement.googleapis.com] not enabled on project [77dc].
Would you like to enable and retry (this will take a few minutes)? (y/N)? y
Enabling service [networkmanagement.googleapis.com] on project
[77dc]...
Your source IP address is 35.221.195.0
Network Connectivity Test Result: REACHABLE
To view complete details of this test, see
https://console.cloud.google.com/net-intelligence/connectivity/tests/details/ssh-troubleshoot-1n8rm?project=77dc
Help for connectivity tests:
https://cloud.google.com/network-intelligence-center/docs/connectivity-tests/concepts/overview
---- Checking user permissions ----
User permissions: 0 issue(s) found.
---- Checking VPC settings ----
VPC settings: 0 issue(s) found.
---- Checking VM status ----
The Monitoring API is needed to check the VM's Status.
If not already enabled, is it OK to enable it and check the VM's Status?
(Y/n)? y
Enabling service [monitoring.googleapis.com] on project [77dc]...
Operation
"operations/acat.p2-423416908037-a91d8d7d-810e-4768-bffa-205f01a7c408"
finished successfully.
VM status: 0 issue(s) found.
---- Checking VM boot status ----
VM boot: 1 issue(s) found.
The VM may not be running. The serial console logs show the VM has been
unable to complete the boot process. Check your serial console logs to see
if the VM has been dropped into an "emergency shell" or has reached
"Emergency Mode". If that is the case, try restarting the VM to see if the
problem is reproducible.
Identified Issue
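Network connectivity, user permissions, VPC settings, and VM status all pass. Only the boot-status check fails: the VM is not completing its boot sequence, which explains why it shows as running while SSH and every other port are unreachable.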
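The troubleshooter suggests reading the serial console log. A minimal sketch for doing that from Cloud Shell, assuming an authenticated gcloud: `gcloud compute instances get-serial-port-output` prints the log, and the hypothetical helper `find_boot_failure` filters it for typical systemd failure messages (the exact wording varies by distribution, so treat the pattern list as an assumption).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print serial-console lines that commonly indicate a failed boot.
# Reads the log on stdin; prints nothing (and still succeeds) if no line matches.
find_boot_failure() {
  grep -iE 'emergency mode|emergency shell|Dependency failed|Failed to mount' || true
}

# Usage from Cloud Shell (VM_NAME and the zone are placeholders):
# gcloud compute instances get-serial-port-output VM_NAME --zone=europe-west1-b | find_boot_failure
```

A "Failed to mount" line naming the data disk's mount point would point straight at /etc/fstab.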
Analysis
Google's guide to identifying fstab issues:
https://cloud.google.com/compute/docs/troubleshooting/fstab-errors?hl=zh-cn#identify_fstab_issues
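An additional Google Cloud disk had previously been mounted at boot through an entry in /etc/fstab. If that entry can no longer be mounted, systemd can stop the boot and drop into emergency mode, which matches the boot-status warning above.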
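The problematic entry itself is not shown in this post. As a hedged aid for spotting similar entries, the sketch below lists fstab lines other than / and /boot/efi whose options do not include nofail; Google's guidance for mounting non-boot disks uses nofail so the VM can still boot when the disk is unavailable. The function name check_fstab_nofail is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list fstab entries, other than / and /boot/efi,
# whose mount options do not include "nofail".
# Usage: check_fstab_nofail [path-to-fstab]   (defaults to /etc/fstab)
check_fstab_nofail() {
  local fstab="${1:-/etc/fstab}"
  awk '
    $1 ~ /^#/ || NF < 4 { next }                # skip comments and blank or short lines
    $2 == "/" || $2 == "/boot/efi" { next }     # system mounts are handled separately
    ("," $4 ",") !~ /,nofail,/ { print }        # options field lacks nofail
  ' "$fstab"
}
```

On the rescue VM in step 3 it would be run as check_fstab_nofail /test/etc/fstab.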
Solution
- Detach the boot disk from the original VM.
- Create a rescue VM and attach the detached disk to it.
- Edit /etc/fstab on that disk and save the changes.
- Detach the disk from the rescue VM and reattach it to the original VM as its boot disk.
- Start the original VM.
1. Detach the Disk
Refer to the official guide, Detach and reattach a boot disk:
https://cloud.google.com/compute/docs/disks/detach-reattach-boot-disk?hl=zh-cn#detach_disk
Stop the VM, then detach its boot disk (disk1 in this example):
gcloud compute instances detach-disk VM_NAME --disk=disk1 --zone=europe-west1-b
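Because the disk is detached from a stopped VM, the stop and detach commands can be wrapped together. This is a sketch: VM_NAME, disk1 and the zone are this post's placeholders, and the run helper (which only prints commands when DRY_RUN is set) is an addition for previewing, not part of gcloud.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the command instead of executing it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop the VM (gcloud waits for the stop to finish unless --async is given),
# then detach its boot disk.
detach_boot_disk() {
  local vm="$1" disk="$2" zone="$3"
  run gcloud compute instances stop "$vm" --zone="$zone"
  run gcloud compute instances detach-disk "$vm" --disk="$disk" --zone="$zone"
}

# Preview first, then run for real:
# DRY_RUN=1 detach_boot_disk VM_NAME disk1 europe-west1-b
# detach_boot_disk VM_NAME disk1 europe-west1-b
```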
2. Create a New VM and Attach the Old Disk
- Create a new rescue VM in the same zone as the disk (europe-west1-b).
- Stop the rescue VM.
- Attach the old disk (disk1) to the rescue VM:
gcloud compute instances attach-disk new_VM_NAME --disk=disk1 --zone=europe-west1-b
- Start the rescue VM.
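The rescue-VM steps above can be sketched as one function. Assumptions: the image (Debian 12 from debian-cloud) is an arbitrary choice, any recent Linux image that can mount the broken disk's filesystem (XFS here) should do; prepare_rescue_vm is a hypothetical name; run/DRY_RUN only preview the commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the command instead of executing it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Create a rescue VM in the disk's zone, stop it, attach the broken boot disk
# as a secondary (non-boot) disk, and start it again.
prepare_rescue_vm() {
  local rescue="$1" disk="$2" zone="$3"
  # Assumed image; replace with whatever image you prefer for rescue work.
  run gcloud compute instances create "$rescue" --zone="$zone" \
    --image-family=debian-12 --image-project=debian-cloud
  run gcloud compute instances stop "$rescue" --zone="$zone"
  run gcloud compute instances attach-disk "$rescue" --disk="$disk" --zone="$zone"
  run gcloud compute instances start "$rescue" --zone="$zone"
}

# DRY_RUN=1 prepare_rescue_vm new_VM_NAME disk1 europe-west1-b
```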
Check the disk layout. The original boot disk appears as sdb, with its root filesystem on sdb2:
lsblk
Example output:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 10G 0 disk
├─sda1 8:1 0 9.9G 0 part /
├─sda14 8:14 0 3M 0 part
└─sda15 8:15 0 124M 0 part /boot/efi
sdb 8:16 0 50G 0 disk
├─sdb1 8:17 0 200M 0 part
└─sdb2 8:18 0 49.8G 0 part
Mount sdb2 on a temporary directory:
mkdir -p /test
mount /dev/sdb2 /test
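If the rescue VM was built from the same image as the original VM, both XFS root filesystems can share a UUID and XFS refuses the second mount; the XFS mount option nouuid skips that check. A hedged sketch, where mount_attached_root is a hypothetical name and the filesystem type is read beforehand from lsblk -f:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the command instead of executing it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Mount a partition of the attached disk on a scratch directory (run as root).
# For XFS, "-o nouuid" lets the mount succeed even if the filesystem UUID
# duplicates one already mounted on the rescue VM.
mount_attached_root() {
  local dev="$1" mnt="$2" fstype="$3"
  run mkdir -p "$mnt"
  if [ "$fstype" = "xfs" ]; then
    run mount -o nouuid "$dev" "$mnt"
  else
    run mount "$dev" "$mnt"
  fi
}

# mount_attached_root /dev/sdb2 /test xfs
```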
3. Edit and Fix /etc/fstab
Open the broken disk's fstab, now under /test, keep the original system entries at the top of the file, and delete the extra line(s) added for the data disk:
vi /test/etc/fstab
After the edit, the file contains only these two entries (the UUIDs are specific to this VM):
UUID=61afc323-af4e-4752-9b86-25e0ad7f126e / xfs defaults 0 0
UUID=D57A-D61A /boot/efi vfat defaults,uid=0,gid=0,umask=0077,shortname=winnt 0 0
Save and exit.
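As an alternative to deleting the lines by hand in vi, the extra entries can be commented out with a backup kept, so they are easy to restore (with nofail added) once the VM boots again. This is a swapped-in approach, not the one used above; disable_extra_mounts is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: keep only the / and /boot/efi entries active and comment
# out every other mount line. The original is saved as <fstab>.bak first.
# Usage (on the rescue VM, as root): disable_extra_mounts /test/etc/fstab
disable_extra_mounts() {
  local fstab="$1"
  cp -p "$fstab" "$fstab.bak"
  awk '
    $1 ~ /^#/ || NF < 2 { print; next }            # keep comments and blank lines as they are
    $2 == "/" || $2 == "/boot/efi" { print; next } # keep the system mounts
    { print "# disabled during rescue: " $0 }      # comment out everything else
  ' "$fstab.bak" > "$fstab"
}
```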
4. Detach the Disk and Reattach to the Original VM
Shut down the rescue VM first. Because disk1 is a boot disk, add the --boot flag when reattaching it to the original VM.
Detach from the rescue VM:
gcloud compute instances detach-disk new_VM_NAME --disk=disk1 --zone=europe-west1-b
Attach back to the original VM:
gcloud compute instances attach-disk VM_NAME --disk=disk1 --zone=europe-west1-b --boot
5. Start the Original VM and Remove the Rescue VM
Start the original VM and confirm that you can connect over SSH again.
Then, delete the rescue VM to avoid additional costs.
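A sketch of this last step that only deletes the rescue VM after SSH to the repaired VM succeeds. Assumptions: finish_recovery is a hypothetical name, and a trivial remote command (true) is enough as an SSH check. sshd may need a minute after the VM starts, so a failed first check just keeps the rescue VM around for a retry.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the command instead of executing it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Start the repaired VM, check SSH, and only then delete the rescue VM.
# Deleting an instance also deletes disks marked auto-delete (by default the boot
# disk it was created with); disk1 was already detached, so it is not affected.
finish_recovery() {
  local vm="$1" rescue="$2" zone="$3"
  run gcloud compute instances start "$vm" --zone="$zone"
  if run gcloud compute ssh "$vm" --zone="$zone" --command="true"; then
    run gcloud compute instances delete "$rescue" --zone="$zone" --quiet
  else
    echo "SSH to $vm still fails; keeping $rescue for further repair" >&2
    return 1
  fi
}

# DRY_RUN=1 finish_recovery VM_NAME new_VM_NAME europe-west1-b
```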
This should resolve the boot issue caused by an incorrect /etc/fstab entry. 🚀