While working on a failed EC2 (EBS-backed) instance recently, we found that it would not start after a reboot or a stop/start.
tl;dr: Create a snapshot of the existing EBS volume and a new volume from that snapshot; attach the new volume to a temporary host, mount it, and edit /etc/fstab; then attach it to the original instance in place of the old volume and start the instance.
The only symptom was in the console log:
$ ec2-get-console-output i-nnnnnnnn
init: console-setup main process (63) terminated with status 1
%Ginit: plymouth main process (45) killed by SEGV signal
init: plymouth-splash main process (194) terminated with status 2
cloud-init running: Sat, 29 Jan 2011 23:33:24 +0000. up 2.65 seconds
mountall: Disconnected from Plymouth
It turned out this instance was running as a t1.micro, an instance type that, unlike the others, does not have instance storage on /dev/sdb.
The problem is that /etc/fstab contained an entry for that device:
/dev/sdb /mnt auto defaults,comment=cloudconfig 0 0
The parameter “nobootwait” is missing!
Without it, mountall waits at boot for /dev/sdb, which never appears on a t1.micro, so the instance hangs on reboot. The solution to this problem is as follows:
Gather instance info for reference
$ ec2-describe-instances i-brokeninstance
Create a snapshot of the existing volume attached to the broken instance, then create a new volume from that snapshot (in the same availability zone as the instance)
$ ec2-create-snapshot vol-brokenvol
$ ec2-create-volume --snapshot snap-fromcreateabove -z us-east-1d
VOLUME vol-newvolume 15 snap-fromcreateabove us-east-1d creating 2011-01-30T00:01:30+0000
Attach this new volume to a temporary host, mount it there, and edit /etc/fstab on the new volume, removing the /dev/sdb entry
$ ec2-attach-volume vol-newvol -i i-tempinstance -d /dev/sdX #use a free /dev/sd letter
Note: You can remove the /dev/sdb entry, or you should also be able to keep it with nobootwait added, like:
/dev/sdb /mnt auto defaults,nobootwait 0 0
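The edit itself can be scripted. The following is a minimal sketch, not the exact commands used for this repair: `add_nobootwait` is a hypothetical helper name, `/mnt/rescue` is an assumed mount point on the temporary host, and the attached volume may show up under a different device name (for example /dev/xvdX) depending on the kernel. The helper keeps a backup of the file and appends `nobootwait` to the options field of one device's entry, leaving all other lines alone.

```shell
# Append ",nobootwait" to the options (4th) field of one device's fstab entry.
# Usage: add_nobootwait <path-to-fstab> <device>
# Note: the edited line is rewritten with single spaces between fields.
add_nobootwait() {
    fstab="$1"
    dev="$2"
    cp -p "$fstab" "$fstab.bak"      # keep a backup next to the original
    awk -v dev="$dev" '
        # Only touch the matching device, and only if the option is absent.
        $1 == dev && $4 !~ /(^|,)nobootwait(,|$)/ { $4 = $4 ",nobootwait" }
        { print }
    ' "$fstab.bak" > "$fstab"
}

# Example, run as root on the temporary host (mount point is an assumption):
#   mkdir -p /mnt/rescue
#   mount /dev/sdX /mnt/rescue
#   add_nobootwait /mnt/rescue/etc/fstab /dev/sdb
#   umount /mnt/rescue
```

Running the helper a second time leaves the entry unchanged, so it should be safe to re-run if you are unsure whether the edit was already made.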
Detach this new volume from the temporary host (unmount it first)
$ ec2-detach-volume vol-newvol
Detach the original volume from the failed host (the instance should be stopped)
$ ec2-detach-volume vol-brokenvol
Attach the newly edited volume to the failed host (use ec2-describe-instances to determine device location), then start the instance
$ ec2-attach-volume vol-newvol -i i-brokeninstance -d /dev/sda1
$ ec2-start-instances i-brokeninstance
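To catch this kind of entry before the next reboot, a small check can be run on a working instance (or against the fstab of a mounted rescue volume). This is a rough sketch under stated assumptions: `check_fstab` is a hypothetical helper, it only looks at entries that name a plain /dev/ path, and it treats "device node missing and no nobootwait" as the risky case described above. Run against a rescue volume, it tests device nodes on the host doing the checking, not on the broken instance.

```shell
# List fstab entries whose device node is missing and that lack "nobootwait".
# Usage: check_fstab [path-to-fstab]   (defaults to /etc/fstab)
check_fstab() {
    fstab="${1:-/etc/fstab}"
    # Drop comments and blank lines, then read the first four fstab fields.
    grep -Ev '^[[:space:]]*(#|$)' "$fstab" | while read -r dev mnt fstype opts rest; do
        case "$dev" in
            /dev/*) ;;          # only plain device paths are checked here
            *) continue ;;      # skip UUID=, LABEL=, proc, tmpfs, etc.
        esac
        case ",$opts," in
            *,nobootwait,*) continue ;;   # boot will not wait for this one
        esac
        # Report the device and mount point if the device node does not exist.
        [ -e "$dev" ] || echo "$dev $mnt"
    done
}
```

Any line it prints is an entry worth fixing or removing before the instance is stopped, rebooted, or imaged.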
Thank you! I have been tearing my hair out over this same problem for months: every time my instance was stopped for maintenance, or after I created new images from the volume, I would be unable to connect and the server would stop serving its web content.
I followed these instructions, repaired the fstab and boom! Finally recovered my instance. Question is, will this happen again and if so, is there a permanent solution?
Thanks again.
Andy
I have the same problem: I am trying to launch an instance from a snapshot, but it is not starting.
Thanks! Saved me an untold amount of time banging my head against the wall.