How to fix AWS CLI hangs on AWS EC2 instance

Date: Wed Mar 18 2020

Tags: Amazon Web Services »»»» AWS EC2

In my experience, an AWS EC2 instance launched in the default VPC stands a high chance of being unable to use the AWS CLI from inside the instance. The solution is non-obvious and non-intuitive: it requires ensuring that the instance can send outbound HTTPS traffic and that it uses the correct public DNS servers.

In my case the requirement was to retrieve a Docker image stored in an ECR (Elastic Container Registry) repository. This of course requires running docker login using a login token provided by aws ecr get-login-password. On my laptop this executes immediately. On a newly created AWS EC2 instance, both that command and aws ecr describe-repositories took so long that I grew frustrated and typed CTRL-C.

Using the AWS CLI on an AWS EC2 instance should work out of the box, you'd think. The AWS CLI certainly shouldn't hang making a simple request like aws ecr describe-repositories.

In researching a cure for the AWS CLI hanging, I learned that of course the AWS CLI tool makes HTTPS requests to AWS API endpoints. Therefore the EC2 instance needs a security group, route table, and network ACL that all allow outbound HTTPS requests. But in my case that wasn't sufficient, and I learned it was necessary to change the VPC's DHCP options set so that the instance uses DNS servers able to resolve the ECR domain names.

Setup

I'm using a newly created AWS account, with a newly created IAM account. That account came with a default VPC, and I set up an EC2 instance using the Ubuntu 18.04 server image.

I installed both Docker and the AWS CLI tool on the EC2 instance. I set up the AWS CLI tool with the same profiles that work successfully on my laptop, for the two accounts mentioned in the previous paragraph. Then I tried to run the following to log in to the ECR registry:

$ aws ecr get-login-password --profile PROFILE-NAME --region REGION | \
    docker login --username AWS --password-stdin USER-ID.dkr.ecr.REGION.amazonaws.com

But this took forever, well, at least so long that I grew frustrated and typed CTRL-C. Even running this command took a similarly foreverish amount of time:

$ aws ecr describe-repositories --profile PROFILE-NAME --region REGION

On my laptop both commands return immediately. WTF?

Shouldn't an Amazon service by default be configured so that Amazon services work out of the box? But the default configuration means that Amazon services do not work out of the box.

AWS EC2 servers need support for outbound HTTPS requests

You can run the aws ecr commands with the --debug option, and that shows you the HTTP requests and other details. The command hung on making an HTTPS request to an AWS API endpoint. Duh, of course it uses HTTPS for making requests.
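
When the symptom is a hang rather than an error message, it can help to make the CLI give up quickly so that the --debug output ends at the failing request. Below is a minimal sketch, assuming the installed AWS CLI supports the global --cli-connect-timeout and --cli-read-timeout options. The wrapper name aws_fail_fast and the timeout values are illustrative assumptions, not part of the original troubleshooting.

```shell
# Hypothetical wrapper: run any AWS CLI command with short timeouts so a
# network problem surfaces as an error instead of an apparent hang.
# The 5 and 10 second values are arbitrary choices for illustration.
aws_fail_fast() {
    aws "$@" --cli-connect-timeout 5 --cli-read-timeout 10
}
```

For example: aws_fail_fast ecr describe-repositories --profile PROFILE-NAME --region REGION --debug. The CLI may retry a failed request several times, so the total wait can be a multiple of these timeouts, and a slow DNS lookup is not necessarily bounded by them.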

On (stackoverflow.com) StackOverflow it was pointed out that EC2 instances require outbound rules supporting HTTPS. What that means is that the security group, the route table, and the network ACLs must all support outbound HTTPS.

The EC2 was created with a default security group with this outbound configuration:

Type         Protocol  Port range  Destination  Description - optional
All traffic  All       All         0.0.0.0/0    -

Hurm, that sure looks like it supports all outbound traffic. Even updating it to this configuration did not make a difference:

Type         Protocol  Port range  Destination  Description - optional
All traffic  All       All         0.0.0.0/0    -
HTTP         TCP       80          0.0.0.0/0    -
HTTPS        TCP       443         0.0.0.0/0    -

The AWS CLI still hung. But of course that change shouldn't have made a difference, since the outbound rules already supported all outbound traffic.

The Network ACL for the VPC had this outbound ruleset:

Rule #  Type         Protocol  Port range  Destination  Allow / Deny
100     ALL Traffic  ALL       ALL         0.0.0.0/0    ALLOW
*       ALL Traffic  ALL       ALL         0.0.0.0/0    DENY

Again, that allows all outbound traffic.

And the subnet it is attached to has this routing table:

Destination    Target
172.31.0.0/16  local
0.0.0.0/0      igw-d3539daa

Meaning that it is correctly connected to an Internet Gateway, and therefore can make outbound requests to the Internet.

Indeed, curl http://www.google.com and curl https://www.google.com worked correctly.

That means the EC2 instance was correctly configured out of the box to make outbound HTTPS requests, and this did work correctly.
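
Reaching www.google.com shows general connectivity, but it says nothing about the AWS endpoint the CLI actually talks to. The following is a rough sketch of checking that endpoint directly. It assumes the regional ECR API hostname follows the api.ecr.REGION.amazonaws.com pattern and that getent and curl are installed; the function names ecr_api_host and check_endpoint are made up for this example.

```shell
# Build the regional ECR API hostname.
# Assumption: the endpoint follows the api.ecr.REGION.amazonaws.com pattern.
ecr_api_host() {
    echo "api.ecr.$1.amazonaws.com"
}

# Hypothetical helper: report whether a host resolves and answers HTTPS.
check_endpoint() {
    host="$1"
    # getent uses the system resolver, the same path most programs take.
    if ! getent hosts "$host" >/dev/null; then
        echo "DNS lookup failed for $host"
        return 1
    fi
    # Without -f, curl exits 0 for any HTTP status (even 403 or 404),
    # so success here only means the connection itself worked.
    if curl -sS --connect-timeout 5 --max-time 10 -o /dev/null "https://$host/"; then
        echo "HTTPS reachable: $host"
    else
        echo "HTTPS request failed for $host"
        return 1
    fi
}
```

Running check_endpoint "$(ecr_api_host REGION)" separates a name resolution failure from a blocked or stalled HTTPS connection, which are the two causes discussed in this article.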

Enabling the correct DNS settings to access ECR repositories

A (serverfault.com) ServerFault question contained a different take on the exact problem I had. The question walked through the same AWS VPC and AWS EC2 configuration that I had, but in that case the questioner was unable to run yum update.

The key for him was that the default DHCP configuration:

domain-name = ec2.internal
domain-name-servers = AmazonProvidedDNS

resulted in /etc/resolv.conf having these contents:

search ec2.internal
nameserver 10.0.0.2

The nameserver in question was insufficient to support the yum update command, and for me was insufficient to support using the aws ecr command.
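
To see which nameservers an instance is using, and whether each one can resolve a given hostname, something like the following sketch may help. It assumes dig is installed (on Ubuntu it comes from the dnsutils package); the function names list_nameservers and query_each_nameserver are invented for this example. On Ubuntu 18.04, /etc/resolv.conf may list only the local systemd-resolved stub address 127.0.0.53, in which case systemd-resolve --status shows the upstream servers instead.

```shell
# Print the nameserver addresses listed in a resolv.conf-style file.
# Defaults to /etc/resolv.conf when no file is given.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Hypothetical helper: ask each listed nameserver to resolve a hostname.
# Assumes dig is installed; +time and +tries keep a dead server from
# stalling the loop for long.
query_each_nameserver() {
    host="$1"
    for ns in $(list_nameservers "${2:-/etc/resolv.conf}"); do
        echo "== $ns"
        dig +short +time=3 +tries=1 "$host" "@$ns"
    done
}
```

For example, query_each_nameserver USER-ID.dkr.ecr.REGION.amazonaws.com prints the answers, if any, from each configured server in turn.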

In my case the default DHCP Options Set was configured with:

domain-name = us-west-2.compute.internal;
domain-name-servers = AmazonProvidedDNS;

I created a new DHCP Options Set configured with:

domain-name-servers = 8.8.8.8, 8.8.4.4, 172.16.16.16, 10.10.10.10;

Notice there is no domain-name setting, just the domain-name-servers option. The first two DNS servers are operated by Google, and the second two are internal AWS DNS servers. The 172.16 address is due to the VPC having a CIDR in that network address range.

I then went to the VPC dashboard, selected the default VPC, and chose the EDIT DHCP options set option, and changed it to use the newly created DHCP Options Set.
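
The same two steps, creating the options set and attaching it to the VPC, can also be sketched with the AWS CLI, run from a machine where the CLI works (a laptop, for instance). This is only an outline under assumptions: it relies on the aws ec2 create-dhcp-options and aws ec2 associate-dhcp-options commands, the function names are hypothetical, and the dopt-... and vpc-... IDs must be replaced with real values.

```shell
# Create a DHCP options set containing only DNS servers and print its ID.
# $1 is a comma-separated list of DNS server addresses.
create_dns_options_set() {
    aws ec2 create-dhcp-options \
        --dhcp-configurations "Key=domain-name-servers,Values=$1" \
        --query 'DhcpOptions.DhcpOptionsId' --output text
}

# Point a VPC at an existing DHCP options set.
# $1 is the options set ID (dopt-...), $2 is the VPC ID (vpc-...).
use_dhcp_options_set() {
    aws ec2 associate-dhcp-options --dhcp-options-id "$1" --vpc-id "$2"
}
```

A typical sequence would be to capture the ID with something like id=$(create_dns_options_set 8.8.8.8,8.8.4.4) and then call use_dhcp_options_set "$id" with the VPC's ID, adding --profile and --region as needed.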

After rebooting the EC2 instance, it was able to use AWS CLI to make the aws ecr requests I gave earlier.

Summary

The fix for this is relatively simple but non-intuitive.

More importantly, it raises the question: why does Amazon supply EC2 configurations that do not support the AWS CLI tool? That's completely astonishing. It doesn't make sense for Amazon to drive their customers insane.

About the Author(s)

(davidherron.com) David Herron : David Herron is a writer and software engineer focusing on the wise use of technology. He is especially interested in clean energy technologies like solar power, wind power, and electric cars. David worked for nearly 30 years in Silicon Valley on software ranging from electronic mail systems, to video streaming, to the Java programming language, and has published several books on Node.js programming and electric vehicles.
