An AWS EC2 instance set up in the default VPC stands a high chance of being unable to use the AWS CLI from inside the instance. That was my experience, anyway. The solution is extremely non-obvious and non-intuitive: it requires ensuring that the instance can send outbound HTTPS traffic, and that it uses the correct DNS servers.
In my case the requirement was to retrieve a Docker image stored in an ECR (Elastic Container Registry) repository. This of course requires running `docker login` using a login token provided by `aws ecr get-login-password`. On my laptop this executes immediately. On a newly created AWS EC2 instance that command, like `aws ecr describe-repositories`, took so long I grew frustrated and typed CTRL-C.
Using the AWS CLI on an AWS EC2 instance should work out of the box, you'd think. The AWS CLI certainly shouldn't hang making a simple request like `aws ecr describe-repositories`.
In researching a cure for the AWS CLI hanging, I learned that of course the AWS CLI tool makes HTTPS requests to AWS API endpoints. Therefore the EC2 instance requires a security group, route table, and network ACL that all allow outbound HTTPS requests. But in my case that wasn't sufficient, and I learned it was necessary to modify the DHCP options set so the instance uses DNS servers that can resolve ECR domain names.
Setup
I'm using a newly created AWS account, with a newly created IAM account. That account came with a default VPC, and I set up an EC2 instance using the Ubuntu 18.04 server image.
I installed both Docker and the AWS CLI tool on the EC2 instance. I set up the AWS CLI tool with the same profiles that work successfully on my laptop, for the two accounts mentioned in the previous paragraph. Then I tried to run the following to log in to the ECR registry:
$ aws ecr get-login-password --profile PROFILE-NAME --region REGION | \
docker login --username AWS --password-stdin USER-ID.dkr.ecr.REGION.amazonaws.com
But this took forever, well, at least so long that I grew frustrated and typed CTRL-C. Even running this command took a similarly foreverish amount of time:
$ aws ecr describe-repositories --profile PROFILE-NAME --region REGION
On my laptop both commands return immediately. WTF?
Shouldn't an Amazon service by default be configured so that Amazon services work out of the box? But the default configuration means that Amazon services do not work out of the box.
AWS EC2 servers need support for outbound HTTPS requests
You can run the `aws ecr` commands with the `--debug` option, which shows the HTTP requests and other details. The command hung while making an HTTPS request to an AWS API endpoint. Duh, of course it uses HTTPS for making requests.
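Because the command hangs rather than failing, it helps to put a time limit on it while capturing the debug output. Here is a minimal sketch, assuming GNU coreutils `timeout` is installed (it is on a stock Ubuntu image). The helper name `run_bounded` and the log file name are made up for illustration.

```shell
#!/usr/bin/env bash
# Run any command with a time limit so a hung request cannot block the shell.
# run_bounded is a hypothetical helper name; timeout(1) comes from GNU coreutils.
run_bounded() {
    local seconds="$1"
    shift
    timeout "$seconds" "$@"
    local rc=$?
    if [ "$rc" -eq 124 ]; then
        # timeout(1) exits with 124 when it had to stop the command
        echo "command timed out after ${seconds}s: $*" >&2
    fi
    return "$rc"
}

# Example (PROFILE-NAME and REGION are placeholders, as elsewhere in this post):
#   run_bounded 30 aws ecr describe-repositories --debug \
#       --profile PROFILE-NAME --region REGION 2> aws-debug.log
```

The AWS CLI also has its own `--cli-connect-timeout` and `--cli-read-timeout` global options, which may be a better fit if you only want to bound the network calls.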
On StackOverflow it was pointed out that EC2 instances require outbound rules supporting HTTPS. What that means is that the security group, the route table, and the network ACLs must all support outbound HTTPS.
The EC2 instance was created with a default security group with this outbound configuration:
Type | Protocol | Port range | Destination | Description - optional |
---|---|---|---|---|
All traffic | All | All | 0.0.0.0/0 | - |
Hurm, that sure looks like it supports all outbound traffic. Even updating it to this configuration did not make a difference:
Type | Protocol | Port range | Destination | Description - optional |
---|---|---|---|---|
All traffic | All | All | 0.0.0.0/0 | - |
HTTP | TCP | 80 | 0.0.0.0/0 | - |
HTTPS | TCP | 443 | 0.0.0.0/0 | - |
The AWS CLI still hung. But of course that change shouldn't have made a difference, since the outbound rules already supported all outbound traffic.
The Network ACL for the VPC had this outbound ruleset:
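Rule number | Type | Protocol | Port range | Destination | Allow / Deny |
---|---|---|---|---|---|
100 | ALL Traffic | ALL | ALL | 0.0.0.0/0 | ALLOW |
\* | ALL Traffic | ALL | ALL | 0.0.0.0/0 | DENY |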
Again, that allows all outbound traffic.
And the subnet it is attached to has this routing table:
Destination | Target |
---|---|
172.31.0.0/16 | local |
0.0.0.0/0 | igw-d3539daa |
This means the subnet is correctly connected to an Internet Gateway, and therefore can make outbound requests to the Internet.
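The same three settings can also be pulled with the AWS CLI instead of clicking through the console. This is only a sketch: `show_vpc_egress` is a hypothetical helper, the VPC and security group IDs are placeholders you would substitute, and it relies on whatever profile and region your CLI defaults to. Since the CLI hangs on the affected instance, it would have to be run from a machine where the CLI works, such as my laptop.

```shell
#!/usr/bin/env bash
# Print the outbound-related settings for a VPC in one go.
# show_vpc_egress is a hypothetical helper; pass your own VPC and security group IDs.
show_vpc_egress() {
    local vpc_id="$1" sg_id="$2"

    echo "== Security group egress rules =="
    aws ec2 describe-security-groups --group-ids "$sg_id" \
        --query 'SecurityGroups[].IpPermissionsEgress' --output json

    echo "== Network ACL egress entries =="
    aws ec2 describe-network-acls \
        --filters "Name=vpc-id,Values=$vpc_id" \
        --query 'NetworkAcls[].Entries[?Egress==`true`]' --output json

    echo "== Routes =="
    aws ec2 describe-route-tables \
        --filters "Name=vpc-id,Values=$vpc_id" \
        --query 'RouteTables[].Routes' --output json
}

# Example: show_vpc_egress VPC-ID SECURITY-GROUP-ID
```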
Both `curl http://www.google.com` and `curl https://www.google.com` worked correctly.
That means the EC2 instance was correctly configured out of the box to make outbound HTTPS requests, and this did work correctly.
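Google being reachable does not say much about the endpoint the AWS CLI actually talks to, so a more targeted check is to test name resolution and HTTPS separately against the ECR API host. The sketch below assumes the host follows the `api.ecr.REGION.amazonaws.com` pattern; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Build the regional ECR API hostname. The api.ecr.REGION.amazonaws.com
# pattern is an assumption about AWS's endpoint naming, not taken from this post.
ecr_api_host() {
    printf 'api.ecr.%s.amazonaws.com\n' "$1"
}

# Check name resolution and HTTPS reachability separately, each with a
# time limit, so it is obvious which step stalls.
check_ecr_endpoint() {
    local host
    host="$(ecr_api_host "$1")"
    if ! timeout 10 getent hosts "$host"; then
        echo "DNS lookup failed or timed out for $host" >&2
        return 1
    fi
    # Any real HTTP status code means TCP and TLS worked; curl prints 000 on failure.
    curl --silent --output /dev/null --max-time 10 \
         --write-out '%{http_code}\n' "https://$host/"
}

# Example: check_ecr_endpoint us-west-2
```

If the `getent` step is the one that stalls, the problem is name resolution rather than the security group, network ACL, or route table.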
Enabling the correct DNS settings to access ECR repositories
A ServerFault question contained a different take on the exact problem I had. The question went through the exact same AWS VPC and AWS EC2 configuration as I had. But the questioner was unable to run `yum update`.
The key for him was that the default DHCP configuration:
domain-name = ec2.internal
domain-name-servers = AmazonProvidedDNS
resulted in `/etc/resolv.conf` having these contents:
search ec2.internal
nameserver 10.0.0.2
The nameserver in question was insufficient to support the `yum update` command, and for me was insufficient to support using the `aws ecr` command.
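To see which nameservers an instance ended up with, you can pull them out of the resolver file. This is a small sketch with a made-up function name. On Ubuntu 18.04, systemd-resolved usually makes `/etc/resolv.conf` point at a local stub (127.0.0.53), in which case the servers handed out by DHCP are listed in `/run/systemd/resolve/resolv.conf` instead.

```shell
#!/usr/bin/env bash
# Print the nameserver addresses from a resolv.conf-style file.
# Defaults to /etc/resolv.conf; pass another path to inspect a different file.
# list_nameservers is a hypothetical helper name.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Example:
#   list_nameservers
#   list_nameservers /run/systemd/resolve/resolv.conf
```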
In my case the default DHCP Options Set was configured with:
domain-name = us-west-2.compute.internal;
domain-name-servers = AmazonProvidedDNS;
I created a new DHCP Options Set configured with:
domain-name-servers = 8.8.8.8, 8.8.4.4, 172.16.16.16, 10.10.10.10;
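Notice there is no `domain-name` setting, just the `domain-name-servers` option. The first two DNS servers are ones operated by Google; the second two I added as internal AWS DNS servers. The 172.16 address is there because I understood the VPC to have a CIDR in that network address range, although the routing table above shows 172.31.0.0/16, for which the Amazon-provided resolver sits at the VPC base address plus two (172.31.0.2).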
I then went to the VPC dashboard, selected the default VPC, chose the EDIT DHCP options set option, and changed it to use the newly created DHCP Options Set.
After rebooting the EC2 instance, it was able to use the AWS CLI to make the `aws ecr` requests I gave earlier.
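The console steps for creating and attaching the DHCP Options Set can also be scripted. The following is a sketch rather than what I actually ran: the function names are hypothetical, the VPC ID is a placeholder, and the commands use whatever profile and region your CLI defaults to. As with any `aws` command, it needs to be run from a machine where the CLI is working.

```shell
#!/usr/bin/env bash
# Build the shorthand value for --dhcp-configurations from a list of DNS servers.
# dns_dhcp_config and switch_vpc_dns are hypothetical helper names.
dns_dhcp_config() {
    local IFS=,
    # With IFS set to a comma, "$*" joins the arguments with commas.
    printf 'Key=domain-name-servers,Values=%s\n' "$*"
}

# Create a DHCP options set with the given DNS servers and attach it to a VPC.
# Usage: switch_vpc_dns VPC-ID DNS-SERVER [DNS-SERVER...]
switch_vpc_dns() {
    local vpc_id="$1"
    shift
    local dopt_id
    dopt_id="$(aws ec2 create-dhcp-options \
        --dhcp-configurations "$(dns_dhcp_config "$@")" \
        --query 'DhcpOptions.DhcpOptionsId' --output text)" || return 1
    aws ec2 associate-dhcp-options \
        --dhcp-options-id "$dopt_id" --vpc-id "$vpc_id"
}

# Example: switch_vpc_dns VPC-ID 8.8.8.8 8.8.4.4
```

Instances only pick up the new options when they renew their DHCP lease, which is why a reboot was the simplest way to apply the change.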
Summary
The fix for this is relatively simple but non-intuitive.
More importantly it raises the question: why does Amazon supply EC2 configurations that do not support the AWS CLI tool? That's completely astonishing. It doesn't make sense for Amazon to drive their customers insane.