Global and Local Secondary indexes in DynamoDB

By: (plus.google.com) +David Herron; Date: June 8, 2018

Tags: Amazon Web Services » AWS DynamoDB

Years ago, Amazon engineers studying SQL queries learned complex queries with JOINs and whatnot were not widely used, and from that observation developed DynamoDB. DynamoDB is a NoSQL database available in the Amazon cloud service offering. It has some interesting characteristics, one of which is the method to use secondary indexes.

As a NoSQL database, saving an object into DynamoDB causes whatever fields that object has to be stored as columns in the record that's created. Each record in each table can have its own columns.

In SQL databases - you define a "database" and then you define tables within that database. In DynamoDB, you simply define the tables.

Defining a DynamoDB table

When you define a DynamoDB table, you declare a KeySchema attribute describubg the primary index. For example:

{
    "TableDescription": {
        "AttributeDefinitions": [
            {
                "AttributeName": "Artist",
                "AttributeType": "S"
            },
            {
                "AttributeName": "SongTitle",
                "AttributeType": "S"
            }
        ],
        "TableName": "Music",
        "KeySchema": [
            {
                "AttributeName": "Artist",
                "KeyType": "HASH"
            },
            {
                "AttributeName": "SongTitle",
                "KeyType": "RANGE"
            }
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": 10,
            "WriteCapacityUnits": 5
        }
    }
}

If you're using the Serverless Framework, that would be represented as so:

resources:
  Resources:
    DesiredNameOfTable:
      Type: 'AWS::DynamoDB::Table'
      Properties:
        TableName: "Music"
        AttributeDefinitions:
            - AttributeName: "Artist",
              AttributeType: "S"
            - AttributeName: "SongTitle",
              AttributeType: "S"
        KeySchema:
            - AttributeName: "Artist",
              KeyType: "HASH"
            - AttributeName: "SongTitle",
              KeyType: "RANGE"
        ProvisionedThroughput:
            - ReadCapacityUnits: 10
              WriteCapacityUnits: 5

This is part of the data structure used in the (docs.aws.amazon.com) CreateTable DyanmoDB API. The table Music has what's called a composite primary key, meaning the primary key has two attributes.

  • For a simple primary key (partition key), you must provide exactly one element with a KeyType of HASH. This value determines the "partition" holding this data item.
  • For a composite primary key (partition key and sort key), you must provide exactly two elements: The first element must have a KeyType of HASH, and the second element must have a KeyType of RANGE. In this case the partition key works as for the simple key scenario, while the sort key determines the position within the partition containing the data item.

Every row added to a given DynamoDB table must have the columns listed in AttributeDefinitions. These columns all must be associated with a table index -- in the AWS documentation it says AttributeDefinitions is "An array of attributes that describe the key schema for the table and indexes."

The ProvisionedThroughput attribute describes the computing resources to be assigned to this DynamoDB table.

Query and Scan introduction

DynamoDB query and scan operations will fetch up to 1 megabyte of data, so it is important to fit as much relavent data into that limit. What if you want to limit the result set by comparing to a column that is not in the primary key. In SQL we would write

SELECT FROM tableName WHERE columnName LIKE '%something%';

For an SQL database it does not matter whether columnName is in an index or not, you can query against any column. In DynamoDB you can only query against columns appearing in an index.

What you can do is apply a filter against the result set. The filter is applied after the initial query -- your query will retrieve up to 1 megabyte of data, and the filter then reduces the results.

A simple query could be:

{
    "TableName": "Music",
    "KeyConditionExpression": "Artist = :artist",
    "ExpressionAttributeValues": {
        ":artist": { "S": "Steeley Dan" }
    }
}

In other words KeyConditionExpression has a method for substituting values into place-holders, where the place-holder in this case is :artist. The values for the place-holders come from ExpressionAttributeValues.

A composite key query could be:

{
    "TableName": "Music",
    "KeyConditionExpression": "Artist = :artist and SongTitle = :songTitle",
    "ExpressionAttributeValues": {
        ":artist": { "S": "Steeley Dan" },
        ":songTitle": { "S": "Pretzel Logic" }
    }
}

The comparison against the partition key must be an equality comparison. Comparing against the sort key can use any of these comparison operators:

  • a = b — true if the attribute a is equal to the value b
  • a < b — true if a is less than b
  • a <= b — true if a is less than or equal to b
  • a > b — true if a is greater than b
  • a >= b — true if a is greater than or equal to b
  • a BETWEEN b AND c — true if a is greater than or equal to b, and less than or equal to c.
  • begins_with (a, substr) — true if the value of attribute a begins with a particular substring.

Filtering Query and Scan results

A filter expression determines which items within the Query results should be returned to you. All of the other results are discarded.

A query with a filter could be:

{
    "TableName": "Music",
    "KeyConditionExpression": "Artist = :artist",
    "FilterExpression": "SalesRank <= :salesRank",
    "ExpressionAttributeValues": {
        ":artist": { "S": "Steeley Dan" },
        ":salesRank": { "N": "20" }
    }
}

The columns named in a FilterExpression cannot be columns associated with index keys.

The Filter expression is applied after the Query operation but before the data is returned to the caller. The Query operation is limited to 1MB of results. The filter operation can reduce the returned result set.

The filter expression can use any of the comparison operators named earlier.

If there is more than 1MB of data, DynamoDB sets LastEvaluatedKey in the result, and you're expected to make subsequent queries using that key for the starting point. But what if the filter could have limited the results to the 1MB limit if it had been applied with the initial query?

DynamoDB queries versus DocumentClient queries

In specifying the query we have to declare the data type of the values being queried. These match the (docs.aws.amazon.com) "attribute value" system defined in DynamoDB. For example "S" means a String, while "N" means a Number.

This notation is clumsy. In at least the Node.js SDK, there is an alternate client, the DocumentClient object, which simplifies away this necessity to declare the data type.

Using DocumentClient the query might look like:

{
    "TableName": "Music",
    "KeyConditionExpression": "Artist = :artist",
    "FilterExpression": "SalesRank <= :salesRank",
    "ExpressionAttributeValues": {
        ":artist": "Steeley Dan",
        ":salesRank": 20
    }
}

The data type is deduced by DocumentClient.

Local and Global Secondary Indexes

The best DynamoDB has to offer is the secondary indexes: Local Secondary Index, and Global Secondary Index

Both are defined using a KeySchema, and therefore has the same partition key and sort key arrangement. The difference is:

  • Local Secondary Index
    • The partition key in the Local Secondary Index must be the same as the partition key in the primary index for the table
    • These indexes do not have their own ProvisionedThroughput, and therefore use the computing resources assigned to the table
  • Global Secondary Index
    • It can use any column as the partition key (and if desired sort key). These columns must be declared in the AttributeDefinitions.
    • These indexes require their own ProvisionedThroughput

In my first attempt to define a Local Secondary Index, I came up with something like:

resources:
  Resources:
    DesiredNameOfTable:
      Type: 'AWS::DynamoDB::Table'
      Properties:
        TableName: "Music"
        AttributeDefinitions:
            - AttributeName: "Artist",
              AttributeType: "S"
            - AttributeName: "SongTitle",
              AttributeType: "S"
            - AttributeName: "RecordLabel",
              AttributeType: "S"
            - AttributeName: "RecordAlbumName",
              AttributeType: "S"
        KeySchema:
            - AttributeName: "Artist",
              KeyType: "HASH"
            - AttributeName: "SongTitle",
              KeyType: "RANGE"
        ProvisionedThroughput:
            - ReadCapacityUnits: 10
              WriteCapacityUnits: 5
        LocalSecondaryIndexes:
          - IndexName: LabelIndex
            KeySchema: 
              - AttributeName: RecordLabel
                KeyType: HASH
          - IndexName: AlbumIndex
            KeySchema: 
              - AttributeName: RecordAlbumName
                KeyType: HASH

This is using the Serverless Framework representation because it's easier to read. It is of course the same object structure that is handed to DynamoDB. In neither case does the KeySchema for either Local index match the requirements stated earlier.

When attempting to deploy this using the Serverless Framework CLI tool, it gave me error messages like:

An error occurred: DesiredNameOfTable - Property AttributeDefinitions is inconsistent with the KeySchema of the table and the secondary indexes.

An error occurred: DesiredNameOfTable - One or more parameter values were invalid: Index KeySchema does not have a range key for index: RecordLabel

An error occurred: DesiredNameOfTable - Property AttributeDefinitions is inconsistent with the KeySchema of the table and the secondary indexes.

I tried several alternatives each of which gave a different error message.

The key is that the KeySchema was not setup correctly. To use a Local index it would have been required to add the Artist key to the KeySchema in each of the indexes. Depending on your use case this may or may not be useful.

In the case of my application that would not work. Instead my application required a Global Secondary Index.

resources:
  Resources:
    DesiredNameOfTable:
      Type: 'AWS::DynamoDB::Table'
      Properties:
        TableName: "Music"
        AttributeDefinitions:
            - AttributeName: "Artist",
              AttributeType: "S"
            - AttributeName: "SongTitle",
              AttributeType: "S"
            - AttributeName: "RecordLabel",
              AttributeType: "S"
            - AttributeName: "RecordAlbumName",
              AttributeType: "S"
        KeySchema:
            - AttributeName: "Artist",
              KeyType: "HASH"
            - AttributeName: "SongTitle",
              KeyType: "RANGE"
        ProvisionedThroughput:
            - ReadCapacityUnits: 10
              WriteCapacityUnits: 5
        GlobalSecondaryIndexes:
          - IndexName: LabelIndex
            KeySchema: 
              - AttributeName: RecordLabel
                KeyType: HASH
            ProvisionedThroughput:
              ReadCapacityUnits: 1
              WriteCapacityUnits: 1
          - IndexName: AlbumIndex
            KeySchema: 
              - AttributeName: RecordAlbumName
                KeyType: HASH
            ProvisionedThroughput:
              ReadCapacityUnits: 1
              WriteCapacityUnits: 1

Instead the table had to be defined using a Global Secondary Index as shown here.

With a Global index, you pick any column(s) you like as the partition key or sort key (if needed).

« How to get AWS SDK for Node.js to return Promises to use with async/await functions Apple previews iOS 12 »
2016 Election 2018 Elections Acer C720 Ad block Air Filters Air Quality Air Quality Monitoring AkashaCMS Amazon Amazon Kindle Amazon Web Services America Amiga and Jon Pertwee Android Anti-Fascism AntiVirus Software Apple Apple Hardware History Apple iPhone Apple iPhone Hardware April 1st Arduino ARM Compilation Artificial Intelligence Astronomy Astrophotography Asynchronous Programming Authoritarianism Automated Social Posting AWS DynamoDB AWS Lambda Ayo.JS Bells Law Big Brother Big Data Big Finish Big Science Bitcoin Mining Black Holes Blade Runner Blockchain Blogger Blogging Books Botnets Cassette Tapes Cellphones China China Manufacturing Christopher Eccleston Chrome Chrome Apps Chromebook Chromebox ChromeOS CIA CitiCards Citizen Journalism Civil Liberties Climate Change Clinton Cluster Computing Command Line Tools Comment Systems Computer Accessories Computer Hardware Computer Repair Computers Conservatives Cross Compilation Crouton Cryptocurrency Curiosity Rover Currencies Cyber Security Cybermen Cybersecurity Daleks Darth Vader Data backup Data Formats Data Storage Database Database Backup Databases David Tenant DDoS Botnet Department of Defense Department of Justice Detect Adblocker Developers Editors Digital Nomad Digital Photography Diskless Booting Disqus DIY DIY Repair DNP3 Do it yourself Docker Docker MAMP Docker Swarm Doctor Who Doctor Who Paradox Doctor Who Review Drobo Drupal Drupal Themes DVD E-Books E-Readers Early Computers eGPU Election Hacks Electric Bicycles Electric Vehicles Electron Eliminating Jobs for Human Emdebian Encabulators Energy Efficiency Enterprise Node EPUB ESP8266 Ethical Curation Eurovision Event Driven Asynchronous Express Face Recognition Facebook Fake News Fedora VirtualBox Fifth Doctor File transfer without iTunes FireFly Flash Flickr Fraud Freedom of Speech Front-end Development G Suite Gallifrey Gig Economy git Github GitKraken Gitlab GMAIL Google Google Chrome Google Gnome Google+ Government Spying Great Britain Green Transportation Hate Speech Heat Loss Hibernate High Technology Hoax Science Home Automation HTTP Security HTTPS Human ID I2C Protocol Image Analysis Image Conversion Image Processing ImageMagick In-memory Computing InfluxDB Infrared Thermometers Insulation Internet Internet Advertising Internet Law Internet of Things Internet Policy Internet Privacy iOS iOS Devices iPad iPhone iPhone hacking Iron Man iShowU Audio Capture iTunes Janet Fielding Java JavaFX JavaScript JavaScript Injection JDBC John Simms Journalism Joyent Kaspersky Labs Kext Kindle Kindle Marketplace Large Hadron Collider Lets Encrypt LibreOffice Linux Linux Hints Linux Single Board Computers Logging Mac Mini Mac OS Mac OS X MacBook Pro Machine Learning Machine Readable ID Macintosh macOS macOS High Sierra macOS Kext MacOS X setup Make Money Online Make Money with Gigs March For Our Lives MariaDB Mars Mass Violence Matt Lucas MEADS Anti-Missile Mercurial MERN Stack Michele Gomez Micro Apartments Microsoft Military AI Military Hardware Minification Minimized CSS Minimized HTML Minimized JavaScript Missy Mobile Applications Mobile Computers MODBUS Mondas Monetary System MongoDB Mongoose Monty Python MQTT Music Player Music Streaming MySQL NanoPi Nardole NASA Net Neutrality Network Attached Storage Node Web Development Node.js Node.js Database Node.js Performance Node.js Testing Node.JS Web Development Node.x North Korea npm NVIDIA NY Times Online advertising Online Community Online Fraud Online Journalism Online Photography Online Video Open Media Vault Open Source Open Source and Patents Open Source Governance Open Source Licenses Open Source Software OpenAPI OpenJDK OpenVPN Palmtop PDA Patrick Troughton PayPal Paywalls Personal Flight Peter Capaldi Peter Davison Phishing Photography PHP Plex Plex Media Server Political Protest Politics Postal Service Power Control President Trump Privacy Private E-mail server Production use Public Violence Raspberry Pi Raspberry Pi 3 Raspberry Pi Zero ReactJS Recaptcha Recycling Refurbished Computers Remote Desktop Removable Storage Republicans Retro Computing Retro-Technology Reviews RFID Rich Internet Applications Right to Repair River Song Robotics Robots Rocket Ships RSS News Readers rsync Russia Russia Troll Factory Russian Hacking Rust SCADA Scheme Science Fiction SD Cards Search Engine Ranking Season 1 Season 10 Season 11 Security Security Cameras Server-side JavaScript Serverless Framework Servers Shell Scripts Silence Simsimi Skype SmugMug Social Media Social Media Networks Social Media Warfare Social Network Management Social Networks Software Development Software Patents Space Flight Space Ship Reuse Space Ships SpaceX Spear Phishing Spring Spring Boot Spy Satellites SQLite3 SSD Drives SSD upgrade SSH SSH Key SSL Stand For Truth Strange Parts Swagger Synchronizing Files Tegan Jovanka Telescopes Terrorism The Cybermen The Daleks The Master Time-Series Database Tom Baker Torchwood Total Information Awareness Trump Trump Administration Trump Campaign Twitter Ubuntu Udemy UDOO US Department of Defense Video editing Virtual Private Networks VirtualBox VLC VNC VOIP Vue.js Walmart Weapons Systems Web Applications Web Developer Resources Web Development Web Development Tools Web Marketing Webpack Website Advertising Weeping Angels WhatsApp William Hartnell Window Insulation Windows Windows Alternatives Wordpress World Wide Web Yahoo YouTube YouTube Monetization