Global and Local Secondary indexes in DynamoDB

Global and Local Secondary indexes in DynamoDB

; Date: June 8, 2018

Tags: Amazon Web Services »»»» AWS DynamoDB

Years ago, Amazon engineers studying SQL queries learned complex queries with JOINs and whatnot were not widely used, and from that observation developed DynamoDB. DynamoDB is a NoSQL database available in the Amazon cloud service offering. It has some interesting characteristics, one of which is the method to use secondary indexes.

As a NoSQL database, saving an object into DynamoDB causes whatever fields that object has to be stored as columns in the record that's created. Each record in each table can have its own columns.

In SQL databases - you define a "database" and then you define tables within that database. In DynamoDB, you simply define the tables.

Defining a DynamoDB table

When you define a DynamoDB table, you declare a KeySchema attribute describubg the primary index. For example:

{
    "TableDescription": {
        "AttributeDefinitions": [
            {
                "AttributeName": "Artist",
                "AttributeType": "S"
            },
            {
                "AttributeName": "SongTitle",
                "AttributeType": "S"
            }
        ],
        "TableName": "Music",
        "KeySchema": [
            {
                "AttributeName": "Artist",
                "KeyType": "HASH"
            },
            {
                "AttributeName": "SongTitle",
                "KeyType": "RANGE"
            }
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": 10,
            "WriteCapacityUnits": 5
        }
    }
}

If you're using the Serverless Framework, that would be represented as so:

resources:
  Resources:
    DesiredNameOfTable:
      Type: 'AWS::DynamoDB::Table'
      Properties:
        TableName: "Music"
        AttributeDefinitions:
            - AttributeName: "Artist",
              AttributeType: "S"
            - AttributeName: "SongTitle",
              AttributeType: "S"
        KeySchema:
            - AttributeName: "Artist",
              KeyType: "HASH"
            - AttributeName: "SongTitle",
              KeyType: "RANGE"
        ProvisionedThroughput:
            - ReadCapacityUnits: 10
              WriteCapacityUnits: 5

This is part of the data structure used in the (docs.aws.amazon.com) CreateTable DyanmoDB API. The table Music has what's called a composite primary key, meaning the primary key has two attributes.

  • For a simple primary key (partition key), you must provide exactly one element with a KeyType of HASH. This value determines the "partition" holding this data item.
  • For a composite primary key (partition key and sort key), you must provide exactly two elements: The first element must have a KeyType of HASH, and the second element must have a KeyType of RANGE. In this case the partition key works as for the simple key scenario, while the sort key determines the position within the partition containing the data item.

Every row added to a given DynamoDB table must have the columns listed in AttributeDefinitions. These columns all must be associated with a table index -- in the AWS documentation it says AttributeDefinitions is "An array of attributes that describe the key schema for the table and indexes."

The ProvisionedThroughput attribute describes the computing resources to be assigned to this DynamoDB table.

Query and Scan introduction

DynamoDB query and scan operations will fetch up to 1 megabyte of data, so it is important to fit as much relavent data into that limit. What if you want to limit the result set by comparing to a column that is not in the primary key. In SQL we would write

SELECT FROM tableName WHERE columnName LIKE '%something%';

For an SQL database it does not matter whether columnName is in an index or not, you can query against any column. In DynamoDB you can only query against columns appearing in an index.

What you can do is apply a filter against the result set. The filter is applied after the initial query -- your query will retrieve up to 1 megabyte of data, and the filter then reduces the results.

A simple query could be:

{
    "TableName": "Music",
    "KeyConditionExpression": "Artist = :artist",
    "ExpressionAttributeValues": {
        ":artist": { "S": "Steeley Dan" }
    }
}

In other words KeyConditionExpression has a method for substituting values into place-holders, where the place-holder in this case is :artist. The values for the place-holders come from ExpressionAttributeValues.

A composite key query could be:

{
    "TableName": "Music",
    "KeyConditionExpression": "Artist = :artist and SongTitle = :songTitle",
    "ExpressionAttributeValues": {
        ":artist": { "S": "Steeley Dan" },
        ":songTitle": { "S": "Pretzel Logic" }
    }
}

The comparison against the partition key must be an equality comparison. Comparing against the sort key can use any of these comparison operators:

  • a = b — true if the attribute a is equal to the value b
  • a < b — true if a is less than b
  • a <= b — true if a is less than or equal to b
  • a > b — true if a is greater than b
  • a >= b — true if a is greater than or equal to b
  • a BETWEEN b AND c — true if a is greater than or equal to b, and less than or equal to c.
  • begins_with (a, substr) — true if the value of attribute a begins with a particular substring.

Filtering Query and Scan results

A filter expression determines which items within the Query results should be returned to you. All of the other results are discarded.

A query with a filter could be:

{
    "TableName": "Music",
    "KeyConditionExpression": "Artist = :artist",
    "FilterExpression": "SalesRank <= :salesRank",
    "ExpressionAttributeValues": {
        ":artist": { "S": "Steeley Dan" },
        ":salesRank": { "N": "20" }
    }
}

The columns named in a FilterExpression cannot be columns associated with index keys.

The Filter expression is applied after the Query operation but before the data is returned to the caller. The Query operation is limited to 1MB of results. The filter operation can reduce the returned result set.

The filter expression can use any of the comparison operators named earlier.

If there is more than 1MB of data, DynamoDB sets LastEvaluatedKey in the result, and you're expected to make subsequent queries using that key for the starting point. But what if the filter could have limited the results to the 1MB limit if it had been applied with the initial query?

DynamoDB queries versus DocumentClient queries

In specifying the query we have to declare the data type of the values being queried. These match the (docs.aws.amazon.com) "attribute value" system defined in DynamoDB. For example "S" means a String, while "N" means a Number.

This notation is clumsy. In at least the Node.js SDK, there is an alternate client, the DocumentClient object, which simplifies away this necessity to declare the data type.

Using DocumentClient the query might look like:

{
    "TableName": "Music",
    "KeyConditionExpression": "Artist = :artist",
    "FilterExpression": "SalesRank <= :salesRank",
    "ExpressionAttributeValues": {
        ":artist": "Steeley Dan",
        ":salesRank": 20
    }
}

The data type is deduced by DocumentClient.

Local and Global Secondary Indexes

The best DynamoDB has to offer is the secondary indexes: Local Secondary Index, and Global Secondary Index

Both are defined using a KeySchema, and therefore has the same partition key and sort key arrangement. The difference is:

  • Local Secondary Index
    • The partition key in the Local Secondary Index must be the same as the partition key in the primary index for the table
    • These indexes do not have their own ProvisionedThroughput, and therefore use the computing resources assigned to the table
  • Global Secondary Index
    • It can use any column as the partition key (and if desired sort key). These columns must be declared in the AttributeDefinitions.
    • These indexes require their own ProvisionedThroughput

In my first attempt to define a Local Secondary Index, I came up with something like:

resources:
  Resources:
    DesiredNameOfTable:
      Type: 'AWS::DynamoDB::Table'
      Properties:
        TableName: "Music"
        AttributeDefinitions:
            - AttributeName: "Artist",
              AttributeType: "S"
            - AttributeName: "SongTitle",
              AttributeType: "S"
            - AttributeName: "RecordLabel",
              AttributeType: "S"
            - AttributeName: "RecordAlbumName",
              AttributeType: "S"
        KeySchema:
            - AttributeName: "Artist",
              KeyType: "HASH"
            - AttributeName: "SongTitle",
              KeyType: "RANGE"
        ProvisionedThroughput:
            - ReadCapacityUnits: 10
              WriteCapacityUnits: 5
        LocalSecondaryIndexes:
          - IndexName: LabelIndex
            KeySchema: 
              - AttributeName: RecordLabel
                KeyType: HASH
          - IndexName: AlbumIndex
            KeySchema: 
              - AttributeName: RecordAlbumName
                KeyType: HASH

This is using the Serverless Framework representation because it's easier to read. It is of course the same object structure that is handed to DynamoDB. In neither case does the KeySchema for either Local index match the requirements stated earlier.

When attempting to deploy this using the Serverless Framework CLI tool, it gave me error messages like:

An error occurred: DesiredNameOfTable - Property AttributeDefinitions is inconsistent with the KeySchema of the table and the secondary indexes.

An error occurred: DesiredNameOfTable - One or more parameter values were invalid: Index KeySchema does not have a range key for index: RecordLabel

An error occurred: DesiredNameOfTable - Property AttributeDefinitions is inconsistent with the KeySchema of the table and the secondary indexes.

I tried several alternatives each of which gave a different error message.

The key is that the KeySchema was not setup correctly. To use a Local index it would have been required to add the Artist key to the KeySchema in each of the indexes. Depending on your use case this may or may not be useful.

In the case of my application that would not work. Instead my application required a Global Secondary Index.

resources:
  Resources:
    DesiredNameOfTable:
      Type: 'AWS::DynamoDB::Table'
      Properties:
        TableName: "Music"
        AttributeDefinitions:
            - AttributeName: "Artist",
              AttributeType: "S"
            - AttributeName: "SongTitle",
              AttributeType: "S"
            - AttributeName: "RecordLabel",
              AttributeType: "S"
            - AttributeName: "RecordAlbumName",
              AttributeType: "S"
        KeySchema:
            - AttributeName: "Artist",
              KeyType: "HASH"
            - AttributeName: "SongTitle",
              KeyType: "RANGE"
        ProvisionedThroughput:
            - ReadCapacityUnits: 10
              WriteCapacityUnits: 5
        GlobalSecondaryIndexes:
          - IndexName: LabelIndex
            KeySchema: 
              - AttributeName: RecordLabel
                KeyType: HASH
            ProvisionedThroughput:
              ReadCapacityUnits: 1
              WriteCapacityUnits: 1
          - IndexName: AlbumIndex
            KeySchema: 
              - AttributeName: RecordAlbumName
                KeyType: HASH
            ProvisionedThroughput:
              ReadCapacityUnits: 1
              WriteCapacityUnits: 1

Instead the table had to be defined using a Global Secondary Index as shown here.

With a Global index, you pick any column(s) you like as the partition key or sort key (if needed).

Global and Local Secondary indexes in DynamoDB