Developing for Alexa using Elixir (Part Two)

Written by Eugen on Apr 19, 2017

In part one we looked at Alexa as a service, diving into what makes up a skill a developer can put on the platform. We also looked at one half of the Alexa system, the Skills Interface.

So far we’ve set up the Skills Interface to recognize some of the user’s intents, including slots for objects and locations (in our case). We’ve also supplied a listing of utterances and are able to test the system on some example phrases.

Over to the Back End

When we finished the last post we saw the JSON object Alexa will send to the Skill Service.

{
  "session": {
    "sessionId": "SessionId.<sessionId>",
    "application": {
      "applicationId": "amzn1.ask.skill.<reference>"
    },
    "attributes": {},
    "user": {
      "userId": "amzn1.ask.account.<userId>"
    },
    "new": true
  },
  "request": {
  "type": "IntentRequest",
  "requestId": "EdwRequestId.<requestId>",
  "locale": "en-US",
  "timestamp": "2017-01-30T18:09:08Z",
  "intent": {
    "name": "RecordLocationIntent",
    "slots": {
      "Object": {
        "name": "Object",
        "value": "car keys"
        },
      "Location": {
        "name": "Location",
        "value": "kitchen drawer"
        }
      }
    }
  },
  "version": "1.0"
}

How and where we handle this is (mostly) up to us. Amazon recommends you use AWS Lambda for your Skill Service, but if you don't want to, you can use any server with a valid SSL certificate that communicates over HTTPS.

While AWS Lambda is a fine service, we chose to build our service using the Phoenix web framework. Phoenix runs on Elixir, a functional language ideal for this kind of task.

Alexa and Phoenix

There are various packages you can add to an Elixir application to assist with an Alexa skill. We chose Phoenix Alexa; we liked its small code base, which provides just enough for our needs.

You add Phoenix Alexa to the mix application in the normal way. Add it to both the deps and the application functions in mix.exs.

def application do
  [applications: [..., :phoenix_alexa]]
end

def deps do
  [{:phoenix, "~> 1.2.0"},
  ...
  {:phoenix_alexa, "~> 0.2.0"}]
end

Verifying Alexa and the API Pipeline

We’ll pipe all calls from Alexa through the :api pipeline. It’s set to receive JSON out of the box. However, there’s an additional step to writing an Alexa skill that can be certified which we’ll cover here.

Alexa’s service needs to offer you some guarantee that the messages that reach your endpoint are, indeed, coming from them. Rather than offering you an API token or the like, Amazon signs its messages using an X509 certificate and a private key: for each request, Amazon signs the body with its private key. You then have to fetch its certificate (which holds the public key) and ensure that the signature, which it provides as a header, matches the body you received.
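Conceptually, the check boils down to something like the line below. This is just a sketch, not the plug’s actual code: it assumes you’ve already read the raw request body, Base64-decoded the Signature header, and pulled the RSA public key out of the certificate served at the SignatureCertChainUrl header.

# Illustrative only: raw_body, signature and public_key are assumed to have
# been extracted from the request and certificate beforehand.
signature_valid? = :public_key.verify(raw_body, :sha, signature, public_key)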

Sounds complicated? Well, thankfully for you we’ve written and open-sourced a Plug that manages the whole process for you. So let’s add it to our dependencies:

def deps do
  [...,
   {:less_verifies_alexa, "~> 0.1.0"}]
end

And add it to our :api pipeline:

pipeline :api do
  plug :accepts, ["json"]
  if Mix.env == :prod do
    plug LessVerifiesAlexa.Plug, application_id: Application.get_env(:phoenix_alexa, :application_id)
  end
end

We grab the :application_id from the application config. If there’s something wrong with the request, the Plug will return a status code of 400 and halt the connection. You can find the source for the plug on GitHub, and we’ll take a deep dive into how it works in the third part of this series.
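If you’re setting that value in your config files, the entry might look something like this (the key names simply follow the pipeline snippet above; the skill ID itself comes from the Alexa developer console):

config :phoenix_alexa, application_id: "amzn1.ask.skill.<reference>"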

So, by now we should be receiving calls to our application with JSON as the payload. We’ll add a route to the router to handle these incoming calls.

scope "/command", OurApplication do
  pipe_through :api
  post "/", AlexaController, :command
end

Pretty standard stuff: pipe the request through the :api pipeline we set up above, then dispatch it to the :command action in the AlexaController.

Over to the Controller

Now we need to set up the controller to handle the Intents we’ll receive.

In the controller we make use of Phoenix’s web module, along with the Phoenix Alexa controller module.

defmodule OurApplication.AlexaController do
  use OurApplication.Web, :controller
  use PhoenixAlexa.Controller, :command
  ...

The last line there is what handles the incoming request. The PhoenixAlexa controller defines some functions as overridable:

...
defoverridable [launch_request: 2, intent_request: 3, session_ended_request: 2]
...

So, we can override those functions in our controller to manage responses. The launch_request/2 function handles an initial request (one with no intent included). We can ignore session_ended_request/2 since the default works just fine for our needs.
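As an example, a launch_request/2 override could look something like the sketch below. It’s illustrative rather than our exact code; it uses the same PhoenixAlexa response helpers we’ll meet again towards the end of this post, and keeps the session open so the user can follow up with an intent.

# Greet the user and keep the session open for a follow-up intent.
def launch_request(conn, _request) do
  response =
    %Response{}
    |> set_output_speech(%TextOutputSpeech{text: "Tell me where you've put something, or ask me where something is."})
    |> set_should_end_session(false)

  conn |> set_response(response)
end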

Let’s set those two aside for now; the real magic happens in the intent_request/3 call.

Handling an Intent Request

Let’s take a look at a sample of our overrides for intent_request/3:

def intent_request(conn, "RecordLocationIntent", request) do
  response = Responses.record_location(request)
  do_response(conn, response)
end

def intent_request(conn, "FindObjectIntent", request) do
  response = Responses.retrieve_object_location(request)
  do_response(conn, response)
end

Note the use of pattern matching in the function definition’s second parameter. We are being passed whatever string is in the request.request.intent.name of the JSON payload.

An intent of RecordLocationIntent or FindObjectIntent will match. We pass the request off to another module, Responses, calling the relevant function there. When the response comes back we use do_response to handle what’s sent back to Alexa.
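It’s also worth adding a catch-all clause so an intent name we don’t recognise gets a polite reply rather than a crash. Something along these lines (a hypothetical clause, not shown in our controller above):

# Any intent name that doesn't match the clauses above lands here.
def intent_request(conn, _intent, _request) do
  do_response(conn, "Sorry, I didn't understand that request.")
end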

Imagine a RecordLocationIntent comes in. The controller is going to call Responses.record_location/1.

def record_location(
  %{
    request: %{intent: %{slots: %{
      "Object" => %{"value" => object_name},
      "Location" => %{"value" => location}
    }}},
    session: %{user: %{userId: user_id}}
  }
) do
  case Objects.get_by(user_id, object_name) do
    :not_found ->
      Objects.record(user_id, object_name, location)

    {:ok, _result} ->
      Objects.update(user_id, object_name, location)
  end

  "Ok, we have your #{object_name} in the #{location}"
end

Now, some may balk at the pattern matching going on in the head of the function. Bear in mind we receive the request as a deeply nested struct of structs from the controller. To process the response we need the object_name, the location and the user_id.

We could just dig out the data we need with get_in, but I think there’s something to be said for having a bird’s eye view of the data structure you’re being passed in through pattern matching like this.
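For comparison, the get_in route would look roughly like this. It’s illustrative only, and assumes each nested level behaves like a plain map (a struct would need to implement the Access behaviour for this to work):

# Dig the values out one at a time instead of matching in the function head.
object_name = get_in(request, [:request, :intent, :slots, "Object", "value"])
location    = get_in(request, [:request, :intent, :slots, "Location", "value"])
user_id     = get_in(request, [:session, :user, :userId])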

Once inside the function we know we have the variables required. There’s no need for conditional logic to check their presence.

The function immediately following this in the module reads,

def record_location(_bad_request), do: "We don't have the info we need to process your request."

Pattern matching FTW! Whichever one matches returns a string outlining the result of the call.

We do the same thing for the FindObjectIntent. When called, retrieve_object_location/1 pattern matches the request. This time it’s a little less verbose (as there’s no need for a location).

def retrieve_object_location(
  %{
    request: %{intent: %{slots: %{
      "Object" => %{"value" => object_name}
    }}},
    session: %{user: %{userId: user_id}}
  }
) do
  case Objects.get_by(user_id, object_name) do
    :not_found ->
      "I'm sorry, we don't have a location for #{object_name}"

    {:ok, result} ->
      {_user_id, %{location: location}} = result
      "You last put your #{object_name} in the #{location}"
  end
end

Again, we add a fallback retrieve_object_location/1 for when the function above doesn’t match.

def retrieve_object_location(_bad_request), do: "We don't have the info we need to process the request."

Anything falling through to the second declaration means we don’t have what we need. We return a descriptive string.

Being Persistent

You may have noticed I’ve used some pretty abstract code in the examples for persistence. How you choose to persist your data is up to you. If using Phoenix you may want to use Ecto. There are other options available.

In our case we wanted to exercise our OTP muscles. We went for a combination of Elixir’s GenServer backed by Erlang’s dets module. We had limited experience with dets, so it was a nice opportunity to further our understanding. There’s a little setup required with supervision and the like, but once it’s in place, it’s a breeze to work with.
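To give a flavour, a stripped-down Objects module along those lines might look something like the sketch below. The function names and return shapes mirror the ones used in the controller code above; the table layout and the rest of the detail is illustrative, not our exact implementation.

defmodule OurApplication.Objects do
  use GenServer

  @table :objects

  ## Client API

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def get_by(user_id, object_name) do
    GenServer.call(__MODULE__, {:get_by, user_id, object_name})
  end

  def record(user_id, object_name, location) do
    GenServer.call(__MODULE__, {:record, user_id, object_name, location})
  end

  # With a set-type dets table, inserting under an existing key overwrites
  # the old entry, so update can simply delegate to record.
  def update(user_id, object_name, location) do
    record(user_id, object_name, location)
  end

  ## Server callbacks

  def init(_opts) do
    {:ok, table} = :dets.open_file(@table, type: :set)
    {:ok, table}
  end

  def handle_call({:get_by, user_id, object_name}, _from, table) do
    reply =
      case :dets.lookup(table, {user_id, object_name}) do
        [] -> :not_found
        [{{^user_id, ^object_name}, data}] -> {:ok, {user_id, data}}
      end

    {:reply, reply, table}
  end

  def handle_call({:record, user_id, object_name, location}, _from, table) do
    :ok = :dets.insert(table, {{user_id, object_name}, %{location: location}})
    {:reply, :ok, table}
  end
end

Starting the GenServer under the application’s supervision tree takes care of the little setup mentioned above.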

dets does have some well documented restrictions in terms of how much data it can work with. But there are ways around those restrictions, either by means of hashing or by swapping out dets altogether. The advantage of having a single storage interface with Objects is that we can make whatever changes we want in the backend without having to rework any of the controller code.

Back to Alexa

Before I close out I want to jump back to the controller to see the response leave the codebase. Earlier I showed the controller calling do_response/2 once we had a response to the intent.

defp do_response(conn, text) do
  response =
    %Response{}
    |> set_output_speech(%TextOutputSpeech{text: text})
    |> set_should_end_session(true)

  conn |> set_response(response)
end

This is pretty straightforward. We’re using functions given to us by Phoenix Alexa. We populate a %Response{} struct via the set_output_speech and set_should_end_session functions. We then use set_response passing in the conn and the response.

Note: As this is a ‘request and response’ type intent, we end the session. There is nothing more we need to do this time around.

set_response/2 sets the response content type, and encodes the response to JSON for us. Nice.

In the case of a caller telling us where they have left something, the response from our server is:

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Ok, we have your car keys in the kitchen drawer"
    },
    "shouldEndSession": true
  },
  "sessionAttributes": {}
}

Alexa will say the text to the caller, and we can now respond to them when they ask where they left the car keys.

Hopefully this gives you some insight into setting up a Skill with Alexa. We enjoyed putting the skill together and leveraging some of Elixir’s power in the process.

Please drop us a line or leave a comment if you want to ask us anything. We’d love to hear about your adventures with Alexa and how your skills are coming along!

Meet Eugen

Hi, I'm Eugen,

I wrote the article you're reading... I spend my days in the eclectic city of Bucharest, Romania. I enjoy talking things through and solving hard problems with a gentle touch. I love films, friendships, and tennis, and I have an insatiable sweet tooth.
