Building a Google Assistant Controlled Android Things Device

Clinton Teegarden
11 min read · Sep 8, 2017

The Google Assistant and Android Things platforms are two of the most exciting advancements in the world of mobile and the Internet of Things (IoT). As Android Things looks to tear down the barriers to IoT entry, the Google Assistant aims to connect users to these new devices in a more natural way. As these platforms continue to advance and updates are released, we are beginning to see Google’s vision: the Assistant on all IoT devices. This means that you will not only be able to ask your oven to turn on from your Google Home; soon you will be able to ask your oven to do something by speaking to it directly. By leveraging Android Things and the Assistant SDK, you can use Actions to allow your users to communicate with their devices in a natural conversation. Before we go any further, if you have not watched the Google I/O talk on the Google Assistant SDK, I highly encourage you to do so.

This blog covers building your own Assistant-powered Android Things device on a Raspberry Pi 3. To do so, we will be leveraging the Assistant gRPC API. By the end of this blog, you will have your own Google Home-like device with custom activation phrase support (hands-free activation), and you will have the option of taking your device even further with custom Actions and Intents.

What do we need?

To get started, you will need a few things. First and foremost, you need a Raspberry Pi 3 with Android Things flashed onto it. Second, you need a microphone and a speaker. For my purposes, I am using a microphone purchased from Amazon and a standard speaker with a 3.5mm headphone jack. Lastly, you will need a breadboard and some LED lights. This last part is optional, but it really takes your device to the next level.

Getting Started

For the sake of brevity, I will not go into detail on setting up your Android Things device and the basic project structure. Step-by-step instructions can be found here.

Upon flashing your device with Android Things, you will need to connect the device to your network. Again, step-by-step instructions for the initial setup and for connecting the device to your Wi-Fi network can be found here. This is an important step, as it makes debugging a breeze.

Next, set up your hardware by plugging your microphone into any open USB port on your Raspberry Pi and connecting your speaker to the 3.5mm headphone jack. Upon doing so, you will need to restart your device for the microphone to be enabled.

Lastly, clone the sample project attached to this tutorial to your local machine for reference. Be sure to pay attention to the items below in the build files.

At the time of this writing, Android Things Developer Preview 4 is the most current release. Be sure to replace the line below with the most up-to-date dependency.

provided 'com.google.android.things:androidthings:0.4-devpreview'

Since we will be using the gRPC API with bindings generated for us, we need to add a few extra items to our build file.

In your project-level build file, you need to add the following as a buildscript dependency.

classpath "com.google.protobuf:protobuf-gradle-plugin:0.8.0"

Then add the plugin to your application build file.

//this should be defined at top of file. 
apply plugin: 'com.google.protobuf'

Lastly, add the snippet below to your application's build file.

protobuf {
    protoc {
        artifact = 'com.google.protobuf:protoc:3.2.0'
    }
    plugins {
        javalite {
            artifact = "com.google.protobuf:protoc-gen-javalite:3.0.0"
        }
        grpc {
            artifact = 'io.grpc:protoc-gen-grpc-java:1.2.0'
        }
    }
    generateProtoTasks {
        all().each { task ->
            task.plugins {
                javalite {}
                grpc {
                    // Options added to --grpc_out
                    option 'lite'
                }
            }
        }
    }
}

Custom Activation Phrase

Because we are building on Android Things, we must use the Assistant gRPC API rather than the Python-based Assistant SDK. Since this is just an API, we need to handle the starting and stopping of our Assistant requests ourselves, which means we are responsible for passing the audio data to (request) and from (response) the API. As a result, we are left with few options for activation-phrase support. Thankfully, the open-source community at Carnegie Mellon University has given us exactly what we need with their speech recognition tool, Sphinx.

Sphinx has the ability to recognize activation phrases as well as perform full speech recognition. For our use case, we only need to leverage the activation-phrase functionality. In the attached example, our activation phrase is “Hey CapTech”. Sphinx comes with its own dictionary of words, and since “CapTech” is not found in that dictionary, I split it into two words that are: “cap” and “tech.” In the attached example, Sphinx resides as an AAR file in the libs folder; if you wish to see the full project, you can do so here. Note that the project depends on the sync folder in the assets directory, which contains the dictionary and acoustic model.

In our CapTechSphinxManager class, we begin by defining our activation phrase of "hey CapTech." This class is also responsible for setting up Sphinx and initializing the activation-phrase detection. Upon recognizing the phrase, our CaptechAssistant Activity is notified and we can begin our Assistant request. This class can be left pretty much as is, short of swapping in your own activation phrase.

public CapTechSphinxManager(Context context, SphinxListener mSphinxListener) {
    this.mSphinxListener = mSphinxListener;
    new AsyncTask<Void, Void, Exception>() {
        @Override
        protected Exception doInBackground(Void... params) {
            try {
                Assets assets = new Assets(context);
                File assetsDir = assets.syncAssets();
                mSpeechRecognizer = SpeechRecognizerSetup.defaultSetup()
                        .setAcousticModel(new File(assetsDir, "en-us-ptm"))
                        .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
                        .getRecognizer();
                mSpeechRecognizer.addListener(CapTechSphinxManager.this); // this class is the RecognitionListener
                mSpeechRecognizer.addKeyphraseSearch(WAKEUP_SEARCH, ACTIVATION_KEYPHRASE);
                mSpeechRecognizer.addNgramSearch(ACTION_SEARCH, new File(assetsDir, "predefined.lm.bin"));
            } catch (IOException e) {
                return e;
            }
            return null;
        }

        @Override
        protected void onPostExecute(Exception result) {
            if (result != null) {
                Log.e(TAG, "Failed to initialize recognizer: " + result);
            } else {
                mSphinxListener.onInitializationComplete();
            }
        }
    }.execute();
}
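The rest of the manager is mostly callback plumbing. Since the class registers itself as the PocketSphinx RecognitionListener, keyphrase hits arrive through onPartialResult. A minimal sketch of how that and startListeningToActivationPhrase might look follows; the constants, the SphinxListener interface, and the method bodies here are illustrative assumptions, and the attached project is the source of truth.

// Assumed constants and listener interface; names are illustrative only.
private static final String ACTIVATION_KEYPHRASE = "hey cap tech";
private static final String WAKEUP_SEARCH = "wakeup_search";

public interface SphinxListener {
    void onInitializationComplete();
    void onActivationPhraseDetected();
}

public void startListeningToActivationPhrase() {
    // start (or restart) the keyphrase search registered in the constructor
    mSpeechRecognizer.startListening(WAKEUP_SEARCH);
}

@Override
public void onPartialResult(Hypothesis hypothesis) {
    if (hypothesis == null) {
        return;
    }
    // PocketSphinx reports keyphrase hits through partial results
    if (ACTIVATION_KEYPHRASE.equals(hypothesis.getHypstr())) {
        mSpeechRecognizer.stop();
        mSphinxListener.onActivationPhraseDetected();
    }
}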

Our CaptechAssistant is also notified when the speech recognizer has finished setting up. This is important, as it takes a little while to load the dictionary supplied with the project. It is at this point that we begin listening for our activation phrase and turn on a light so that the user knows the device is ready.

@Override
public void onInitializationComplete() {
    //TODO turn on a light!
    //let our Sphinx Manager know that we want to listen for the phrase
    captechSphinx.startListeningToActivationPhrase();
}

@Override
public void onActivationPhraseDetected() {
    Log.d(TAG, "Activation Phrase Detected");
    //TODO start our assistant request
}

Don’t forget: to use Sphinx, or any audio recording for that matter, we need the following permissions.

<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"/>
<uses-permission android:name="android.permission.RECORD_AUDIO"/>

Using the Assistant

The Google Assistant API is built on top of gRPC, and bindings can be generated for the language of your choosing. In the attached project, this would be everything in the Proto package. You can find more information on this binding generation here. Fortunately for us, this means we only have to orchestrate the passing of the user's request and the reception of the Assistant API's response.

Google Cloud Project

Before we proceed, we must set up our Assistant service. In the attached example, the “credentials” file in the raw folder has been omitted; you must generate your own credentials to proceed. This file contains the credentials necessary for your Assistant device to make requests and access user-specific data. For prototyping, we will generate these credentials from the client_secret file produced after configuring a developer project for your Assistant. To generate these credentials, follow the steps below:

  • Enable the following activity controls in the Google Account you plan to use with the Assistant: Web & App Activity, Device Information, and Voice & Audio Activity. This is the account from which you want to allow the Assistant to retrieve information, such as calendar events. It does not have to be a developer account.
  • In the Cloud Platform Console, go to the Projects page. Select an existing project or create a new project.
  • Enable the Google Assistant API on the project you selected.
  • Create an OAuth Client ID
  • Click Other (select Other UI and User data if prompted) and give the client ID a name.
  • On the OAuth consent screen tab, give the product a name (don’t use “Google” in the name) and a description.
  • Click ⬇ (at the far right of the screen) for the client ID to download the client secret JSON file (client_secret_NNNN.json or client_id.json).
  • Open a terminal and install the google-auth-oauthlib command line tool:
$ pip install google-auth-oauthlib[tool] --user
  • Navigate to your top-level project directory.
  • Use the google-oauthlib-tool command line tool to grant your application permission to use the Assistant API and to create a new credentials.json file in your app's resource directory.
$ cd <project-directory-name>
$ google-oauthlib-tool --client-secrets path/to/client_secret_NNNN.json \
--credentials app/src/main/res/raw/credentials.json \
--scope https://www.googleapis.com/auth/assistant-sdk-prototype \
--save
  • Replace path/to/client_secret_NNNN.json with the path of the client secret JSON file you downloaded above.

The above steps will output the credentials file required.

Configure the Assistant Service

Finally, we are ready to set up our Assistant Service and begin making our requests.

First, let’s add the necessary permissions:

<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
<uses-permission android:name="android.permission.INTERNET" />

Next, we need to instantiate our service with the credentials we generated via the Google Auth Tool. Once you move past the prototyping phase, you can find details here on using OAuth 2.0 to Access Google APIs.

String ASSISTANT_ENDPOINT = "embeddedassistant.googleapis.com";
ManagedChannel channel = ManagedChannelBuilder.forTarget(ASSISTANT_ENDPOINT).build();
try {
    mAssistantService = EmbeddedAssistantGrpc.newStub(channel)
            .withCallCredentials(MoreCallCredentials.from(
                    Credentials.fromResource(this, R.raw.credentials)));
} catch (IOException | JSONException e) {
    Log.e(TAG, "error creating assistant service:", e);
}
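The Credentials.fromResource helper used above is not shown in this post. Below is a minimal sketch of what it might look like, assuming the credentials.json produced by google-oauthlib-tool and the UserCredentials class from google-auth-library; the attached example may differ, and depending on your library version you may prefer UserCredentials.newBuilder() over the constructor.

import android.content.Context;

import com.google.auth.oauth2.UserCredentials;

import org.json.JSONException;
import org.json.JSONObject;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class Credentials {
    // Reads the raw JSON written by google-oauthlib-tool and turns it into gRPC-usable credentials.
    public static UserCredentials fromResource(Context context, int resourceId)
            throws IOException, JSONException {
        InputStream is = context.getResources().openRawResource(resourceId);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = is.read(buffer)) != -1) {
            bytes.write(buffer, 0, read);
        }
        JSONObject json = new JSONObject(bytes.toString("UTF-8"));
        return new UserCredentials(
                json.getString("client_id"),
                json.getString("client_secret"),
                json.getString("refresh_token"));
    }
}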

In the attached example, much of the remaining work is setting up the audio input and output. We will not go into too much detail about this here, as most of it is boilerplate code that can be taken as is from the attached example.
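For reference, here is a rough sketch of that audio setup. The constants and builder values are assumptions based on what the Assistant API expects (16 kHz, 16-bit LINEAR16 PCM) and on the standard AudioRecord/AudioTrack builders; cross-check them against the attached example.

// Assumed audio constants: the Assistant API works with 16 kHz, 16-bit LINEAR16 PCM.
private static final int SAMPLE_RATE = 16000;
private static final int SAMPLE_BLOCK_SIZE = 1024;
private static final AudioOutConfig.Encoding ENCODING_OUTPUT = AudioOutConfig.Encoding.LINEAR16;

// Configuration for the audio we send to the Assistant.
private static final AudioInConfig ASSISTANT_AUDIO_REQUEST_CONFIG =
        AudioInConfig.newBuilder()
                .setEncoding(AudioInConfig.Encoding.LINEAR16)
                .setSampleRateHertz(SAMPLE_RATE)
                .build();

private void setUpAudio() {
    AudioFormat audioFormatIn = new AudioFormat.Builder()
            .setChannelMask(AudioFormat.CHANNEL_IN_MONO)
            .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
            .setSampleRate(SAMPLE_RATE)
            .build();
    AudioFormat audioFormatOut = new AudioFormat.Builder()
            .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
            .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
            .setSampleRate(SAMPLE_RATE)
            .build();

    // Microphone input for the request audio.
    mAudioRecord = new AudioRecord.Builder()
            .setAudioSource(MediaRecorder.AudioSource.VOICE_RECOGNITION)
            .setAudioFormat(audioFormatIn)
            .build();

    // Speaker output for the Assistant's spoken response.
    mAudioTrack = new AudioTrack.Builder()
            .setAudioFormat(audioFormatOut)
            .build();
}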

Assistant Requests

Once Sphinx recognizes our activation phrase, the rest of the work is simply passing the user’s audio data (what the user is saying) to the API and awaiting a response. To do so, we will be leveraging three different Runnable Assistant requests, each posted to a background handler (sketched after the list).

  • Start Assistant Request — This is where we will connect with the API and set up the Request Observer for Audio Data to be passed as it comes in from the user.
mStartAssistantRequest = () -> {
    Log.d(TAG, "starting assistant request");
    mAudioRecord.startRecording();
    mAssistantRequestObserver = mAssistantService.converse(mAssistantResponseObserver);
    ConverseConfig.Builder converseConfigBuilder = ConverseConfig.newBuilder()
            .setAudioInConfig(ASSISTANT_AUDIO_REQUEST_CONFIG)
            .setAudioOutConfig(AudioOutConfig.newBuilder()
                    .setEncoding(ENCODING_OUTPUT)
                    .setSampleRateHertz(SAMPLE_RATE)
                    .setVolumePercentage(mAudioTrackVolume) //must do this for the Assistant to know it can adjust
                    .build());
    if (mConversationState != null) {
        Log.d(TAG, "Conversation State Present");
        converseConfigBuilder.setConverseState(ConverseState.newBuilder()
                .setConversationState(mConversationState)
                .build());
    }
    mAssistantRequestObserver.onNext(ConverseRequest.newBuilder()
            .setConfig(converseConfigBuilder.build())
            .build());
    //start passing the recording
    mAssistantHandler.post(mStreamAssistantRequest);

    //TODO start the blinking lights
};
  • Stream Assistant Request — This is where we will pass the audio byte data to the Request Observer as the user is speaking.
mStreamAssistantRequest = () -> {
    ByteBuffer audioData = ByteBuffer.allocateDirect(SAMPLE_BLOCK_SIZE);
    int result = mAudioRecord.read(audioData, audioData.capacity(), AudioRecord.READ_BLOCKING);
    if (result < 0) {
        Log.e(TAG, "error reading from audio stream:" + result);
        return;
    }

    mAssistantRequestObserver.onNext(ConverseRequest.newBuilder()
            .setAudioIn(ByteString.copyFrom(audioData))
            .build());

    //continue passing the recording
    mAssistantHandler.post(mStreamAssistantRequest);
};
  • Stop Assistant Request — This is where we clean up our observers, stop recording audio, and play any audio returned from the Assistant API. When the stop request fires, we also tell Sphinx to begin listening for the activation phrase again, since keyphrase detection is turned off after each activation.
mStopAssistantRequest = () -> {
    //the user is done making their request. stop passing data and clean up
    Log.d(TAG, "ending assistant request");
    mAssistantHandler.removeCallbacks(mStreamAssistantRequest);
    if (mAssistantRequestObserver != null) {
        mAssistantRequestObserver.onCompleted();
        mAssistantRequestObserver = null;
    }
    //stop recording the user
    mAudioRecord.stop();
    //start telling the user what the Assistant has to say.
    mAudioTrack.play();
    //okay we can activate via keyphrase again
    captechSphinxManager.startListeningToActivationPhrase();

    //TODO turn off blinking lights and turn on blue "ready" light.
};
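These runnables are executed on a Handler backed by a dedicated background thread, so streaming audio never blocks the main thread. A minimal sketch of that wiring, assuming a HandlerThread created in onCreate (the names here are assumptions; the attached example is the reference):

private HandlerThread mAssistantThread;
private Handler mAssistantHandler;

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    // Dedicated thread for Assistant requests so audio streaming stays off the main thread.
    mAssistantThread = new HandlerThread("assistantThread");
    mAssistantThread.start();
    mAssistantHandler = new Handler(mAssistantThread.getLooper());
}

@Override
public void onActivationPhraseDetected() {
    Log.d(TAG, "Activation Phrase Detected");
    // Kick off the request; streaming and stopping are chained from here.
    mAssistantHandler.post(mStartAssistantRequest);
}

@Override
protected void onDestroy() {
    super.onDestroy();
    mAssistantThread.quitSafely();
}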

Assistant Response

The device receives Assistant API responses through the response observer's onNext callback as ConverseResponse data. There are several types of conversational responses the Assistant may return, and we need to act on each of them.

mAssistantResponseObserver = new StreamObserver<ConverseResponse>() {
    @Override
    public void onNext(ConverseResponse value) {
        switch (value.getConverseResponseCase()) {
            case EVENT_TYPE:
                Log.d(TAG, "converse response event: " + value.getEventType());
                if (value.getEventType() == ConverseResponse.EventType.END_OF_UTTERANCE) {
                    mAssistantHandler.post(mStopAssistantRequest);
                }
                break;
            case RESULT:
                mConversationState = value.getResult().getConversationState();
                Log.d(TAG, value.getResult().toString());
                mAssistantHandler.post(mStopAssistantRequest);

                //this method handles whether or not there was a volume request.
                adjustVolume(value.getResult().getVolumePercentage());
                break;
            case AUDIO_OUT:
                //the assistant wants to talk!
                final ByteBuffer audioData =
                        ByteBuffer.wrap(value.getAudioOut().getAudioData().toByteArray());
                mAudioTrack.write(audioData, audioData.remaining(), AudioTrack.WRITE_BLOCKING);
                break;
            case ERROR:
                mAssistantHandler.post(mStopAssistantRequest);
                Log.e(TAG, "converse response error: " + value.getError());
                break;
        }
    }

    @Override
    public void onError(Throwable t) {
        mAssistantHandler.post(mStopAssistantRequest);
        Log.e(TAG, "converse error:", t);
    }

    @Override
    public void onCompleted() {
        //TODO put some lights or something!
        Log.d(TAG, "assistant response finished");
    }
};

Conversation State

In order for the user to have a seemingly normal conversation with your device, it needs to maintain a conversation state for reference. This state is passed along with your Assistant request as extra information in case it is needed. This is useful for follow-up requests, for example:

  • User Request: “What is 5x2?”
  • Assistant Response: “The answer is 10”
  • User Request: “Divide that by 10”

In order for the Assistant to respond with “1” from the last user request, the Assistant needs to know the context of the current conversation.

To do this, we first need to save the conversation state from previous Assistant responses:

new StreamObserver<ConverseResponse>() {
    @Override
    public void onNext(ConverseResponse value) {
        switch (value.getConverseResponseCase()) {
            ....
            case RESULT:
                mConversationState = value.getResult().getConversationState();
            ...
        }
    }
}

Lastly, we simply need to add it to future start Assistant requests.

mStartAssistantRequest = () -> {
    ...
    ConverseConfig.Builder converseConfigBuilder = ConverseConfig.newBuilder()
            .setAudioInConfig(ASSISTANT_AUDIO_REQUEST_CONFIG)
            .setAudioOutConfig(AudioOutConfig.newBuilder()
                    .setEncoding(ENCODING_OUTPUT)
                    .setSampleRateHertz(SAMPLE_RATE)
                    .build());
    if (mConversationState != null) {
        converseConfigBuilder.setConverseState(ConverseState.newBuilder()
                .setConversationState(mConversationState)
                .build());
    }
    ...
};

Adjusting Volume

There will be times when the user would like to tell the Assistant to manage its own volume. In order for your Assistant device to act upon this request, it needs to provide its current volume in Assistant requests and accept any adjustments from the Assistant responses. This works in a similar way to providing the conversation state with the request.

First, let’s add the current volume to our Assistant request. To do so, in our Start Assistant Request, when building the ConverseConfig we need to set the current volume as shown below:

ConverseConfig.Builder converseConfigBuilder = ConverseConfig.newBuilder()
        .setAudioInConfig(ASSISTANT_AUDIO_REQUEST_CONFIG)
        .setAudioOutConfig(AudioOutConfig.newBuilder()
                .setEncoding(ENCODING_OUTPUT)
                .setSampleRateHertz(SAMPLE_RATE)
                .setVolumePercentage(mAudioTrackVolume) //must do this for the Assistant to know it can adjust
                .build());

When the user requests a volume change, the change is returned as a parameter in the ConverseResult of the Assistant response. Upon receiving it, we simply need to adjust the device's volume accordingly.

new StreamObserver<ConverseResponse>() {
    @Override
    public void onNext(ConverseResponse value) {
        switch (value.getConverseResponseCase()) {
            ....
            case RESULT:
                adjustVolume(value.getResult().getVolumePercentage());
            ...
        }
    }
}

private void adjustVolume(int percentage) {
    if (percentage == 0)
        return;
    Log.d(TAG, "setting volume to: " + percentage);
    mAudioTrackVolume = percentage;
    float newVolume = AudioTrack.getMaxVolume() * percentage / 100.f;
    mAudioTrack.setVolume(newVolume);
}

Light Indicators

In the attached example, lights are changed based on the current state of the Assistant device: blue for ready, and red and blue blinking to indicate that the device is listening. To accomplish this, the application leverages the Raspberry Pi's General Purpose Input/Output (GPIO) pins. For the purposes of this blog we will not cover this in detail; you can find GPIO documentation here and more samples here.
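For the curious, toggling a single LED with the Things peripheral API (com.google.android.things.pio) looks roughly like the sketch below. The pin name "BCM6" and the method names are assumptions for illustration; see the GPIO documentation and the attached example for the actual blinking logic.

private Gpio mReadyLed;

private void setUpReadyLight() {
    try {
        PeripheralManagerService service = new PeripheralManagerService();
        // "BCM6" is an example Raspberry Pi 3 pin name; use whichever pin your LED is wired to.
        mReadyLed = service.openGpio("BCM6");
        mReadyLed.setDirection(Gpio.DIRECTION_OUT_INITIALLY_LOW);
    } catch (IOException e) {
        Log.e(TAG, "error opening GPIO pin", e);
    }
}

private void setReadyLight(boolean on) {
    try {
        if (mReadyLed != null) {
            mReadyLed.setValue(on);
        }
    } catch (IOException e) {
        Log.e(TAG, "error toggling GPIO pin", e);
    }
}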

What’s Next?

At this point, you have your own Google Home-like device powered by Android Things. This leaves us with a device that is ready to be taken to the next level. By leveraging Actions with custom Intents, we can add hardware to our device that can be controlled simply by asking our Assistant. I highly suggest reading my colleague's blog on Google Actions to get started.

The sample code for this article can be found here.

Originally published at www.captechconsulting.com.


Clinton Teegarden

Mobile Lead & Architect @ CapTech. I specialize in delivering products for Fortune 500 clients in Mobile, Services and end to end solutions.