Machine Learning in Action: Build a Universal Translator App for Android with Kotlin
June 20, 2018
I’ve started playing with some of the machine learning APIs that AWS provides. One of these is Amazon Translate, a service that translates from English to a variety of other languages. So I thought I would make an app that uses the on-board speech recognition of an Android device, then translates the recognized text into a new language.
What you will need
The requirements for this project are fairly simple. You will need an AWS account (which you can get for free), an Android phone (I had problems recording audio on the emulator so I recommend a real device), and Android Studio (which is a free download).
What we will do
We are going to implement two distinct pieces in this tutorial. Firstly, we are going to create a UI that records some audio and then converts it to text. Then we will send that text to the Amazon Translate service and display the result.
Speech to text
Start by creating a new Android project. Make sure you select an appropriate API level (which will depend on your device). Speech recognition was added in API level 8, so you have plenty of room to use older devices. Ensure you add Kotlin support as well. Select the blank template (Empty activity) to start.
Add three permissions to the AndroidManifest.xml: RECORD_AUDIO, INTERNET, and ACCESS_NETWORK_STATE.
The first one allows us to access the microphone. The other two relate to the network: INTERNET allows us to send the result to the Internet, and ACCESS_NETWORK_STATE lets you detect whether you are connected to the Internet so that the app doesn’t crash when the network is unavailable.
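Here is a minimal sketch of the connectivity check that ACCESS_NETWORK_STATE makes possible. The helper name isOnline is my own invention for illustration, and activeNetworkInfo is deprecated from API level 29 onwards, so treat this as a starting point for older API levels rather than the definitive approach.

```kotlin
import android.content.Context
import android.net.ConnectivityManager

// Hypothetical helper: returns true when the device reports an active,
// connected network. Requires the ACCESS_NETWORK_STATE permission.
fun isOnline(context: Context): Boolean {
    val connectivityManager =
        context.getSystemService(Context.CONNECTIVITY_SERVICE) as ConnectivityManager
    // activeNetworkInfo is null when there is no default network.
    return connectivityManager.activeNetworkInfo?.isConnected == true
}
```

You could call this before sending anything to the translation service and show a message instead of making the request when it returns false.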
Now, let’s take a look at the res/layout/activity_main.xml file:
Layouts don’t get much simpler than this. There is a button at the top to activate the recording, and two text boxes — the first will show the text that we heard and the second will show the translated text.
Now let’s take a look at the MainActivity.kt file:
The button to initiate recording is only enabled if speech recognition is available. When the user taps the button, we call out to the speech recognizer service via an intent. This will pop up a small dialog. At this point, the user speaks. When the user stops speaking, the service will return the text using onActivityResult(). At that point, we update the UI to display the results.
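The flow described above could be sketched roughly like this. It is an illustration rather than the exact activity: the view IDs record_button and heard_text and the request code are assumptions of mine, and it extends the plain Activity class to keep the imports short.

```kotlin
import android.app.Activity
import android.content.Intent
import android.os.Bundle
import android.speech.RecognizerIntent
import android.widget.Button
import android.widget.TextView

class MainActivity : Activity() {
    private lateinit var recordButton: Button
    private lateinit var heardText: TextView

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        // Assumed view IDs; use whatever your layout defines.
        recordButton = findViewById<Button>(R.id.record_button)
        heardText = findViewById<TextView>(R.id.heard_text)

        val speechIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
            putExtra(
                RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM
            )
        }

        // Enable the button only if some activity can handle the speech intent.
        recordButton.isEnabled =
            packageManager.queryIntentActivities(speechIntent, 0).isNotEmpty()

        recordButton.setOnClickListener {
            startActivityForResult(speechIntent, SPEECH_REQUEST_CODE)
        }
    }

    override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
        super.onActivityResult(requestCode, resultCode, data)
        if (requestCode == SPEECH_REQUEST_CODE && resultCode == Activity.RESULT_OK) {
            // The recognizer returns candidate transcriptions, best match first.
            val matches = data?.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS)
            heardText.text = matches?.firstOrNull().orEmpty()
        }
    }

    companion object {
        // Arbitrary request code used to match the result to our request.
        private const val SPEECH_REQUEST_CODE = 1001
    }
}
```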
That was so easy to get started! You can actually run this app and see the speech to text working.
There are a couple of caveats here. The most important is that Android calls out to a Google service to complete the process. As of API level 23, there is an additional option for the service called EXTRA_PREFER_OFFLINE that can be used to indicate you prefer to do this offline. Use it like this:
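A short sketch of setting the flag, assuming you build the recognizer intent in a helper (buildSpeechIntent is a name I made up). The version check keeps the extra off devices below API level 23, where the constant is not defined.

```kotlin
import android.content.Intent
import android.os.Build
import android.speech.RecognizerIntent

// Hypothetical helper that builds the speech intent and asks for
// offline recognition where the platform supports the option.
fun buildSpeechIntent(): Intent =
    Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
        putExtra(
            RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM
        )
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {
            // A preference only: the recognizer may still go online.
            putExtra(RecognizerIntent.EXTRA_PREFER_OFFLINE, true)
        }
    }
```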
Set up text translation service
Before you can use the text translation service, you need to set it up. I’ve got an AWS CloudFormation template for this purpose, which I install using the AWS command line. First, copy it to an S3 bucket. I have a scratchpad S3 bucket that I use for this sort of thing:
The translate service doesn’t need anything special, but it does need an IAM role to authorize the request. The CloudFormation template sets up an unauthenticated IAM role, then associates that role with a newly created Amazon Cognito identity pool. The identity pool will give us temporary credentials to access the Amazon Translate service later on.
Create an AWS connection in the app
The app needs to know where and how to connect to the Amazon Translate API. For this, I created a JSON file in res/raw/aws.json with the information from my CloudFormation stack:
Don’t check in the aws.json file. It contains secrets!
To get the various values, take a look at the Outputs section of the CloudFormation stack. Once the stack is finished, you can also list them from the command line with aws cloudformation describe-stacks, passing your stack’s name via --stack-name.
This will show you the details for the named stack, which includes the three values that are relatively harder to obtain. The accountId is the 12-digit number for your account, available in the top banner of the AWS console.
To read this file, I’ve got a model:
This is a basic model for the five values we need to configure the AWS Mobile SDK. I’ve added two helper methods for converting from a JSON string to the object, and for reading the JSON string from a resource.
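A model along these lines might look like the sketch below. Only accountId is named in the text; the class name TranslateConfig, the other four field names, and the JSON keys are my assumptions, so adjust them to match your own aws.json. It uses the org.json classes that ship with Android.

```kotlin
import android.content.Context
import org.json.JSONObject

// Hypothetical model; field names other than accountId are assumptions.
data class TranslateConfig(
    val accountId: String,
    val identityPoolId: String,
    val unauthRoleArn: String,
    val authRoleArn: String,
    val region: String
) {
    companion object {
        // Convert a JSON string into the model. Throws JSONException
        // if the string is malformed or a key is missing.
        fun fromJson(json: String): TranslateConfig {
            val obj = JSONObject(json)
            return TranslateConfig(
                accountId = obj.getString("accountId"),
                identityPoolId = obj.getString("identityPoolId"),
                unauthRoleArn = obj.getString("unauthRoleArn"),
                authRoleArn = obj.getString("authRoleArn"),
                region = obj.getString("region")
            )
        }

        // Read the JSON string from a raw resource (for example R.raw.aws)
        // and convert it.
        fun fromResource(context: Context, resourceId: Int): TranslateConfig {
            val json = context.resources.openRawResource(resourceId)
                .bufferedReader()
                .use { it.readText() }
            return fromJson(json)
        }
    }
}
```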
Next, let’s add the AWS Mobile SDK for Android libraries. In the app-level build.gradle file, add the following dependencies:
We’re now ready to configure the AWS translate client. You may have noticed the blank initializeClient() in the earlier code. This is now going to be replaced:
The code creates a client object (which wraps the actual HTTP-based API) and links a credentials provider so that the temporary credentials for our unauthenticated IAM role are used for the request.
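As a rough sketch of that wiring, assuming the AWS Mobile SDK for Android is on the classpath: the function name createTranslateClient and its parameters are hypothetical, and I have used the simplest Cognito constructor (identity pool ID plus region), which may differ from what your setup needs.

```kotlin
import android.content.Context
import com.amazonaws.auth.CognitoCachingCredentialsProvider
import com.amazonaws.regions.Regions
import com.amazonaws.services.translate.AmazonTranslateAsyncClient

// Hypothetical factory: builds a Translate client that signs requests
// with temporary credentials vended by the Cognito identity pool.
fun createTranslateClient(
    context: Context,
    identityPoolId: String, // assumed to come from your aws.json
    region: String          // for example "us-east-1"
): AmazonTranslateAsyncClient {
    val credentialsProvider = CognitoCachingCredentialsProvider(
        context.applicationContext,
        identityPoolId,
        Regions.fromName(region)
    )
    return AmazonTranslateAsyncClient(credentialsProvider)
}
```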
Translate some text
We next need a function to translate the text. Even with the setup we used to get here, it’s remarkably simple to use:
The translateRequest contains just three fields: the source and destination languages and the text to be translated. We are going over the network, so the request is run asynchronously, with an AsyncHandler receiving the outcome. When the response is received, the result is the translated text.
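A sketch of such a function, under the assumption that you already have an AmazonTranslateAsyncClient. The function name translate and the two callback parameters are my own; the request and handler types come from the AWS Mobile SDK for Android.

```kotlin
import com.amazonaws.handlers.AsyncHandler
import com.amazonaws.services.translate.AmazonTranslateAsyncClient
import com.amazonaws.services.translate.model.TranslateTextRequest
import com.amazonaws.services.translate.model.TranslateTextResult

// Hypothetical wrapper: translates English text to Spanish and reports
// the outcome through callbacks. Callbacks run on a background thread.
fun translate(
    client: AmazonTranslateAsyncClient,
    text: String,
    onTranslated: (String) -> Unit,
    onFailure: (Exception) -> Unit
) {
    val request = TranslateTextRequest()
        .withText(text)
        .withSourceLanguageCode("en")
        .withTargetLanguageCode("es")

    client.translateTextAsync(
        request,
        object : AsyncHandler<TranslateTextRequest, TranslateTextResult> {
            override fun onError(exception: Exception) {
                onFailure(exception)
            }

            override fun onSuccess(
                request: TranslateTextRequest,
                result: TranslateTextResult
            ) {
                onTranslated(result.translatedText)
            }
        }
    )
}
```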
In this case, I’ve opted to specify the target language as Spanish. You can use Arabic (ar), simplified Chinese (zh), French (fr), German (de), Portuguese (pt) and Spanish (es). Just change the targetLanguageCode accordingly. You can also add a settings panel or options list to choose the language.
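If you do add a language picker, a small lookup table is one way to back it. This is only a sketch: the names targetLanguages and languageCodeFor are made up for illustration, and the entries are just the six languages listed above.

```kotlin
// Display names mapped to the language codes listed above.
val targetLanguages: Map<String, String> = linkedMapOf(
    "Arabic" to "ar",
    "Chinese (Simplified)" to "zh",
    "French" to "fr",
    "German" to "de",
    "Portuguese" to "pt",
    "Spanish" to "es"
)

// Hypothetical helper: look up a code by display name, ignoring case.
// Returns null when the language is not in the table.
fun languageCodeFor(name: String): String? =
    targetLanguages.entries
        .firstOrNull { it.key.equals(name, ignoreCase = true) }
        ?.value
```

The keys can populate a spinner or list, and the selected value becomes the targetLanguageCode on the request.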
The only thing left is to write the text to the prepared text view:
The main thing to remember here is that the translateClient delivers its result on a background thread. You cannot update the UI from that thread, so you have to explicitly switch to the UI thread in the callback.
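Switching threads can be as small as the sketch below. The helper name showTranslation and its parameters are assumptions; runOnUiThread is the standard Activity method.

```kotlin
import android.app.Activity
import android.widget.TextView

// Hypothetical helper: safe to call from the translation callback,
// because the view is only touched on the UI thread.
fun showTranslation(activity: Activity, target: TextView, translated: String) {
    activity.runOnUiThread {
        target.text = translated
    }
}
```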
I hope you enjoyed this foray into mobile machine learning. There are many more capabilities in the Amazon Machine Learning suite, including natural language processing, text-to-speech, image recognition, and custom deep learning capabilities. AWS even has its own speech-to-text service if you don’t feel like sending more data to Google.