Building a ‘Mental Health Monitoring’ App with AWS AI Services — Part I

A few years back, building interesting web or mobile applications required a lot of effort implementing back-end logic that added little visible value to the product the end user actually sees. Furthermore, adding data ingestion/processing and, even more, AI-related technologies meant spending a lot of time setting up the right environment, training the right people and ensuring that high-performance servers were available. Today, relying on managed services like the ones provided by AWS can result in shorter delivery times, higher value and easier maintenance.

As usual, I first want to thank all the people who collaborated in bringing this to life by contributing development hours, providing interesting feedback, brainstorming ideas together, and so on. I don’t write down a detailed list so as not to forget anyone.

Setting up the playground

As a proof of concept, and to see how far we can go today with a mix of AWS Serverless Computing, Data Platform and AI Services, we decided to implement a messaging-like app focused on monitoring users’ ‘mental health’. The central idea is to ask users to send daily updates on how they feel, as text and audio messages, and to lean on AWS services for sentiment analysis to automatically detect when someone has been feeling depressed or down over a configurable period of time and might require assistance. The platform should be smart enough to raise immediate alerts when someone needs attention and to provide insights into the long-term evolution of a person.

In Part 1 (this article) we focus on the data ingestion and processing pipeline; monitoring, near-real-time alerting and mid/long-term insights will be covered in Part 2, coming soon.

There are three actors in this scenario:

  1. Administrators: They create manager accounts and help train the sentiment recognition services.
  2. Managers: They create and administer participant groups by inviting participants to them. They also monitor participants’ sentiments.
  3. Participants: They send text or audio messages to their subscribed groups. These messages are automatically analyzed by the highly scalable AI backbone, and only the group’s manager can review them.

We implemented a simple mobile application using React Native and built a serverless backend using many of the services AWS has to offer. As mentioned before, this article covers the highly scalable serverless backend for sentiment ingestion and processing, and describes the main services used to achieve it:

  • Kinesis Data Stream (for data ingestion and transformation)
  • Transcribe (speech-to-text for short and long responses)
  • Comprehend and Lex (for sentiment analysis)
  • Step Functions (orchestrator)
  • Lambda (for adding custom processing logic)
  • S3 (as a data lake)

Architecture overview

  • Kinesis Data Stream: This service works as a scalable and reliable data sink: it can receive high volumes of messages, process them and redirect them as input to other services. It is a good fit when incoming data may arrive at any time, in any size and with variable load, and you want to analyze it in near real time, as we did. In our case, we deployed two different Data Streams, one to receive text messages and another to receive audio messages sent by the Participants, routing each to the corresponding processing pipeline (see the ingestion sketch after this list).
  • Transcribe: Transcribe provides speech recognition from audio in an easy and convenient way. We used it to convert audio messages to text that is then passed on to the sentiment recognition pipeline.
  • Comprehend: Amazon Comprehend uses machine learning to extract information from input text; in our case we used it to extract sentiment from the input. Comprehend’s accuracy on ambiguous or neutral messages can be improved by building and training a custom model. As this was beyond the scope of our proof of concept, we decided to add Lex as a first filtering step in our sentiment classification pipeline instead.
  • Lex: Lex provides the necessary tools for building conversational interfaces using voice and text. After configuring the intents and corresponding samples, Lex can interpret what the user is asking for (the intent) and even extract information related to those intents (saved in slots). In this project we created intents for the sentiments we want to recognize, and we allow Admin users to train Lex by adding new sample utterances to these intents. This is a ‘hack’ to easily get a model that goes beyond what Comprehend provides by default, but as mentioned before, the proper way to tackle it would have been to train a custom model to be used by Comprehend.
  • Step Functions: Step Functions allows you to easily combine multiple Lambdas and other AWS services into complex flows that can use one step’s output as another step’s input (the configuration is done by defining a JSON file). In our architecture we used it to define the message processing workflows.
  • Lambda: Lambda lets you upload arbitrary logic as a function and just run it, without the need to set up an environment to execute it. Functions can be invoked by other AWS services (via triggers) or web requests (via API Gateway), and their output can be used to feed other services. In our scenario, we used Lambdas to kick off the processing Step Functions when data comes through Kinesis, and as handlers on each step that needs to process data and invoke other AWS services (e.g., Transcribe, Lex and Comprehend).
  • S3: Amazon Simple Storage Service (S3) stores data as file objects within buckets. Objects can have associated metadata, and access rights can be set at the object or bucket level. As data saved to S3 can be accessed by simply using a URL, we used it to store blobs (like the audio files) and intermediate data used within the processing pipelines.
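
To make the ingestion side concrete, here is a minimal sketch (in TypeScript, using the aws-sdk v2 client) of how the mobile app can push a text message into the Kinesis stream. The stream name, partition key and payload shape are illustrative assumptions rather than the exact ones used in our implementation:

```typescript
import { Kinesis } from 'aws-sdk';

// Credentials/region are assumed to be configured elsewhere (e.g., via Cognito on the device).
const kinesis = new Kinesis({ region: 'us-east-1' });

// Sends a participant's text message to the (hypothetical) text ingestion stream.
export async function sendTextMessage(groupId: string, userId: string, text: string): Promise<void> {
  await kinesis
    .putRecord({
      StreamName: 'participant-text-messages',  // assumed stream name
      PartitionKey: groupId,                    // distributes records across shards by group
      Data: JSON.stringify({ groupId, userId, text, sentAt: Date.now() }),
    })
    .promise();
}
```

Audio messages follow the same pattern, targeting the second stream and carrying a reference to the uploaded audio file instead of inline text.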

Connecting everything

There are three main use cases that we aimed to cover with this part of the architecture:

  • Processing text messages to extract sentiment
  • Processing audio messages to extract sentiment
  • Training the Lex bot

Text Pipeline — Send and analyze text messages

  1. Participants send their text messages to their subscribed groups using the mobile app. Then, we use the aws-sdk Kinesis library to send the data to Kinesis. This is a highly scalable ingestion pipeline that can scale up as needed to handle thousands of concurrent inputs.
  2. Kinesis receives the message and a Lambda function is triggered to process it. We use the Lambda function to start the Sentiment Analysis workflow.
  3. The Sentiment Analysis workflow was developed using a Step Function. The first task consists of an Amazon Lex bot configured to extract the message’s intent. We created four intents related to the sentiments we wanted to track: Positive, Negative, Neutral and Mixed (these match the sentiments that Comprehend can detect). We also added Slot Types to use in tandem with each intent in the intent sample utterances (more on this in the ‘Training the Lex bot’ section below).
  4. When Amazon Lex matches the intent of the message to one of the previously configured options, the message and corresponding intent (representing a sentiment) are saved to DynamoDB for later access from the app, thus ending the text analysis procedure.
  5. When Amazon Lex cannot match an intent, the message is redirected to Amazon Comprehend for further analysis. Comprehend can determine whether the sentiment of a message is Positive, Negative, Neutral or Mixed. After the analysis completes, the message and the resulting sentiment are stored in DynamoDB, same as before.
  6. An extra step is taken for messages that Comprehend classifies as either Neutral or Mixed. In this case, besides storing them in DynamoDB, we save them to an S3 bucket that serves as the data source for continuous Lex bot training (a sketch of this two-step classification follows this list).
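
As a rough illustration of steps 3 to 6, the Lambda behind the classification task could look like the sketch below: it asks the Lex bot for an intent first, falls back to Comprehend when no intent matches, and stores the outcome in DynamoDB. Bot, alias, table and attribute names are assumptions made for the example:

```typescript
import { LexRuntime, Comprehend, DynamoDB } from 'aws-sdk';

const lex = new LexRuntime();
const comprehend = new Comprehend();
const db = new DynamoDB.DocumentClient();

// Hypothetical shape of the input coming from the previous Step Functions state.
interface MessageEvent { messageId: string; userId: string; text: string; }

export async function analyzeSentiment(event: MessageEvent) {
  // Step 3: ask the Lex bot for an intent; each intent maps to a sentiment.
  const lexResponse = await lex
    .postText({
      botName: 'SentimentBot', // assumed bot name
      botAlias: '$LATEST',     // assumed alias
      userId: event.userId,
      inputText: event.text,
    })
    .promise();

  // Step 4: if Lex matched an intent, the intent name is taken as the sentiment.
  let sentiment = lexResponse.intentName;

  // Step 5: otherwise fall back to Comprehend for sentiment detection.
  if (!sentiment) {
    const result = await comprehend
      .detectSentiment({ Text: event.text, LanguageCode: 'en' })
      .promise();
    sentiment = result.Sentiment; // POSITIVE | NEGATIVE | NEUTRAL | MIXED
  }

  // Persist the message and its sentiment so the manager's app can read it.
  await db
    .put({ TableName: 'Messages', Item: { ...event, sentiment, analyzedAt: Date.now() } })
    .promise();

  // Step 6 (saving NEUTRAL/MIXED messages to S3 for Lex retraining) is omitted here.
  return { ...event, sentiment };
}
```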

Here are some samples of our application analyzing text sentiment:

A positive one in this case — “Today is a good day”

A negative one in this case — “Today is a bad day”

Audio Pipeline — Send and analyze audio messages

  1. The user records an audio message on their device and sends it using the same aws-sdk Kinesis library as before. The only difference is that the message is sent to a different Kinesis stream than the one used for text messages.
  2. The Kinesis stream for audio messages receives the message and triggers a pre-processing Step Function.
  3. In the pre-processing Step Function, we use Amazon Transcribe to convert the audio to text so that we can reuse the previously explained Text Analysis pipeline (see the sketch after this list).
  4. Afterwards, the text transcription is forwarded to the Text Analysis pipeline as if it were a regular text message.
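
For reference, step 3 can start the transcription with the aws-sdk TranscribeService client and point the output to an S3 bucket, as in the minimal sketch below. Job naming, media format and bucket names are assumptions; since Transcribe runs asynchronously, a later state of the Step Function (or an S3 trigger on the output bucket) picks up the transcript and hands it to the text pipeline:

```typescript
import { TranscribeService } from 'aws-sdk';

const transcribe = new TranscribeService();

// Kicks off an asynchronous transcription job for an audio message already uploaded to S3.
export async function startAudioTranscription(messageId: string, audioS3Uri: string): Promise<void> {
  await transcribe
    .startTranscriptionJob({
      TranscriptionJobName: `audio-message-${messageId}`, // must be unique per job
      LanguageCode: 'en-US',
      MediaFormat: 'mp4',                                 // assumed recording format
      Media: { MediaFileUri: audioS3Uri },                // S3 URI stored by the ingestion step
      OutputBucketName: 'transcriptions-bucket',          // hypothetical output bucket
    })
    .promise();
}
```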

Here are some samples of our application analyzing audio sentiment:

A positive one in this case — “I feel great today”

A negative one in this case — “I don’t want to do anything today”

Training the Lex bot

In some cases, Comprehend can fail to recognize a message’s actual sentiment: e.g., a seemingly neutral message can hide a positive or negative feeling. To avoid building a custom model, we decided to implement a two-step sentiment analysis flow using both Comprehend’s sentiment analysis and Lex’s intent recognition capabilities (as shown in the architecture diagram above).

The goal of this flow is to catch certain messages using Lex before they are passed to Comprehend. This happens mainly when an administrator sees that Comprehend could not determine the correct feeling for a message, so she can retrain the Lex bot to ‘detect’ the sentiment manually. In that case, the Administrator can access the Bot Training section of the app, determine the sentiment for a message and add keywords that help identify the intent.

The following steps show the simple flow used to train Lex:

  1. The Admin user accesses the sentiment-ambiguous messages stored in S3 by the Sentiment Analysis flow. As explained before, these messages are the ones that neither Comprehend nor Lex could classify as Positive or Negative.
  2. The Administrator chooses the sentiment and keywords for each message and sends the training data to the backend. The training data contains:
  • The corresponding intent name (Positive, Negative, Neutral).
  • The selected keywords.
  • A sample utterance built by replacing each keyword in the message with the corresponding slot (this allows similar messages to be caught by Lex). For example, if the message was “this is a beautiful day”, the Administrator can set the sentiment to Positive and select beautiful as a keyword that matches positive sentiments. The sample utterance will be “this is a {feelingPositive} day”, which can then be matched with other ‘positive’ keywords (e.g., “this is a feeling good day”).

3. In the backend, the Lex definitions are updated: the sample utterances are added to the sample list of the selected intent, and the keywords are added to the Slot Type defined for the corresponding slot (in the example above, beautiful would be added to the Slot Type used by the feelingPositive slot).

These steps complete the training of Lex.
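
Step 3 above can be implemented against the Lex Model Building Service API: read the current slot type and intent, append the new keyword and sample utterance, write them back with their checksums, and then rebuild the bot. The sketch below is simplified (a real putIntent call has to echo back all of the intent’s existing configuration) and uses the illustrative names from the example:

```typescript
import { LexModelBuildingService } from 'aws-sdk';

const lexModels = new LexModelBuildingService();

// Adds a keyword to a Slot Type and a sample utterance to an intent (Lex V1 model-building API).
export async function trainIntent(intentName: string, slotTypeName: string, utterance: string, keyword: string) {
  // 1) Append the keyword (e.g. 'beautiful') to the slot type backing the {feelingPositive} slot.
  const slotType = await lexModels.getSlotType({ name: slotTypeName, version: '$LATEST' }).promise();
  await lexModels
    .putSlotType({
      name: slotTypeName,
      checksum: slotType.checksum, // required when updating an existing slot type
      enumerationValues: [...(slotType.enumerationValues ?? []), { value: keyword }],
    })
    .promise();

  // 2) Append the sample utterance (e.g. 'this is a {feelingPositive} day') to the intent.
  const intent = await lexModels.getIntent({ name: intentName, version: '$LATEST' }).promise();
  await lexModels
    .putIntent({
      name: intentName,
      checksum: intent.checksum,
      slots: intent.slots,                             // keep the existing slot definitions
      fulfillmentActivity: intent.fulfillmentActivity, // echo back the existing configuration
      sampleUtterances: [...(intent.sampleUtterances ?? []), utterance],
    })
    .promise();

  // 3) The bot itself must then be rebuilt (via putBot) so the new utterances take effect.
}
```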

Final thoughts

With this exercise, we have shown how simply you can build an architecture able to handle any kind of load for a near-real-time data ingestion and processing pipeline. When a participant submits an input, it is processed almost instantly and shared back with the owner once analyzed. Furthermore, as we will discuss in Part 2, that processed piece of information is stored in a Data Lake in the cloud with the idea of deriving more interesting insights, reports and graphs, and finally alerting a manager when a participant in one of her groups is feeling down (by analyzing long-term evolution) or very anxious (by analyzing sentiments streamed within short periods).

In this case, the AWS Data and Serverless Computing platforms let us build a complex and powerful architecture with minimal effort. These services, provided today by any major cloud provider, become a highly valuable resource, as development and maintenance costs, as well as the overall time to market, can be reduced compared to traditional backend options that required configuring server hardware and software either on premises or in old-fashioned clouds. On the other hand, AWS AI Services let us create this simple application that previously would have demanded big teams and a lot of AI-specific knowledge. We saw how Lex puts natural language interfaces within reach of small teams with little or no machine learning knowledge, and how Comprehend lets us analyze user input with ease to extract useful information, all out of the box. If you are interested in the technical details of how to glue all this together, you can take a look at this piece of the architecture in the reference implementation.

As a final takeaway, we can say that nowadays, with the right team (or fireteam[¹] in this case) and good knowledge of the services offered by your cloud provider of choice, you can focus on what’s important: the set of features you want to provide to your customers that makes your app valuable and unique, one of a kind if you have a great idea! After all, there are no limits to the imagination.

You can continue reading about our journey here.

[¹] SOUTHWORKS FIRETEAM. Instead of assigning individual developers to your project (who are typically juggling multiple projects), we give you a dedicated Fireteam that focuses on one project at a time. These are self-directed, three-person teams (a Lead Software Engineer and two Software Engineers) overseen by a Principal Software Engineer.

Originally published by Mauro Krikorian for SOUTHWORKS on Medium 11 December 2020