This series of blog posts will share the Google Summer of Code projects that our 2018 cohort of students are working on. Community members are encouraged to join them in creating and testing these exciting new areas of features.
Each project will have their open source repository listed. Please feel free to drop in and say hello, or join in the action with your own issue or code contribution.
Talk to your Rocket.Chat today with newly integrated speech-to-text
Student: Karan Bedi
Mentors: Gabriel Delavald, Pierre Lehnen
Welcome back to our 2018 GSoC blog series! Up next is Karan’s project that will enable you to literally talk to your Rocket.Chat. He developed this speech-to-text integration to connect some of the well-known speech-to-text APIs currently on the market directly with Rocket.Chat. The project also has wider implications for the accessibility of Rocket.Chat in the near future. It integrates open source speech-to-text engines for users with disabilities, as well as those who prefer an on-premise, full-featured installation.
Solving Rocket.Chat Roadies’ #1 problem
One of Rocket.Chat’s long time collaborators spends a good part of every working day on the road and cannot safely send typed messages so often relies on recording audio messages. However, noisy environments often make this difficult and a solution was needed.
Some advanced mobile devices (and desktop PCs) have native speech-to-text technology and can resolve part of the issue. Moving speech-to-text technology so that it resides completely within Rocket.Chat will henceforth allow the solution to be enjoyed by everyone - regardless of the access device.
Karan’s project adds this long-demanded feature to Rocket.Chat. For his project, Karan used the Google Voice API to communicate what the recorded message sent by the user says, the engine then decodes the audio and translates it into text, all within a matter of seconds. The bonus is that Google Voice API can be swapped for other engines like IBM Watson or Bing Speech API because they all work almost the same (you send audio, they return text).
Crafting the user experience - UI/UX improvements
One of the biggest challenges for alterntative input such as speech is the design and crafting of an actually usable user interface. Just translating existing visual designs typically does not result in acceptable user experience.
This project aimed at improving this by configuring the API/engine’s connection and usage attributes, and giving Rocket.Chat users the ability to click a button, record a message, send to the desired API/engine and return the results to the Rocket.Chat editing message box. In short, users will now have a fast, in-platform speech-to-text experience that facilitates user multi-tasking or those who do not wish to type their messages because they are busy doing something else, satisfying the Rocket.Chat roadies’ demand.
Blasting through technical challenges
Speech-to-text is a technology that has gained popularity recently, with some providers like Google, Microsoft and IBM improving their engines and providing some open APIs to translate audio files into textual content. However, since this is pretty new, there are still some challenges when trying to work with different providers, like different encodings and file formats that are expected on different APIs.
Karan’s idea was to provide an easy way to connect different speech-to-text solutions to Rocket.Chat. Currently we have only implemented the speech-to-text feature using the Google Speech API, but it is easily replaceable with other engines such as IBM Watson or Bing Speech API; we will investigate these other solutions at a later date.
Rocket.Chat is carrying out a review to make sure everything works properly but hope to go live with this feature towards the end of September to accompany that release.
A long and winding road
Karan has been an active open source community member within Rocket.Chat even before the official beginning of GSoC. He is the author of many well-received pull requests - landing him esteemed respect from many team members.
But Karan’s journey with us on GSoC 2018 has been rather eventful. He started with our core team on the Global Search project. But due to some of our own internal logistic problems, the project’s scope increased and it became unfeasible for it to continue as a GSoC project.
Thanks to Karan’s tenacity and agility, the crisis was averted. We were able to quickly pair him with his second winning GSoC proposal - speech-to-text APIs. Karan ended up working with two sets of mentors in his short GSoC 2018 term. This exposure to a wide cross-section of the Rocket.Chat team will hopefully facilitate his future endeavours with our open source project.
About the creator: Karan Bedi
Karan Bedi is an undergrad student pursuing a major in Applied Mathematics at IIT Roorkee. In his own words:
“Programming, good food, music and travelling are what get me going. My areas of interests are financial mathematics, numerical analysis, machine learning and natural language processing.”
Karan says that taking part in GSoC 2018 with Rocket.Chat over the summer has been ‘one of the best experiences’ of his life. He wants to continue contributing to open source software and hopes to become an increasingly active member of the OSS community.
Any parting thoughts Karan?
“I sincerely thank the organization members for their continuous support in project and non-project related matters.”
Great news! Thank you for your contribution to Rocket.Chat this summer!