Sep 28

Chrome Speak To Site; Give any input the power to listen to you

JavaScript, Open Source, Tech, Web Browsing with tags: , , 3 Comments »

Paul Irish gave a fantastic updated State of HTML5 talk at JSConf.EU. It is packed full of demos, including sharks with freaking lazer beams!

At one point he showed off the WebKit support for <input speech> implementation that allows you to talk into an input area. You click on the microphone, speak in, and it will get translated for you with the results. I am not sure if you can tweak how the translation is done (choose a Nuance vs. Google vs. …. solution for example), but it definitely works well out of the box.


I was surprised to see this already landed in my developer-channel Chrome, so I was incented to do something with it on the plane trip back from Berlin to New York City. Something simple would be to give the user the ability to enable speech on any input. I whipped up a Chrome extension using the context menu API, but was quickly surprised to see that there isn’t support in the API to get the DOM node that you are working on. Huh. Kinda crazy in fact.

Then the whizzkid antimatter came to the rescue with his cheeky little hack around the system. Here is how it plays out in the world of this extension:

The background page

First we enable the context menu on any “editable” element (vs. anywhere on the page, on any text, etc), and when clicked we fire off an event to the content script in the given tab:

    title: "Turn on speech input",
    contexts: ["editable"],
    onclick: function(info, tab) {
        chrome.tabs.sendRequest(, 'letmespeak')

Catching in a content script

A content script then does two things:

  • Listens for mousedown events to keep resetting the last element in focus
  • Catches the event, and turns on the speech attribute on the target DOM node
var last_target = null;
document.addEventListener('mousedown', function(event) {
    last_target =;
}, true);
chrome.extension.onRequest.addListener(function(event) {
    last_target.setAttribute("speech", "on");
    last_target = null;

Wire-y wire-y

Of course, it all gets wired up in the manifest:

    "name": "Turn on Speech Input",
    "description": "Turns on the speech attribute, allows you to speak into an input",
    "version": "0.1",
    "permissions": ["contextMenus"],
    "minimum_chrome_version": "6",
    "background_page": "background.html",
    "content_scripts": [{
        "matches": ["<all_urls>"],
        "js": ["input-speech.js"]

This trivial extension is of course on GitHub (I want git-achivements after all! :).

A couple of things trouble me though:

  • The microphone icon should sit on the right of the input, however when dynamically tweaked like this it shows up on the left by mistake [BUG]
  • I have also played with extensions such as Google Scribe. Adding icons like this doesn’t scale. Having them show up all the time gets in my way. I think I want one ability to popup special powers like scribe completion, or speech-to-text, without it getting in my way
  • When services are built into standard elements like this, it feels like I want to have the ability to tweak how they work (with great defaults of course, as 99.9999% of the time they won’t be changed.