Sep 28

Chrome Speak To Site; Give any input the power to listen to you

JavaScript, Open Source, Tech, Web Browsing with tags: , , 3 Comments »

Paul Irish gave a fantastic updated State of HTML5 talk at JSConf.EU. It is packed full of demos, including sharks with freaking lazer beams!

At one point he showed off the WebKit support for <input speech> implementation that allows you to talk into an input area. You click on the microphone, speak in, and it will get translated for you with the results. I am not sure if you can tweak how the translation is done (choose a Nuance vs. Google vs. …. solution for example), but it definitely works well out of the box.

speak-to-site

I was surprised to see this already landed in my developer-channel Chrome, so I was incented to do something with it on the plane trip back from Berlin to New York City. Something simple would be to give the user the ability to enable speech on any input. I whipped up a Chrome extension using the context menu API, but was quickly surprised to see that there isn’t support in the API to get the DOM node that you are working on. Huh. Kinda crazy in fact.

Then the whizzkid antimatter came to the rescue with his cheeky little hack around the system. Here is how it plays out in the world of this extension:

The background page

First we enable the context menu on any “editable” element (vs. anywhere on the page, on any text, etc), and when clicked we fire off an event to the content script in the given tab:

<script>
chrome.contextMenus.create({
    title: "Turn on speech input",
    contexts: ["editable"],
    onclick: function(info, tab) {
        chrome.tabs.sendRequest(tab.id, 'letmespeak')
    }
});
</script>

Catching in a content script

A content script then does two things:

  • Listens for mousedown events to keep resetting the last element in focus
  • Catches the event, and turns on the speech attribute on the target DOM node
var last_target = null;
document.addEventListener('mousedown', function(event) {
    last_target = event.target;
}, true);
 
chrome.extension.onRequest.addListener(function(event) {
    last_target.setAttribute("speech", "on");
    last_target = null;
})

Wire-y wire-y

Of course, it all gets wired up in the manifest:

{
    "name": "Turn on Speech Input",
    "description": "Turns on the speech attribute, allows you to speak into an input",
    "version": "0.1",
    "permissions": ["contextMenus"],
    "minimum_chrome_version": "6",
    "background_page": "background.html",
    "content_scripts": [{
        "matches": ["<all_urls>"],
        "js": ["input-speech.js"]
    }]
}

This trivial extension is of course on GitHub (I want git-achivements after all! :).

A couple of things trouble me though:

  • The microphone icon should sit on the right of the input, however when dynamically tweaked like this it shows up on the left by mistake [BUG]
  • I have also played with extensions such as Google Scribe. Adding icons like this doesn’t scale. Having them show up all the time gets in my way. I think I want one ability to popup special powers like scribe completion, or speech-to-text, without it getting in my way
  • When services are built into standard elements like this, it feels like I want to have the ability to tweak how they work (with great defaults of course, as 99.9999% of the time they won’t be changed.

You?

Dec 16

Chrome Extensions and webOS Applications look quite similar

Tech, webOS with tags: 3 Comments »

appsandextensions

“If you squint, Chrome Extensions and webOS applications look similar” — a wise friend

Having now written webOS applications and Chrome Extensions I have been struck by how similar they are, and could be.

It may seem weird to see similarity in a browser extension mechanism and a mobile application runtime, but when you look at it, there is plenty to share at a high level:

Breaking out of the sandbox

Both worlds need to break out of the web browser sandbox. Extensions can’t be restricted to the same limitations and Chrome Extensions have various API that relate to UI elements in the browser, as well as getting access to more.

On webOS we have exactly the same issue. When building native applications on webOS you can’t have the same restrictions and thus new APIs, UI widgets, and services.

This issue goes beyond even these too worlds, and is one of the big challenges for Web runtimes in the near future. I expect my runtime to be able to do a lot more and to get access to my data not just through third party server side services, but locally too. This all brings us to…

Permissions

The Web needs a permissioning model. The sandbox doesn’t give us enough, and although people quickly talk about security (which is critical) we forget what happens if the user can’t do something through the Web. They often will download an .exe to their local computer and will just run it, not knowing anything about what it does.

A big challenge for us is to come up with a model that doesn’t Do A Vista and drive our users nuts. We have to balance the desire to not ask the user for something all the time, while making sure the user knows what is happening. For power users at least, I favour the idea of showing me what is going on. Even if I grant an application a lot of power, if I could see when and what it was doing it would help. Of course, being able to restrict access to APIs is fantastic, especially when you get to layer social trust on top of the technical security. The Mozilla team talked a lot about just this and I look forward to more of their ideas and implementation.

(You can see an example of how Chrome does permissions with Cross site XHR.)

Application Bundle Info

The permissions and other metadata need to live somewhere. We have a appinfo.json file that has you declare info on your app. Chrome has a manifest.json.

On the Web we don’t really have this. You hit a URL and you bootstrap from there. The HTML5 manifest does tell the browser which files are in scope for caching and the like, but that is about it. Applications and extensions can tell the system much more. You have ids, versions, pointers to update, icons, main launching points or not…

Headless Chickens

You don’t often think of a headless web app. You can think of headless applications and extensions though. How many services are running on your computer now that have no UI? On webOS you can noWindow away and still provide value through the various services that we offer, the obvious being the rich and varied notification styles. Background apps are nicely supported.

Chrome Extensions have the notion of a background page as a way to manage a long lived lifecycle. Your background page is a singleton:

In a typical extension with a background page, the UI — for example, the browser action or page action and any options page — is implemented by dumb views. When the view needs some state, it requests the state from the background page. When the background page notices a state change, the background page tells the views to update.

In webOS and the Mojo framework we have a rich MVC system with a detailed lifecycle as well as the simple ability to define models, views and controllers.

And on

The more I look at these worlds, and others like them (e.g. Jetpack) the more I hope we come together. There are similar needs, and when you look from Chrome Extensions to Chrome OS, you can see a potential progression if done right.

Dec 09

Out of the page and into the runtime; Extensions move the Web development model further

Tech, Web Browsing with tags: No Comments »

The Chrome Extensions team had a press event tonight to go over their extensions beta launch that I unfortunately couldn’t make at the last minute. I wanted to be there to see old friends and the good work they have been doing, and support them.

Aaron, Erik, and the team have done a pretty great job with their extension model. They wanted to allow Web developers to develop extensions. Who better to do that than folks who are a) Web developers and b) have written extensions (e.g. Aaron is an old time JS hacker, author of Greasemonkey, and much more since then [Gears etc]).

Of course, over at Mozilla the same idea had been hatched with Jetpack and other equally talented folk are working on that (Aza and Atul and more).

Some of our best Web engineers are on the case, which is great.

Before these new age extension models the world of extensions was tough. On IE you would be mainly a C++ hacker to do anything. Mozilla really raised the bar with its original Add-Ons but even though you could do a lot in JavaScript, there was still a lot of XPCOM and a huge API set since you basically were given the entire Mozilla platform to work with. This was great from a “you can do whatever the crap you want to do” perspective, but it was awfully hard to dive in and get productive.

Thus, most Web developers stayed inside the browser window. We develop applications in our area, and others do the extension thing.

That has changed and we are now breaking out of the browser sandbox in many ways. This is one of them. When we think about delivering experiences to our users we can think of out of the window. What would our users like to be able to do when not on our web site? How can we package our value adds in a way that they can do their thing at any point? How do we interact with the runtime? What can we do if we have more permissions to do interesting things?

We have only just seen the beginning with Jetpack and Chrome Extensions. They are many more APIs to write and for us to then consume. I personally think that Chrome is too restrictive on what you can do in the browser chrome for example. I would love to play around and have my browser morph in interesting ways. If I am on a particular site, new features and functionality could appear if I want them too.

It feels like we are touching the surface on what the Web platform and runtime is versus what a “browser” is as we have envisioned it until now.

2010 will be an exciting year for Web developers as they add extensions to their toolbox in a nice clean way.

Oct 12

Chrome Win Size; Playing with Chrome extension mechanism

Google with tags: , No Comments »

I have been watching the work of Aaron and the Chrome Extensions team for awhile. I love that with that effort and Jetpack we are going to see extension creation made simpler and more approachable for Web developers.

I realized that I hadn’t written a Chrome extension, so I wanted to try it on for size. I thus created an incredibly simple extension that tells you the size of the window on the fly: Chrome Win Size.

chromewinsize

In an ideal world this extension would be as easy as:

  • Create a bit of HTML for the tooltip
  • Attach an onresize handler to the window that changes the HTML

Unfortunately, it wasn’t quite that easy…. but it was pretty close. The only difference involves the fact that you can’t easily get the window object to resize.

Here is the entire extension to show how simple it is:

Web page for the tool strip

For now, I wanted to put the windows size information in the toolstrip. This isn’t ideal, especially if you don’t have other extensions that turn on the toolstrip. It could be nicer to either: have a subtle grey width/height in the background of the URL bar itself; show the info only when a certain keystroke is pressed. That is all for version 0.2 :)

As you will see below, the extension is just a Web page, and the toolstrip is just a div. You could use class="toolstrip-button" but for now a click doesn’t do anything, so I didn’t use that. Instead I have a title attribute that uses plain English to explain the info.

Once you have the HTML itself, you also have some JavaScript to do the work. The init kicks things off the first time, but the interesting code is the chrome.extension.onConnect.addListener piece that sets up the extension so that a content script can talk to it via postMessage.

<html>
<head>
<script>
 
// Query the current window (anyone will do) and set the window size in the toolstrip
function showDimensions() {
    chrome.windows.getCurrent(function(w) {
        var el = document.getElementById("windowsize");
    	el.innerHTML = w.width + " x " + w.height;
    	el.setAttribute("title", "width: " + w.width + "px, height: " + w.height + "px");
    });
}
 
// Listen to the content script and when told there is a change, query again
chrome.extension.onConnect.addListener(function(port, name) {    
  console.assert(name == "resize");
  port.onMessage.addListener(function() {
      showDimensions();
  });
});
 
</script>
</head>
<body onload="showDimensions()">
  <div id="windowsize"></div>
</body>
</html>

Content Script

Why do we need a content script? Ideally, we would be able to grab a window in the extension and add a resize listener to it and be done. That isn’t the case right now though. In the code above, the window object that we get in w from chrome.windows.getCurrent(function(w) {... isn’t a DOM Window, but just something with a few properties (width, height, etc).

To get an actual DOM window, we need to inject a content script and have that talk to the extension. The code is as simple as below… which does the postMessage() back to the listener that we setup in the extension:

// When the window resizes send a quick message to the extension
window.addEventListener("resize", function() {
	chrome.extension.connect({name: "resize"}).postMessage();
}, false);

Manifest

Now we need to put it all together and that is where the manifest comes in. It is here that we tell Chrome where the files are for the toolstrip and content script, and what permissions they have (e.g. give it the tabs permission), as well as metadata:

{
  "name": "Window Size", 
  "version": "0.1",
  "description": "Displays the size of the main window",
/*  "icons": { "128": "gmail-128x128.png" }, */
  "permissions": [
    "tabs"
  ],
  "toolstrips": [
    "winsize.html"
  ],
  "content_scripts": [
    {
      "matches": ["http://*/*", "https://*/*"],
      "js": ["contentscript.js"]
    }
  ]
}

Caveats

Even with the content script hack, it is pretty easy to do this kind of thing. It would be nice to do more with the window object as the hack has limitations beyond the extra code. For one, if you are on a page such as chrome://extensions/ nothing kicks in (due to the matching). Rather than matching on pages for content scripts to embed, it would be much better to declare that you care about windows themselves.

The toolstrips themselves are a little bulky and ugly, at least on Chrome for Mac (which is very much in beta) and as mentioned earlier…. unless you are using a bunch of extensions, it feels very unChrome-like to use that space.

FYI: To package an extension on Mac/Linux use this handy script as the built-in functionality isn’t there yet.

All in all a decent experience already. Really nice to just work on an HTML document to kick out the functionality. The idealist in me would love to see Jetpack and Chrome Extensions coming together so Web devs had One Way to extend the platform…… The Web is in a fortunate position to have folks like Aaron Boodman, Aza Raskin, Atul Varma and many others on the case of Web extensions.