A Renewed Call for App-to-App Interaction APIs

Cow, come in cow

The battle ground of app-to-app interaction history is littered with abandoned ideas, half-solutions, and unimplemented APIs. The current, consumer/provider interaction paradigm for apps and services is a mess of one-off, provider-defined systems that each use their own transaction mechanisms and custom data structures. This makes it hard to do simple things across N providers, like save something to a user’s preferred storage service without jumping through provider-specific code and UX hoops.

I’d like to restart the conversation about bringing legit, app-to-app interaction APIs to the Web. There have been past spec attempts, namely Web Activities and Web Intents, but I’ll argue that while they get a lot right, they all fail to deliver an A+ solution.

Wait, what is an “App-to-App Interaction API”?

Have you ever wanted to open a programmatic connection within your app to the user’s preferred provider of a service or activity? Maybe you’ve wanted to check a user’s identity, store data to a user’s storage provider, or post to the user’s preferred social network, etc. If you’re saying to yourself, “Self, this is already solved, there are endless provider-specific social, identity, and storage includes to do just that” <-- I would submit to you, that right there, sucks. There must be some way to provide a common API layer for consumption of these services that doesn't require developers to know of, integrate, and manage specific bits of UI and code for every provider. Not having legit APIs to do this is a real problem, with significant consequences:

Pages are cluttered with code and UI includes for countless providers, many of which are left on pages to rot after a company discontinues support.
Developers must know of, include, maintain, and write code to juggle how and what providers respond with; which usually doesn’t share a semantic, even across the same type of service.
The performance cost is huge – social, login, and other provider-specific includes account for a staggering amount of memory and dramatically increase page load times.
Because developers are required to specifically include each provider, they often display only two or three options for a given service. This prevents users from leveraging different services that aren’t among those a developer has committed cycles to integrating.

So we know this is a serious issue that impacts countless uses-cases, degrades performance, and limits service mashability, but what does a serious solution look like?

Must-haves of an A+ solution:

Ability for providers to be registered dynamically, or via manifest declaration, but not tied to any installable app model – the whole Web deserves this API.
The user’s selection and preferences for servicing activities should be managed by the User Agent (browser, OS, etc.), within its trusted, chrome UI.
The user should have the ability to pick a service/app to handle an activity per-use, as a default per-domain, or as a global default.
The developer should be able to request handling of an activity via context switch to the provider app OR by establishing a silent, programmatic connection (connecting to another app/service in the background)
The spec should provide both declarative and imperative entry points for UI hand-offs and programmatic connections – for instance: links for declarative hand-offs, and imperative interfaces for background connections.
Imperative API connections to providers should be transactional – meaning a response sent back from a provider should be tied to a specific request instance (hopefully with a Promise).

How does current tech stack up?

Here’s a rundown of how the three existing mechanisms we have for app-to-app interaction fare when assessed against the base features and capabilities an A+ solution should include:

Features	Web Intents	Web Activities	Custom Protocols
Use is not limited to installable app model	âœ”	âŒ	âœ”
Choice of dynamic or declarative registration	âŒ	âŒ	âœ”
Activities types should not be whitelist-driven	âœ”	âŒ	âœ”
Selection/prefs managed in trusted UI	âŒ	âŒ	â•
UI for selection/prefs shouldn't have asstastic UX	âŒ	âŒ	âŒ
Per-use selection	âœ”	âœ”	âœ”
Per-domain default option	âŒ	âŒ	âŒ
Global default option	âœ”	âœ”	âœ”
Global default != granting global permission	âœ”	âœ”	âŒ
Requests to providers are handled transactionally	âœ”	âœ”	âœ”
Handling via context switch or background connection	â•	âŒ	â•
Implemented today by more than one browser vendor	âŒ	âŒ	âœ”
Implemented today in any default, mobile browsers	âŒ	âŒ	âŒ

Though Custom Protocols come closest – and could be enhanced to fill the gaps – none of these options meet the criteria of an A+ solution. To add insult to injury, no current option is supported by default mobile browsers (Android, iOS, or Windows Phone), which makes general use a non-starter for most developers. The inability of these APIs to cover significant developer use-cases is one reason implementation of the above APIs is so spotty. Neither IE nor Edge currently support these APIs, and I don’t blame them – before you commit significant time to implementing an API, you need to be sure it adequately solves the problems you face. That’s part of the reason I am starting this discussion; we want to create an open dialog to better understand your needs.

Experimenting with Protocol Handlers

I needed an app-to-app interaction API for something I’m working on at Microsoft, called Web Profile. The requirement was that any consuming app would be able to open a programmatic connection to a user’s preferred service provider and perform transactions without the consuming app having to know where to look, or include any provider specific code.

To build a working prototype, I needed an existing mechanism that worked across multiple browsers, so I took another look at something I had used in the past: Custom Protocol Handlers (MDN: navigator.registerProtocolHandler). Though they may not realize it, users and developers are already familiar with custom protocol handlers; the best example is probably mailto:. URLs prefixed with a custom protocol direct the browser to open the app or site the user selects to handle the request. This usually takes the form of a contextual switch: opening a new window or browser tab. For what I was looking to do, I needed a way to stay within my app while interacting over a programmatic connection with whatever provider the user selects. Custom protocol handlers can currently be registered in the desktop versions of these browsers:

Firefox 3+
Chrome 13+
Opera 11.6+

Meet `ProtocolWorker`

It turns out you can use custom protocols to create a programmatic, background connection for app-to-app interactions. How? By pairing them with an old friend: the iframe. I found that loading a custom protocol (any handler registered with the web+ prefix) in an iframe will redirect to the handler’s target page. With a bit of postMessage() magic handled by a prollyfill included on both sides (consumer and provider), you can create an active, two-way connection between your app and the user’s preferred service provider. Think of it as a cross between a Web Worker and Background Page (as they’re known in browser extension land). Here’s my Github repo with the JS file you’ll need to make it all work: https://github.com/csuwildcat/ProtocolWorker

Talk codey to me:

To show how this works I put together a simple demo built on the totally-100%-made-up web+whisper protocol. The hypothetical Web Whisper case I highlight in the code below lets a user store a text string (a whisper) with their provider, or retrieve the last text string the provider stored on their behalf. Any consuming app that desires to store or retrieve the user's whisper must first be permitted to do so via the browser's protocol selection UI. The whole demo uses localStorage to save the user's text string on the provider's domain, so don't expect persistence beyond that of your local machine.

Note: though this is a super simple demo, the system would be just as good for sending/retrieving things like: social posts, user profile info, or files from places like OneDrive, Dropbox, Box, etc.

Providers register a protocol handler

The first thing a site offering a Web Whisper service needs to do is register itself as a handler for the web+whisper protocol. Here's how you do that:

navigator.registerProtocolHandler('web+whisper', HANDLER_URL + '?%s', 'Whisper Demo Provider');

Protocol handlers point to a ProtocolWorker page

At the other end of that HANDLER_URL should be an HTML page that is not intended for display - we'll be treating it as more of a DOM-enhanced, Web Worker environment. In this page you will include event listeners and whatever code you need to deal with the requests you'll receive from consuming apps. Here's what the handler page looks like for our Web Whisper provider:

<script src="js/ProtocolWorker.js"></script>

<script>

  window.addEventListener('protocolrequest', function(event){
    if (event.detail['@type'] == 'Comment') {
      localStorage.whisper = event.detail.comment;
      event.detail.resolve();
    }
    if (event.detail['@type'] == 'Search') {
      event.detail.resolve(localStorage.whisper);
    }
  });

</script>

Consuming apps connect to providers via `ProtocolWorker`

After the user adds our Web Whisper provider as a handler for web+whisper requests, consuming apps can ask to open a connection to it. Below is a demo page of what it would look like for a consuming app to interact with the user's Web Whisper provider. Notice the consuming app creates a ProtocolWorker instance and sets up events in which it will attempt to send and retrieve data from the user's Web Whisper provider.

<input placeholder="Whisper something" /><button id="send">Send</button><button id="retrieve">Retrieve</button>

<script src="js/ProtocolWorker.js"></script>
<script>

var retrieveButton = document.querySelector('#retrieve');
var sendButton = document.querySelector('#send');
var input = document.querySelector('input');

var whisperWorker = new ProtocolWorker('web+whisper');

sendButton.addEventListener('click', function(){

  var value = input.value.trim();
  if (value) {
    whisperWorker.request({
      '@context': 'http://schema.org',
      '@type': 'Comment',
      comment: value
    }).then(function(){
      alert('Your message has been whispered to your provider!')
    }).catch(function(){
      alert('You either declined to whisper something to me or don\'t have a web+whisper provider :/')
    });
  }
  else alert('You didn\'t whisper anything.');

});

retrieveButton.addEventListener('click', function(){

  whisperWorker.request({
    '@context': 'http://schema.org',
    '@type': 'Search'
  }).then(function(data){
    alert('You last whispered: ' + data);
  }).catch(function(){
    alert('You either decided not to let this app see your last whisper, or don\'t have a web+whisper provider :/')
  });

});

</script>

Transactions should follow shared semantics

As you may have noticed above, each ProtocolWorker request above is formatted in accordance with a matching data description from Schema.org. Schema.org is a great resource for semantic data descriptors, which I believe are a solid Lingua-franca upon which to transact custom protocol requests. Who knows, maybe browsers could validate and enforce these semantic data structures some day?

Just show me the demo, clown.

I have the demo code above running on a couple Github domains so you can test the interplay between providing and consuming apps – note: you need to be using a desktop version of Firefox, Chrome, or Opera. Here are the steps to follow:

Add the demo provider as your Web Whisper handler by clicking the “Add Provider” button, and accepting the handler via the browser’s permission UI: http://web-protocols.github.io/provider-demo/
Visit the consuming app demo page and whisper something to your provider by entering a message in the input and clicking the send button. It will prompt you to select a provider, make sure you pick the “Whisper Demo Provider” you just added: http://csuwildcat.github.io/ProtocolWorker/
OPTIONAL: You’ll notice an option to make your provider choice the default when the browser selection UI is raised in the consuming app demo. If you select a provider as your default, all subsequent ProtocolWorker calls will happen instantly in the background, without bugging you to choose a handler.

Christian Heilmann was kind enough to shoot a quick screen cast of the steps above:

Where do we go from here?

As we’ve discussed, there are new and existing APIs and specs that could be modified to enable these use-cases – ServiceWorker could be a candidate, but regardless of what form the solution takes, let’s come together as a community of developers and platform implementers to move the web forward. I encourage you to submit your own requirements, technical feedback, and implementation ideas directly to this Github repo: https://github.com/csuwildcat/ProtocolWorker/issues/1

6 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

mgiuca

8 years ago

Hi, I’m from the Google Chrome team and we’re trying to solve a similar problem. Here’s our take on it: http://github.com/chromium/ballista.

My thoughts on your proposal (my own thoughts; I don’t speak for the Chrome team on this): I’m absolutely on board with the premise: we want to stop the madness of N consumers implementing special code for M providers, and instead let the user choose a provider via a generic API.

One use case I really want to solve is the edit case: the consumer (requesting site) sends a document to be edited and in the provider UI, the user hits Ctrl+S and the provider sends back an updated copy of the document to the consumer. Two things that I can’t see working in this API: a) the provider sending back multiple updates of the same document in response to a single request, and b) the provider sending back updates after the user closes the consumer’s tab. Our API proposal is designed to support these two scenarios.

I think using schema.org is a great idea for solving the â€œstandard verbsâ€ problem (to give each verb a well-understood meaning without limiting to a closed set of verbs defined in a web standard).

This is a great problem space to be exploring and I hope we can standardize on a solution. If you’d like to discuss more details, you can shoot me an email (mgiuca@chromium.org).

Author

Daniel Buchner

Reply to mgiuca

Great to hear that you’re working on this! As for the cases you highlighted that my post doesn’t cover: the ProtocolWoker script does have an affordance for serving the use-case you cited. I added a pub/sub-ish mechanism so consumer/requesters can receive multiple messages unbound from explicit 1:1 transactions/Promises. I left it out of the post because I didn’t want to overload the audience. As long as a final API, whatever form it takes, checks off all the items on the list of must-haves, I am comfortable supporting it. Where are you folks at with socializing your ideas and assessing the appetite of other implementers like us (MS), Mozilla, and Apple?

csuwldcat

mgiuca Great to hear that you’re working on this! As for the cases you
highlighted that my post doesn’t cover: the ProtocolWoker script does
have an affordance for serving the use-case you cited. I added a
pub/sub-ish mechanism so consumer/requesters can receive multiple
messages unbound from explicit 1:1 transactions/Promises. I left it out
of the post because I didn’t want to overload the audience. As long as a
final API, whatever form it takes, checks off all the items on the list
of must-haves, I am comfortable supporting it. Where are you folks at
with socializing your ideas and assessing the appetite of other
implementers like us (MS), Mozilla, and Apple?

csuwldcat Ah, got it. I see how the subscribe/push API (in the GitHub polyfill) supports the multi-response use case.

We haven’t properly approached anybody with our proposal yet. Just trying to ascertain initial levels of interest.

Alxandr

FYI; your example websites did not work for me in latest Chrome on Windows 10.

celiu111

7 years ago

People wear watches will give the http://www.swisswatchjust.co.uk/ impression of a moment of strong sense of power, elegant image. Time and strong sense of dealing with people simply make people work more simple, successful.This watch can improve a person’s grades. Just like a small fry favorite with the gold chain, and successful people like to wear watches. Good watches exquisite workmanship, elegant appearance, can display the charm of women and men.This watch is also a sign of http://www.rolexreplicasstore.uk.com/ personal identity, man watch, female bag, said the reason is clear. A small piece of fine watches a few thousand dollars, more than 10000 yuan, able to bring expensive watches are usually very successful in the work of people.