The user can select from six "channels": products, person, short text, currency, documents, and scene. Each one helps the app understand what it's looking at and tailor the kind of description it provides.
The products channel helps the user locate the UPC barcode on a product, then uses that code to identify and call out what the item is. The experimental scene channel is a work in progress; hopefully Microsoft will be able to compress the thousand words that a picture is worth into much fewer.
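As a side note on how UPC-based identification works: before a scanned code is looked up in a product catalog, it can be validated with the standard UPC-A check digit, which weights alternating digits by 3 and 1. This is a generic sketch of that checksum, not code from the app itself:

```python
def upc_check_digit(digits11: str) -> int:
    """Compute the UPC-A check digit for the first 11 digits.

    Odd positions (1st, 3rd, ...) are weighted 3; even positions weighted 1.
    """
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(digits11))
    return (10 - total % 10) % 10

def is_valid_upc(code: str) -> bool:
    # A valid UPC-A is 12 digits whose last digit matches the checksum.
    return len(code) == 12 and code.isdigit() and upc_check_digit(code[:11]) == int(code[-1])

print(is_valid_upc("036000291452"))  # a well-known valid example UPC; prints True
```

A failed checksum usually means a misread scan, so an app would typically re-scan rather than query its catalog with the bad code.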
As for processing, the heavy-duty computer vision algorithms that power the app actually run in the cloud, so the app can quickly read out what it's seeing.