HTC Vive is a tech breakthrough

We have some HTC Vive VR systems at the Department of Design at the Aachen University of Applied Sciences. This system is a game changer for consumer-grade Virtual Reality applications. I have to admit that I am awed by the engineering feat that Alan Yates and Valve have pulled off with the HTC Vive. I was not aware of the details when I first used the system.

Here is a schematic animation that illustrates the principle:

The whole idea revolves around translating time into angles by measuring the duration between two IR signals (the first an IR flash, the second from rotating lasers that sweep the room afterwards). You can see the system that sends the IR signals in slow motion here:

The whole thing won’t work unless the time measurement is extremely precise and the rotation extremely constant. We are talking about measuring, 120 times per second, how much time elapses when something moves a millimeter at over 1,900 m/s, which is 6,840 km/h! And the mirror heads, spinning at 60 rotations per second, are corrected to stay within 100 nanometers per turn (a human hair is about 18,000 to 80,000 nanometers thick!).
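To get a feeling for these numbers, here is a minimal sketch of the arithmetic in Python. It only uses the figures quoted above (60 rotations per second, 1,900 m/s sweep speed). The function names are my own, and the geometry (a beam sweeping past a sensor at a fixed distance from the rotor) is a simplifying assumption, not a description of the actual Lighthouse firmware.

```python
import math

ROTATIONS_PER_SECOND = 60       # rotor speed quoted above
SWEEP_SPEED_M_PER_S = 1900.0    # speed of the beam across the sensor, quoted above


def sweep_angle_deg(seconds_since_flash: float) -> float:
    """Angle the rotor has turned since the sync flash, in degrees."""
    return 360.0 * ROTATIONS_PER_SECOND * seconds_since_flash


def time_per_millimetre_s() -> float:
    """Time the beam needs to travel one millimetre at the quoted sweep speed."""
    return 0.001 / SWEEP_SPEED_M_PER_S


def distance_for_sweep_speed_m() -> float:
    """Distance from the rotor at which the beam moves at the quoted speed.

    Assumes the beam sweeps on a circle around the rotor: v = omega * r.
    """
    angular_speed_rad_per_s = 2 * math.pi * ROTATIONS_PER_SECOND
    return SWEEP_SPEED_M_PER_S / angular_speed_rad_per_s


if __name__ == "__main__":
    print(f"angle after 1/240 s: {sweep_angle_deg(1 / 240):.1f} degrees")
    print(f"time per millimetre: {time_per_millimetre_s() * 1e9:.0f} ns")
    print(f"distance from rotor: {distance_for_sweep_speed_m():.2f} m")
```

Under these assumptions one millimeter of movement corresponds to roughly half a microsecond, and the quoted sweep speed is reached about five meters away from the base station.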

This is ridiculous per se, but completely insane to do with consumer-grade hardware for a couple of hundred dollars. Yates and his team had to invent their own ways to produce the parts at the required specifications. They also had to create software that compensates for physical effects and material inconsistencies and corrects drift during operation.

Because the time measurement is done locally at the sensors that receive the light pulses and are attached to the tracked objects, the system can track practically any number of objects simultaneously. The whole approach is a breakthrough in »positional tracking« and is not limited to VR applications. And there is still headroom left to develop it further to achieve better precision and wider ranges.

For the tech-savvy, here is a talk by Alan Yates discussing the obstacles they had to overcome:


(Via Dynamic Information Design Seminar Blog)

Minority Report science adviser and inventor John Underkoffler demos g-speak — the real-life version of the film’s eye-popping, tai chi-meets-cyberspace computer interface. Is this how tomorrow’s computers will be controlled?

G-Speak is a really interesting concept. Right now I do not feel it is where it should be to be adopted on a broader scale: You need a certain environment with at least 2-3 square meters of space in front of a quite large screen.

I wonder if Microsoft will offer an extension to its Project Natal sensor some day, so that voice commands, body language and hand gestures create an immersive UI.

I can imagine that one day displays will cover complete walls so that you get a pretty cave-like situation. Maybe it is time for another display seminar?

When faces become hyperlinks

The algorithms for facial recognition have improved a lot in recent years. Here is a company showing a working prototype of a mobile app that recognizes faces and attaches links to social network layers to them:

The prototype was shown last year, but there was a live demo at the Mobile World Congress in Barcelona last week. Obviously the company that also created the service wants to turn this into a real application.

The implications of this are shown in the video: when seen through the “eyes of the app”, people virtually carry logos, brands, name tags and messages around.

10/GUI & con10uum

10/GUI (by Clayton Miller) is a novel approach to human-computer interaction. It also draws attention to the fine line designers will need to walk to effectively create physical human-computer interactions.

The video demonstrates the potential advantages of navigating within a desktop interface with up to ten fingers, rather than via a single cursor:

Mouse 2.0?

These research prototypes for future computer mice by Microsoft are very interesting approaches:

GPS + Compass + Motion sensors = Augmented Reality

The new iPhone 3GS adds a compass to the set of sensors. Combined with the GPS, the motion detection sensor and some image change detection via the internal video camera, this enables a new breed of “augmented reality” applications.

NearestWiki, for example, displays Wikipedia entries about buildings and places in the vicinity.

NearestWiki is not the first augmented reality app for the iPhone, but it is the first that is not tied to a specific region or city (like Metro Paris).

Future versions of the iPhone may feature more precise sensors and lower latency, giving a much smoother experience (e.g. labels not jumping around in the scene).
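How GPS and compass work together in such an app can be sketched in a few lines of Python. This is my own illustration, not the code of NearestWiki or any other app: the GPS position gives the bearing to a place, the compass gives the direction the phone is pointing, and the difference decides where a label lands on screen. The field of view (60 degrees) and screen width (320 pixels) are assumed values.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2,
    in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def label_x(heading_deg, target_bearing_deg, fov_deg=60.0, screen_width_px=320):
    """Horizontal pixel position for a label, or None if the target
    lies outside the camera's (assumed) horizontal field of view."""
    # Signed angle between where the phone points and where the target is,
    # wrapped into the range -180..180 so that 350 vs. 10 degrees gives +20.
    offset = (target_bearing_deg - heading_deg + 180.0) % 360.0 - 180.0
    if abs(offset) > fov_deg / 2:
        return None
    return round((offset / fov_deg + 0.5) * screen_width_px)
```

A target straight ahead ends up in the middle of the screen, and a target behind the user gets no label at all. Noise in the compass heading moves `offset` directly, which is exactly why labels jump around with imprecise sensors.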

Thoughts on Google Wave

I am just collecting some thoughts about some observations and issues – while I am trying to understand Google Wave.


(see a demo on their site)

Google Wave is an integrated set of technologies (with protocols that allow semi-synchronous editing of outlines and their federation across several servers). With this approach Google Wave solves some difficult technical and infrastructural problems.

But it also generates some new problems that need to be solved to make Google Wave a success. Otherwise I think users will not adopt the system (which in Google's case will seal the fate of this project, I suppose!).

1. Misty horizon

Google Wave is a framework-style solution for things people did not ask for and communication processes that no one is practicing yet (but no one really “asked” for the mouse as an input device either!). It is hard to see where Google Wave is heading. This breeds creativity, but it also challenges the non-developer. There will be best practices, but it will take a lot of time to identify use cases that people can learn “to wave” with.

So Google Wave challenges the imagination – and few people will be able to answer the “What is it all about?” question easily. The horizon is shrouded in mist.

Possible approach: A potential solution to this is to start with guided tours (a LOT of them) showing very common and powerful use cases for different scenarios. This is probably going to happen when Wave gets closer to the public beta.

2. Asynchronous patterns

We have learned to communicate in a turn-taking fashion. It is polite to let someone speak until they have finished before starting to respond. It is not polite for everyone to speak up at any time. Waves allow people to reply or edit without obeying the turn-taking pattern. This can cause “stress” and also a lot of misunderstanding. People could reply to a text that is going to change without them noticing. Their reply suddenly becomes nonsense – the playback feature could become the only way to perceive a conversation properly. But playback is new – people have learned that the threaded view is a chronology – but in Google Wave it is not (or not necessarily).

Even with the playback feature, people need to become aware of the asynchronicity in Google Wave – and learn how to recap conversations correctly.

Possible approach: Find a very good way to convey the chronology of a wave (e.g. making playback as fundamental to navigating a wave as scrolling).
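To make the problem concrete, here is a small Python sketch of a wave as a log of edits. It is a toy model of my own and has nothing to do with the real Google Wave protocol; the `Edit` record and both function names are invented for illustration. Replaying the log up to a point in time is a crude form of playback, and comparing timestamps reveals replies that answer a text which has changed since.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    time: int          # logical timestamp of the edit
    message_id: str    # which message in the wave was touched
    author: str
    text: str          # full text of the message after this edit


def state_at(edits, time):
    """Replay edits in chronological order and return the text of
    every message as it looked at `time`."""
    state = {}
    for edit in sorted(edits, key=lambda e: e.time):
        if edit.time > time:
            break
        state[edit.message_id] = edit.text
    return state


def stale_replies(edits, replies):
    """Return ids of replies whose parent message was edited after the
    reply was written. `replies` maps a reply's message id to a tuple
    (parent message id, time the reply was written)."""
    stale = []
    for reply_id, (parent_id, written_at) in replies.items():
        if any(e.message_id == parent_id and e.time > written_at for e in edits):
            stale.append(reply_id)
    return stale


if __name__ == "__main__":
    log = [
        Edit(1, "a", "ann", "Lunch at 12?"),
        Edit(2, "b", "bob", "Yes, fine."),
        Edit(3, "a", "ann", "Lunch at 15?"),
    ]
    print(state_at(log, 2))                    # what Bob actually replied to
    print(stale_replies(log, {"b": ("a", 2)}))
```

In this example Bob agreed to lunch at 12, but the wave now says 15. An interface could use such a check to flag replies that no longer match the text they answer.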

3. Information (over)flow

While Google Wave may integrate many messaging systems – it also generates a lot of density. Means of communication that were apart from each other – using different URLs and applications for each – are now combined. The crucial part of that is to understand which option is suited for what purpose.

With Waves being marked as “updated” by displaying them in a bold typeface and moving them to the top of the inbox, this also means that things are brought to my attention that should remain buried for a good reason. Google Wave users would have to learn how to manage and understand the “Inbox” and the “Active” areas properly to be able to get the most out of it.

Possible approach: Allow users very powerful and fine grained control over the way they are informed about updates.

4. Scattered spaces and fragmented scopes

One of the things that can really make things too complex to be comprehended properly is that people can read & write to waves – but replies can extend or narrow the scope (e.g. who may read and reply to a new item). Who is reading? Who am I replying to? Is this part really private or not? Am I releasing a secret to the public accidentally?

Viewed from a different angle: What I can see within a wave may be different from what someone else is seeing. To make my communication appropriate to the situation I need to be able to “read” from a different standpoint. This is required to understand when communication could fail on the receiving end.

Whenever I want to understand the perspective of someone else – I need to be able to represent his/her view in my mind. The change of scope for parts of a wave within that wave can make this difficult.

Possible approach: Make any changes of the scope (e.g. recipient list) within a wave very visible and allow users to navigate them.

WebKit adds 3D

The developers of the WebKit HTML rendering engine (the one that is used in the Apple Safari browser) have added 3D styles to CSS. They allow layers to be rotated, scaled and moved in a 3D space.

You need to download a nightly build of the browser to see it working.

There are quite a number of applications for this I can think of. I wonder if this approach will be adopted by the W3C for a new CSS standard.

Project Natal – the first true innovation from Microsoft

I have been thinking about Project Natal over the weekend. I do not want to discredit some of the innovations Microsoft has created over the last two decades – but for the most part Microsoft has not been able to create innovations on its own (it has rather mimicked or bought things from outside). There may be some advances like C# and .NET – but generally this is insider stuff – meaning nothing to a wider public.

Project Natal may be the first true innovation with a Microsoft stamp on it. Fifteen years ago I saw programmers trying to recognize 2D movements of arms and legs from a video – with results that were respectable – but never a game changer. Too much CPU power was required back then to be relevant in the consumer market.

Including the third dimension in the motion detection is such a game changer. Combined with voice and face recognition, this takes the controller out of the control: your full persona is represented in the system – not just your fingertip. This is radical – and it has been a dream for many many years.

Just look at this example from game designer Peter Molyneux from Lionhead:

The device is so complex that a developer will need access to an SDK that allows simplified communication with the sensory system of Natal. Frameworks could provide automatic recognition of gestures to programmers – even in combination (so if you wave your arm, that would call a different function than waving your arm and saying “Bye!”).
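What such a framework could look like from the programmer's side might be sketched like this in Python. Everything here is hypothetical: there is no public Natal SDK I could refer to, so the `GestureDispatcher` class and the signal names such as `wave_arm` and `say_bye` are made up purely to illustrate the idea of combined gestures.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional

Handler = Callable[[], str]


class GestureDispatcher:
    """Maps combinations of recognized signals (hypothetical names)
    to handler functions."""

    def __init__(self) -> None:
        self._handlers: Dict[FrozenSet[str], Handler] = {}

    def on(self, *signals: str, handler: Handler) -> None:
        """Register a handler for one signal or a combination of signals."""
        self._handlers[frozenset(signals)] = handler

    def dispatch(self, signals: Iterable[str]) -> Optional[str]:
        """Call the handler of the most specific combination that is fully
        contained in the observed signals. Returns None if nothing matches."""
        observed = frozenset(signals)
        matches = [combo for combo in self._handlers if combo <= observed]
        if not matches:
            return None
        best = max(matches, key=len)
        return self._handlers[best]()


if __name__ == "__main__":
    dispatcher = GestureDispatcher()
    dispatcher.on("wave_arm", handler=lambda: "greet")
    dispatcher.on("wave_arm", "say_bye", handler=lambda: "farewell")
    print(dispatcher.dispatch(["wave_arm"]))
    print(dispatcher.dispatch(["wave_arm", "say_bye"]))
```

Preferring the largest matching combination is one possible design choice: waving alone triggers the greeting, while waving plus saying “Bye!” triggers the farewell instead of both.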

The level of precision could increase with future revisions. It could be combined with classical controllers. Maybe one day even finger positions, fluctuations/timbre of the voice, body temperature or point of view will be detected as well. Simple “lite” versions specialized in facial parameters could replace webcams in laptops.

So I do not look at Natal as a game controller – I see it as a completely new interface generation coming up.

Hats off to Microsoft!

Project Natal

Obviously Microsoft feels the need to win back some of the market share the Nintendo Wii took away with a new controller type. Project Natal utilizes a range of biometric sensors for body motion, face and voice recognition.

The video is more of a vision than an actual feature presentation. But it is clear what the goals are.

Here is another video from the demonstration that shows what is possible right now:

Copenhagen UI concept

Via blogblog: Here is a user experience concept study that is a mockup of a new Windows UI – and it is not designed by Microsoft but by a guy named Cullen Dudas.

Looks good. Would love to see more. I hope Microsoft comes up with some UI innovations in Windows 7 that really serve the user.