Push the study to the market

My student Torben has just published his Android augmented reality app SINLA in the Android Market. Our aim is not only to publish a cool app but also to use the market for a user study. The application is similar to Layar and Wikitude, but we believe that the small mini-map found in existing applications (the small map in the lower right corner of the image below) might not be the best way to show users objects that are currently not in the camera's view.

We developed a different visualization for what we call “off-screen objects” that is inspired by off-screen visualizations for digital maps and navigation in virtual reality. It is based on arrows pointing towards the objects. The arrows are arranged on a circle drawn in a 3D perspective. Check out the image below to get an impression of how it looks.
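
The geometry behind the arrows is simple. As a rough illustration (this is my own sketch with made-up names, not the actual SINLA code), the arrow for each off-screen object can be placed on the circle according to the object's bearing relative to the device's compass azimuth:

// Hypothetical sketch, not the SINLA implementation.
public class OffScreenArrows {

    // Relative bearing of an object in degrees, mapped to (-180, 180].
    // 0 means straight ahead, positive values mean the object is to the right.
    public static float arrowAngle(float deviceAzimuth, float objectBearing) {
        float relative = (objectBearing - deviceAzimuth) % 360f;
        if (relative > 180f) relative -= 360f;
        if (relative <= -180f) relative += 360f;
        return relative;
    }

    // Screen position of the arrow on a circle of radius r around (cx, cy).
    public static float[] arrowPosition(float cx, float cy, float r, float angleDeg) {
        double a = Math.toRadians(angleDeg);
        return new float[] { cx + (float) (r * Math.sin(a)),
                             cy - (float) (r * Math.cos(a)) };
    }
}

In the actual app the circle is additionally rendered in a 3D perspective, which the sketch above ignores.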

It's our first attempt to use a mobile market to get feedback from real end users. We compare our visualization technique with the more traditional mini-map. At the moment we collect only very little information from users because we're afraid that we might deter them from providing any feedback at all. However, I'm thrilled to see whether we can draw any conclusions from the feedback we get through the application. I assume this is a new way of doing evaluations that will become more important in the future.

Instant exit with PowerPoint Viewer 2007

Recently I tried to deploy one of my prototypes on another machine. The functionality of the prototype was quite simple: hold a predefined object in front of your webcam and the system starts the corresponding presentation on the screen. I'm not sure if this is really useful for anything, but I did something similar with printed photos: whenever I held a printed photo in front of my webcam, the system tried to find the digital equivalent on my computer. If a photo was recognized, the corresponding directory was opened in the file explorer and the photo was selected.
However, when I tried to use the PowerPoint Viewer 2007 to open presentations, it didn't work. Whenever I tried to start the PowerPoint Viewer, it instantly exited. It took me a while to find a solution for this behaviour. It is somehow related to a bug described in the Microsoft Knowledge Base. The described cause is:

“This issue occurs when you have a non-English version of Microsoft Office or of PowerPoint Viewer installed on a computer that has an English version of Microsoft Windows installed.”

(By the way, is that really a cause or rather an excuse?)
I run German Windows XP and I tested the PowerPoint Viewer on at least three other computers with German XP. Maybe the described “cause” is simply wrong, because the following solution worked very well for me:

Go to the directory C:\Programme\Microsoft Office\Office12.
Copy the folder located in that directory and rename the copy to 1033.

Camera image as an OpenGL texture on top of the native camera viewfinder

I played a bit with the camera viewfinder on my G1, which is usually displayed directly by the camera driver. I hoped I could synchronize the driver's camera frame rendering with my own processing and visualization. After an hour or so, I now assume that this is not possible at the moment. However, while playing around I extended the example below, as you can see in these screenshots.

OpenGL Camera Screenshot

An OpenGL cube textured with the camera frame is rendered on top of the standard camera viewfinder. Thus, the standard camera image in the background is colored while the cube is only grayscale. I fear I'll have to make the OpenGL texture colored as well soon. I also cleaned up the source code a bit by extending GLSurfaceView instead of doing most of the OpenGL stuff myself with a SurfaceView. I uploaded an updated version to the Android Market (direct link to the Android Market). You can find the source code here.
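
In essence the layering is plain Android view stacking. A minimal sketch of an activity that puts a translucent GLSurfaceView on top of the camera preview could look like this (my own class names and setup, not the project's actual code; the camera handling and the cube rendering are omitted):

import android.app.Activity;
import android.graphics.PixelFormat;
import android.opengl.GLSurfaceView;
import android.os.Bundle;
import android.view.SurfaceView;
import android.widget.FrameLayout;
import javax.microedition.khronos.egl.EGLConfig;
import javax.microedition.khronos.opengles.GL10;

public class CameraOverlayActivity extends Activity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        FrameLayout layout = new FrameLayout(this);

        // SurfaceView that receives the native (colored) camera viewfinder;
        // Camera.open(), setPreviewDisplay() and startPreview() are omitted here.
        SurfaceView preview = new SurfaceView(this);

        // A GL surface with an alpha channel and a translucent window format,
        // so the viewfinder stays visible behind whatever OpenGL draws.
        GLSurfaceView glView = new GLSurfaceView(this);
        glView.setEGLConfigChooser(8, 8, 8, 8, 16, 0);
        glView.getHolder().setFormat(PixelFormat.TRANSLUCENT);
        glView.setRenderer(new GLSurfaceView.Renderer() {
            public void onSurfaceCreated(GL10 gl, EGLConfig config) { }
            public void onSurfaceChanged(GL10 gl, int w, int h) { gl.glViewport(0, 0, w, h); }
            public void onDrawFrame(GL10 gl) {
                // Clear with a fully transparent color; the real renderer
                // would draw the textured cube here instead.
                gl.glClearColor(0f, 0f, 0f, 0f);
                gl.glClear(GL10.GL_COLOR_BUFFER_BIT | GL10.GL_DEPTH_BUFFER_BIT);
            }
        });

        // Views added later are drawn on top, so the GL view covers the preview.
        layout.addView(preview);
        layout.addView(glView);
        setContentView(layout);
    }
}

The important bits are requesting an EGL config with an alpha channel and making the GL surface translucent; otherwise the GL layer hides the viewfinder completely.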

Showing camera images with OpenGL on Android example

I fiddled together a small example that shows how to get images from the camera and render them with OpenGL. The example is for Android phones and consists of three classes:

  • GLCamTest is the application’s main Activity. It does nothing special apart from putting the app in fullscreen mode and creating a GLLayer object as well as a Preview object.
  • The Preview class handles the camera. In particular, the method setPreviewCallback is used to receive camera images. The camera images are not processed in this class but delivered directly to the GLLayer. This class itself does not display the camera images.
  • GLLayer uses OpenGL ES to render the camera’s viewfinder image on the screen. Unfortunately I don’t know much about OpenGL (ES). The code is mostly copied from some examples. The only interesting stuff happens in the main loop (the run method) and the onPreviewFrame method.

Furthermore, there is the BooleanLock class, which is completely boring. I uploaded the Eclipse project containing the source code. I have only tested it on the emulator and with my tuned G1, so I'm not sure if it works on normal devices.
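
To give a rough idea, here is a condensed sketch of what the Preview class boils down to (simplified, and with a constructor of my own design rather than the project's actual one): it opens the camera, hands it the surface, and registers the GLLayer as the preview callback.

import android.content.Context;
import android.hardware.Camera;
import android.view.SurfaceHolder;
import android.view.SurfaceView;

public class Preview extends SurfaceView implements SurfaceHolder.Callback {
    private Camera camera;
    private final Camera.PreviewCallback frameReceiver; // the GLLayer in this example

    public Preview(Context context, Camera.PreviewCallback frameReceiver) {
        super(context);
        this.frameReceiver = frameReceiver;
        getHolder().addCallback(this);
        // Required on old devices so the camera can push frames into this surface.
        getHolder().setType(SurfaceHolder.SURFACE_TYPE_PUSH_BUFFERS);
    }

    public void surfaceCreated(SurfaceHolder holder) {
        camera = Camera.open();
        try {
            camera.setPreviewDisplay(holder); // give the camera a surface to push frames into
        } catch (java.io.IOException e) {
            camera.release();
            camera = null;
            return;
        }
        // Every preview frame (a YUV byte array) is delivered to the callback,
        // i.e. to the GLLayer's onPreviewFrame method.
        camera.setPreviewCallback(frameReceiver);
        camera.startPreview();
    }

    public void surfaceChanged(SurfaceHolder holder, int format, int w, int h) { }

    public void surfaceDestroyed(SurfaceHolder holder) {
        if (camera != null) {
            camera.setPreviewCallback(null);
            camera.stopPreview();
            camera.release();
            camera = null;
        }
    }
}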

I just tested it on a normal G1. Performance is horrible; the garbage collector jumps in a few times per second and stops the video. This is because of the camera preview callback memory issue. Unfortunately, I assume this can't be changed without touching the firmware. I also uploaded the example to the Android Market.

Processing camera frames on Android

Recently I wanted to process and display camera frames on my Android G1. I've done similar things using Python on S60 and Windows Mobile 6 and expected it to be quite easy on the G1 as well. As a first step, I extended a SurfaceView that uses the camera and calls setPreviewCallback to register an onPreviewFrame callback and receive images from the camera, as described in several tutorials. The camera frames are then displayed via my SurfaceView, and I receive the corresponding data as well.

However, I wanted to keep processing the frames and displaying them in sync. With the simple approach this is not possible, because onPreviewFrame is not synchronized with the display of the frames. My alternative was not to display the frames with the SurfaceView directly, but to convert the received image data to an OpenGL texture and render the camera viewfinder with OpenGL ES. This works surprisingly fast on my G1. In the video below, I render the camera frames on an OpenGL rectangle to get some fancy effects.

My viewfinder is grayscale because I only copy the luminance part of the camera frames (which are encoded in a YUV colour space) to the OpenGL texture. Decoding the U and V parts as well would probably be a bit slower. Copying a 160×240 YUV frame to a 256×256 luminance array (which is used to create the texture) is very simple and looks as follows:

public static void yuvToLum160x240(byte[] yuv, byte[] lum) {
    // Copy the Y (luminance) plane of the 160x240 preview frame row by row
    // into the 256x256 byte array that backs the OpenGL texture.
    int lumCount = 0;
    int yuvCount = 0;
    for (int y = 0; y < 160; y++) {
        System.arraycopy(yuv, yuvCount, lum, lumCount, 240);
        yuvCount += 240; // next row in the camera frame
        lumCount += 256; // next row in the texture (stride 256)
    }
}
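
To render the grayscale viewfinder, the array then only has to be uploaded as a single-channel texture. A rough sketch of that step (my own naming; it assumes a texture has already been generated and bound on the GL thread):

import java.nio.ByteBuffer;
import javax.microedition.khronos.opengles.GL10;

public class LuminanceTextureUploader {
    // Reuse one direct buffer to avoid allocating (and garbage collecting) per frame.
    private final ByteBuffer buffer = ByteBuffer.allocateDirect(256 * 256);

    // Uploads the 256x256 luminance array as a grayscale texture.
    public void upload(GL10 gl, byte[] lum) {
        buffer.clear();
        buffer.put(lum, 0, 256 * 256);
        buffer.position(0);
        gl.glTexImage2D(GL10.GL_TEXTURE_2D, 0, GL10.GL_LUMINANCE,
                        256, 256, 0,
                        GL10.GL_LUMINANCE, GL10.GL_UNSIGNED_BYTE, buffer);
    }
}

After the first frame, glTexSubImage2D can be used to update the existing texture instead of respecifying it, which is usually a bit cheaper.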

Markerless Object Recognition on a Mobile Phone

I implemented markerless object recognition that processes multiple camera images per second on recent mobile phones. The algorithm combines a stripped-down SIFT with a scalable vocabulary tree and simple feature matching.
Based on this algorithm, we implemented a simple application, which is shown in the video below. The approach is described in more detail in a paper titled “What is That? Object Recognition from Natural Features on a Mobile Phone” that we submitted to MIRW’09.

The beauty of ARM assembler

After realizing that Visual Studio's support for XScale intrinsics is somewhat buggy, I took a look at ARM assembler. The SSD function I need is quite simple, so it was easy to implement (although it actually took me quite some time to find the assembler on my disk). Since my data is only aligned to 32 bits, I had to stick to loading 4 bytes at a time. It looks like this:

squared_distance_asm proc
        wldrw     wR0, [r0]         ; load 4 bytes into wR0
        wzero     wR10              ; wR10 = 0
        wldrw     wR1, [r1]         ; load 4 bytes into wR1
        wunpckilb wR2, wR0, wR10    ; zero-extend the bytes to 16-bit halfwords
        wunpckilb wR3, wR1, wR10
        wsubhss   wR2, wR2, wR3     ; signed saturating halfword differences
        wldrw     wR0, [r0, #4]     ; load the next 4 bytes into wR0
        wmacsz    wR13, wR2, wR2    ; sum of squared differences into wR13
        wldrw     wR1, [r1, #4]     ; load the next 4 bytes into wR1
        wunpckilb wR2, wR0, wR10
        wunpckilb wR3, wR1, wR10
        wsubhss   wR2, wR2, wR3
        ; repeat the above as often as necessary

        ; return the result
        tmrrc     r0, r1, wR13      ; move the 64-bit accumulator into r0/r1
        mov       pc, lr            ; return to C with the return value in r0
        endp

Loads and calculations are interleaved to reduce pipeline stalls. I haven't looked at it in detail, but the assembler version needs ~25% less time than the intrinsics version, which in turn needs around 25% less time than the naive C version. Still, the assembler and intrinsics versions are slower than I expected. Probably they are not properly inlined.

WMMX is buggy in Visual Studio 2008

I implemented an object recognition algorithm for Windows Mobile 6 using Visual Studio 2008. Once it somehow worked, I thought about improving its performance to process more images per second. One part of my implementation computes the sum of squared differences (SSD) of two byte vectors (some million times per second, of course). My device is an ASUS P535 with an XScale processor, so I opted for Wireless MMX (WMMX). Since inline assembler is not supported for ARM processors, I used the corresponding WMMX intrinsics.

My initial attempt to compute the SSD of two 8-byte vectors looked as follows:


#include <mmintrin.h> // WMMX/MMX intrinsics

//Computes the sum of squared differences for eight byte values
int squared_distance(unsigned char *a, unsigned char *b) {
    __m64 zero = _mm_setzero_si64();   // all-zero register used for unpacking
    __m64 result = _mm_setzero_si64(); // accumulator
    __m64 v1 = *((__m64*)(a));
    __m64 v2 = *((__m64*)(b));
    // zero-extend the low four bytes to 16 bit, subtract and multiply-accumulate
    __m64 v3 = _mm_subs_pi16(_mm_unpacklo_pi8(v2, zero), _mm_unpacklo_pi8(v1, zero));
    result = _mm_mac_pi16(result, v3, v3);
    // same for the high four bytes
    __m64 v4 = _mm_subs_pi16(_mm_unpackhi_pi8(v2, zero), _mm_unpackhi_pi8(v1, zero));
    result = _mm_mac_pi16(result, v4, v4);
    return result.m64_i32[0];
}

Of course, the function must be adapted to the actual length of the vectors to be useful. However, it returned completely random results. It took me a while to puzzle out why the function was buggy. Actually, the values loaded into v1 and v2 are already wrong: __m64 v1=*((__m64*)(a)); should load 8 bytes into v1 but loads only 4 bytes into the lower half of v1. The other 4 bytes seem to be random. I tested a bunch of other options to load values into a __m64 variable, and all failed in the same way. Looking into the assembler code generated by the compiler reveals that instead of the wldrd instruction (which actually loads 8 bytes), the compiler generates a wldrw instruction (which loads only 4 bytes). It might be a compiler bug, and I assume it's related to the alignment of the arrays. Intel's assembler reference manual says that in order to load 8 bytes into a WMMX register, the data must be aligned to 8 bytes. However, Microsoft's documentation of the WMMX intrinsics tells us that if “data is not appropriately aligned, the program will throw an exception”. I got no exception, and I also tried to align the data properly.