Zafar Ahmed Ansari
Indian Institute of Technology
Jodhpur,
Rajasthan,
India.
Work on Microsoft Xbox Kinect

Obstacle Avoidance

The Kinect lets us map the environment in front of it by providing a depth image. At present there are two major open source drivers for the Kinect: libfreenect and OpenNI. OpenNI in particular offers a well documented programming interface. Using libfreenect, I started learning about the Kinect hardware. Following are some things I noticed:
1. The usable depth range is roughly 0.5 m to 3 m.
2. The Kinect has trouble with irregular surfaces such as carpets. In such cases the depth image contains a significant number of black (unreadable) points.
3. The depth sensor returns raw values between 0 and 2047. A value of 2047 can mean either the most distant measurable point or an invalid one: a point that is unreadable because of irregular reflection, or one that lies closer than 0.5 m.
4. The RGB and infrared cameras do not see exactly the same scene: they are separated by about 7.5 cm and have different focal lengths. The cameras therefore need to be calibrated so that the two output images can be correctly correlated. See http://www.ros.org/wiki/kinect_calibration/technical for details.

Calculating spatial distance from depth data
I found two methods to convert the raw depth integers (0 to 2047) into actual distance in metres. Stephane Magnenat and Nicolas Burrus have each given a formula for this conversion. Here I follow Burrus' formula:
float raw_depth_to_meters(int raw_depth)
{
    /* 2047 marks an invalid measurement (too close or unreadable), so return 0. */
    if (raw_depth < 2047)
        return 1.0 / (raw_depth * -0.0030711016 + 3.3309495161);
    return 0;
}
After converting depth values to metres, I wanted to find the distance between two points whose depth values are known. To do this, each depth pixel must first be projected into 3D space. Burrus gives the following method:
P3D.x = (x_d - cx_d) * depth(x_d,y_d) / fx_d
P3D.y = (y_d - cy_d) * depth(x_d,y_d) / fy_d
P3D.z = depth(x_d,y_d)
with fx_d, fy_d, cx_d and cy_d the intrinsics of the depth camera.
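Applied directly, this gives the spatial distance between two depth pixels: lift each one into 3D and take the Euclidean norm. A minimal C sketch (the helper names are mine, and the intrinsics are rounded from the calibration values listed below):

#include <math.h>

/* Depth-camera intrinsics, rounded from the values estimated for my Kinect. */
static const float fx_d = 594.21f, fy_d = 591.04f;
static const float cx_d = 339.31f, cy_d = 242.74f;

typedef struct { float x, y, z; } Point3D;

/* Lift a depth pixel (x_d, y_d) with metric depth depth_m into camera space. */
static Point3D depth_pixel_to_3d(int x_d, int y_d, float depth_m)
{
    Point3D p;
    p.x = (x_d - cx_d) * depth_m / fx_d;
    p.y = (y_d - cy_d) * depth_m / fy_d;
    p.z = depth_m;
    return p;
}

/* Euclidean distance in metres between two depth pixels. */
static float distance_between_pixels(int x1, int y1, float d1_m,
                                     int x2, int y2, float d2_m)
{
    Point3D a = depth_pixel_to_3d(x1, y1, d1_m);
    Point3D b = depth_pixel_to_3d(x2, y2, d2_m);
    return sqrtf((a.x - b.x) * (a.x - b.x) +
                 (a.y - b.y) * (a.y - b.y) +
                 (a.z - b.z) * (a.z - b.z));
}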
We can then reproject each 3D point on the color image and get its color:
P3D' = R.P3D + T
P2D_rgb.x = (P3D'.x * fx_rgb / P3D'.z) + cx_rgb
P2D_rgb.y = (P3D'.y * fy_rgb / P3D'.z) + cy_rgb
with R and T the rotation and translation parameters estimated during the stereo calibration.
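The reprojection step also translates directly into code. The sketch below reuses the Point3D type from above, stores R row-major, and hard-codes rounded versions of my calibration values (listed next); it is an illustration, not the calibrated pipeline itself:

/* RGB-camera intrinsics and the depth-to-RGB transform (rounded calibration values). */
static const float fx_rgb = 529.22f, fy_rgb = 525.56f;
static const float cx_rgb = 328.94f, cy_rgb = 267.48f;
static const float R[9] = {  0.99985f,  0.00126f, -0.01749f,
                            -0.00148f,  0.99992f, -0.01225f,
                             0.01747f,  0.01228f,  0.99977f };
static const float T[3] = {  0.01999f, -0.00074f, -0.01092f };

/* Project a 3D point from the depth camera's frame onto the RGB image. */
static void project_to_rgb(Point3D p, float *u, float *v)
{
    /* P3D' = R.P3D + T */
    float x = R[0] * p.x + R[1] * p.y + R[2] * p.z + T[0];
    float y = R[3] * p.x + R[4] * p.y + R[5] * p.z + T[1];
    float z = R[6] * p.x + R[7] * p.y + R[8] * p.z + T[2];

    /* Perspective projection with the RGB intrinsics. */
    *u = (x * fx_rgb / z) + cx_rgb;
    *v = (y * fy_rgb / z) + cy_rgb;
}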
The parameters I could estimate for my Kinect are:
Color
fx_rgb 5.2921508098293293e+02
fy_rgb 5.2556393630057437e+02
cx_rgb 3.2894272028759258e+02
cy_rgb 2.6748068171871557e+02
k1_rgb 2.6451622333009589e-01
k2_rgb -8.3990749424620825e-01
p1_rgb -1.9922302173693159e-03
p2_rgb 1.4371995932897616e-03
k3_rgb 9.1192465078713847e-01
Depth
fx_d 5.9421434211923247e+02
fy_d 5.9104053696870778e+02
cx_d 3.3930780975300314e+02
cy_d 2.4273913761751615e+02
k1_d -2.6386489753128833e-01
k2_d 9.9966832163729757e-01
p1_d -7.6275862143610667e-04
p2_d 5.0350940090814270e-03
k3_d -1.3053628089976321e+00
Relative transform between the sensors (in meters)
R
[  9.9984628826577793e-01,  1.2635359098409581e-03, -1.7487233004436643e-02,
  -1.4779096108364480e-03,  9.9992385683542895e-01, -1.2251380107679535e-02,
   1.7470421412464927e-02,  1.2275341476520762e-02,  9.9977202419716948e-01 ]
T
[  1.9985242312092553e-02, -7.4423738761617583e-04, -1.0916736334336222e-02 ]

Developing the algorithm
The first task was to acquire the Kinect's depth data. libfreenect handled this well, returning an IplImage instance each for the depth and RGB cameras. The images had to be downsampled from 640x480 to 320x240 to speed up the computation, since the P3DX carrying the Kinect was being operated through a remote terminal.
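In outline, the acquisition and downsampling step looks roughly like this. I use libfreenect's synchronous wrapper here for brevity; the header path and exact constants may differ between installs, and the function name grab_depth_320x240 is mine:

#include <stdint.h>
#include <stdlib.h>
#include <libfreenect_sync.h>   /* header path may differ between installs */
#include <opencv/cv.h>

/* Grab one 11-bit depth frame, wrap it in an IplImage and downsample it
   from 640x480 to 320x240 so each iteration stays cheap. */
static IplImage *grab_depth_320x240(void)
{
    void *raw = NULL;
    uint32_t timestamp = 0;

    if (freenect_sync_get_depth(&raw, &timestamp, 0, FREENECT_DEPTH_11BIT) < 0)
        return NULL;                     /* no Kinect connected */

    /* Wrap the raw 16-bit buffer without copying it. */
    IplImage *full = cvCreateImageHeader(cvSize(640, 480), IPL_DEPTH_16U, 1);
    cvSetData(full, raw, 640 * sizeof(uint16_t));

    IplImage *small = cvCreateImage(cvSize(320, 240), IPL_DEPTH_16U, 1);
    cvResize(full, small, CV_INTER_NN);  /* nearest neighbour keeps raw values intact */

    cvReleaseImageHeader(&full);
    return small;
}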
Version 0.1
I first assumed that a fixed rectangular area below the horizon in the depth image would indicate the direction in which the path was clear; only the data below the horizon matters for obstacle avoidance. The first version simply computed the average depth of all pixels inside the chosen rectangle. Each iteration of the loop calculated this average for every such rectangular area possible in the image and then selected the area with the highest average, as sketched below. This approach had several deficiencies. First, it assumed a fixed-size area, which is wrong because objects of the same physical width occupy fewer pixels the farther away they are. Second, the predicted area fluctuated greatly between frames. Third, at close range the predicted rectangle tended to settle at the wrong locations.
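A simplified sketch of that search (the window size, horizon row and helper name are placeholders of mine):

#include <stdint.h>

/* Sketch of the version 0.1 search: slide a fixed-size window across the
   region below the horizon and keep the one with the largest mean raw depth.
   Assumes horizon_y + WIN_H <= img_h so the window fits inside the image. */
#define WIN_W 60
#define WIN_H 40

static int best_window_x(const uint16_t *depth, int img_w, int horizon_y)
{
    int best_x = 0;
    float best_avg = -1.0f;

    for (int x = 0; x + WIN_W <= img_w; ++x) {
        long sum = 0;
        for (int wy = 0; wy < WIN_H; ++wy)
            for (int wx = 0; wx < WIN_W; ++wx)
                sum += depth[(horizon_y + wy) * img_w + (x + wx)];

        float avg = (float)sum / (WIN_W * WIN_H);
        if (avg > best_avg) {
            best_avg = avg;
            best_x = x;
        }
    }
    return best_x;   /* left edge of the most open-looking window */
}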
Version 0.2
The incorrect predictions at close range turned out to be caused by invalid pixels: points closer than 0.5 metres or otherwise unreadable automatically take the value 2047, which the averaging then treats as maximally distant. Setting these pixels to 0 in every frame, as in the sketch below, somewhat improved the prediction.
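The cleanup itself is a single pass over the raw frame; a sketch:

#include <stdint.h>

/* Version 0.2 fix: raw value 2047 means "too close or unreadable", so force it
   to 0 so that it reads as blocked rather than maximally far. */
static void zero_invalid_depths(uint16_t *depth, int n_pixels)
{
    for (int i = 0; i < n_pixels; ++i)
        if (depth[i] >= 2047)
            depth[i] = 0;
}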
Version 0.3
The rectangular area was discarded in favour of a trapezoidal, directional area whose width varies with the direction it points in. The fluctuation was removed by averaging the prediction over a certain number of frames. The generated video was saved and then streamed to the other machine using VLC (though not entirely satisfactorily). The predicted path was then used to calculate the number of degrees the robot must turn to face the free path (see the sketch below).
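The turn angle can be read off the chosen image column using the depth camera's horizontal intrinsics. A sketch, assuming fx_d and cx_d are halved to match the downsampled 320x240 image:

#include <math.h>

/* Sketch: angle (in degrees) the robot must turn so that the chosen image
   column ends up straight ahead. Positive means the target lies to the right
   of the image centre. */
static float turn_angle_deg(float target_x)
{
    const float fx = 594.21f / 2.0f;   /* fx_d scaled to 320x240 */
    const float cx = 339.31f / 2.0f;   /* cx_d scaled to 320x240 */
    return atan2f(target_x - cx, fx) * 180.0f / 3.14159265f;
}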
The specifications of the mobile robot platform P3DX on which the Kinect was mounted are:
Width: 39.3 cm
Height: 23.7 cm
Clearance: 6 cm
A path on the floor parallel to the robot's heading appears as a trapezoid in the image. I had to determine by manual inspection exactly how many pixels of image width correspond to a real width of 39.3 cm at a given distance (a cross-check against the pinhole model is sketched below).
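The pinhole model gives a way to cross-check that manual measurement: a width of W metres at depth Z metres spans roughly W * fx / Z pixels. A sketch, again with the intrinsics halved for 320x240:

/* Sketch: how many pixels a 39.3 cm wide corridor occupies at a given depth,
   from the pinhole relation width_px = width_m * fx / depth_m. */
static float robot_width_in_pixels(float depth_m)
{
    const float fx = 594.21f / 2.0f;   /* fx_d scaled to 320x240 */
    const float robot_width_m = 0.393f;
    return robot_width_m * fx / depth_m;
}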
Version 0.4
I am now trying to indicate, on the trapezoidal path, the exact point after which the path is blocked. When no perfectly free path is available, the algorithm should instead find the longest free path. This should not be too difficult; one possible approach is sketched below.
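A sketch of the intended behaviour, not the final implementation: on free floor the path's average depth should grow row by row as we move up from the bottom of the image, so the first row where it stops growing marks the blockage.

/* Sketch: scan the candidate path from the bottom of the image upward and
   return the row where the average metric depth stops increasing, i.e. where
   an obstacle cuts the path short. Assumes invalid pixels were already zeroed.
   Returns -1 if the path is free all the way to the horizon. */
static int first_blocked_row(const float *depth_m, int img_w, int img_h,
                             int path_x, int path_w, int horizon_y)
{
    float prev_avg = 0.0f;
    for (int y = img_h - 1; y >= horizon_y; --y) {
        float sum = 0.0f;
        for (int x = path_x; x < path_x + path_w; ++x)
            sum += depth_m[y * img_w + x];
        float avg = sum / path_w;

        if (y < img_h - 1 && avg <= prev_avg)
            return y;           /* depth stopped increasing: path blocked here */
        prev_avg = avg;
    }
    return -1;
}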