[ann] Computer Vision Segmentation Patch for download

    Feb 02 2006 | 6:28 am
    Hey everyone, I've been working on a little computer vision patch implementing an interesting paper I read a few weeks ago. Along the way, I made two new externals. One calculates the median of an image; if given a 3D buffer, it will calculate the temporal median at each pixel, providing a nice and simple way to acquire the background image of a scene. The other implements a segmentation algorithm called the "a contrario method."
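    In case it helps anyone follow along, here's a minimal sketch of the temporal-median idea, assuming an 8-bit grayscale buffer; the function names are illustrative, not the external's actual API (the real external works on Jitter matrices):

```c
#include <stdlib.h>

/* Sketch of a per-pixel temporal median over a stack of frames,
   assuming 8-bit grayscale frames laid out as frames[t][y*width + x].
   Illustrative only -- not the external's actual code. */

static int cmp_uchar(const void *a, const void *b)
{
    return (int)(*(const unsigned char *)a) - (int)(*(const unsigned char *)b);
}

/* Fill `background` with the median over `nframes` frames at each pixel. */
void temporal_median(const unsigned char **frames, int nframes,
                     int npixels, unsigned char *background)
{
    unsigned char *samples = malloc(nframes);
    for (int p = 0; p < npixels; p++) {
        for (int t = 0; t < nframes; t++)
            samples[t] = frames[t][p];           /* gather pixel p over time */
        qsort(samples, nframes, 1, cmp_uchar);   /* sort the temporal samples */
        background[p] = samples[nframes / 2];    /* keep the middle value */
    }
    free(samples);
}
```

    Because the median ignores outliers, a moving object that only briefly covers a pixel doesn't end up in the background estimate.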
    It uses a statistical distance measure to find the most meaningful portions of an image by comparing the gradient of the background image with the gradient of a frame. I've implemented most of the algorithm but haven't done the final refinement step. Also, the code is highly beta and thus very unoptimized.
    In the download package, I've provided the new externals compiled for OSX, as well as the patch and source code for the algorithm's external. There's also a schematic of the patch/source code and a PDF of the paper. If anyone wants a Windoze version, contact me, and if I get bugged enough, I'll make one.
    Here's the link: http://www.mat.ucsb.edu/~whsmith/vision.html . The download link is at the bottom of the page.
    cheers, wes

    • Feb 02 2006 | 8:24 am
      Looks promising! Please consider that Windows port.
    • Feb 02 2006 | 6:50 pm
      I was a bit quick to post. This subject is very close to my heart. A lot of times I come across papers like http://www.google.co.il/search?l&q=background+moving+segmentation but I never could figure out what level of mathematics is needed to translate most of those algorithms into machine language. If you can, please tell me a bit about your level of math, and even better, the process of tackling such a task. Hope it's not too much. yair
    • Feb 02 2006 | 7:08 pm
      Hi Yair, I too am often frustrated by computer vision and graphics papers. Often (especially in journal papers), they leave so much out. It may not seem so when you read a paper, but when you go to implement it, there are many seemingly small decisions that are actually implicit assumptions of the paper. These can be really tricky to sort out. Something that really annoys me about these two fields is the lack of shared code. Sharing code would make things so much better, as more people could gain access to the ideas by seeing their implementation.
      For this paper in particular, I was having quite a time figuring out the statistical measure of significance. At first, I was going off of the IEEE version of the paper. Fortuitously, I went to one of the authors' webpages, and he had an extended version of the paper with a simplified and approximated significance measure. This saved my ass, and I was able to proceed.
      That said, my math skills are quite good, as I have a degree in electrical engineering. I'm mostly limited by obscure notation and lack of detail in a paper, and somewhat by a lack of thorough background in the field, although this is changing as I read and implement more. Basically, it requires banging your head against the screen for many months.
      As far as the maths are concerned, you should know a bit of numerical analysis and statistics. One thing that's really useful to know is how to take the gradient of an image.
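      For instance, a central-difference gradient on an 8-bit grayscale image can be sketched like this (edge pixels are simply zeroed here, a shortcut rather than a proper boundary condition):

```c
/* Central-difference gradient of an 8-bit grayscale image.
   gx[i] and gy[i] receive the horizontal and vertical derivatives
   at pixel i = y*w + x.  Border pixels are zeroed for simplicity. */
void image_gradient(const unsigned char *img, int w, int h,
                    float *gx, float *gy)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int i = y * w + x;
            if (x == 0 || x == w - 1 || y == 0 || y == h - 1) {
                gx[i] = gy[i] = 0.0f;   /* skip the boundary */
            } else {
                gx[i] = (img[i + 1] - img[i - 1]) * 0.5f;
                gy[i] = (img[i + w] - img[i - w]) * 0.5f;
            }
        }
}
```

      The gradient's magnitude tells you how strong an edge is, and atan2(gy, gx) gives its orientation, which is what the segmentation above compares between frame and background.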
      best, wes
    • Feb 03 2006 | 3:07 am
      PS... I didn't mention this in my original email, but I was quite lazy in handling the boundary conditions of the gradient function in the xray.jit.probsegment code. The way I did it is quite wrong, but I was just trying to get the algorithm working in the first place and didn't really care about the edge pixels too much.
    • Feb 03 2006 | 6:22 pm
      I just finished a spatial 3x3 median filter external for Jitter as well. I have run across this filter many times while reading CV journals. It is often run just after thresholding to remove small salt/pepper noise, and a close operator is then often applied.
      It runs quite quickly.
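      As a sketch of the idea (not the external's actual code), a 3x3 median on an 8-bit grayscale image looks like this:

```c
#include <string.h>

/* Sketch of a 3x3 spatial median filter on an 8-bit grayscale image.
   For each interior pixel, the 9 neighborhood values are sorted with a
   tiny insertion sort and the middle one kept; border pixels are copied
   through unchanged.  Illustrative only. */
void median3x3(const unsigned char *src, unsigned char *dst, int w, int h)
{
    memcpy(dst, src, (size_t)w * h);          /* borders pass through */
    for (int y = 1; y < h - 1; y++)
        for (int x = 1; x < w - 1; x++) {
            unsigned char v[9];
            int n = 0;
            for (int dy = -1; dy <= 1; dy++)   /* gather the 3x3 window */
                for (int dx = -1; dx <= 1; dx++)
                    v[n++] = src[(y + dy) * w + (x + dx)];
            for (int i = 1; i < 9; i++) {      /* insertion sort, 9 items */
                unsigned char t = v[i];
                int j = i - 1;
                while (j >= 0 && v[j] > t) { v[j + 1] = v[j]; j--; }
                v[j + 1] = t;
            }
            dst[y * w + x] = v[4];             /* the median of 9 values */
        }
}
```

      A lone white pixel on a dark background (salt noise) gets replaced by the neighborhood median, which is why this filter is so handy right after thresholding.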
      I also have a hough transform (for line finding) for jitter if anyone is interested in trying it out.
      If anyone is interested in testing these, please let me know. More are on the way.
    • Feb 03 2006 | 6:33 pm
      I suspected from your previous emails that you might've made a median filter. The Hough transform filter sounds quite interesting. I implemented a really, really slow one as a patcher using GL rendering to a matrix and accumulating the rendered curves. I love how images look after the Hough transform. Can't wait to see what else you've got in store for us.
    • Feb 03 2006 | 6:46 pm
      Have you done any work with jit.gl.slab and computer vision? A median filter, dilate, erode, etc. are included for jit.gl.slab. A nice start. I've seen optical flow, correspondence, Sobel, Canny, and a host of other standard algorithms ported to the GPU using Cg, etc. (the OpenVidia project, for example). Some interesting implementations of computer vision algos have also been ported to Quartz Composer, which is an interesting piece of software.
    • Feb 03 2006 | 6:51 pm
      I have only implemented rudimentary convolution-type algos as pixel shaders. This is definitely an interesting way to go, although ofttimes I want to use the resulting data for control signals, which means bringing something back into software... not a very efficient thing right now. Plus, the data is usually floating point, so having floating-point textures more widely supported would be great as well.
    • Apr 14 2006 | 5:50 am
      Remember this? For those interested, I've updated the source to use a lookup table for calculating the arctangent used in finding the gradient of the video. It's a really crude piecewise-linear LUT, but it's accurate enough, and it gives about 3 fps better performance on 320x240 video. The URL is http://www.mat.ucsb.edu/~whsmith/vision.html . The new source is linked at the bottom of the page. For now, only OSX is recompiled, so if you want Windows, download the OSX package and compile it with Cygwin or Visual Studio.
      best wes