[ann] Computer Vision Segmentation Patch for download


    Feb 02 2006 | 6:28 am
    Hey everyone,
    I've been working on a little computer vision patch implementing an
    interesting paper I read a few weeks ago, and I made two new
    externals along the way. The first calculates the median of an image.
    If given a 3D buffer, it calculates the temporal median at each
    pixel, providing a nice and simple way to acquire the background image
    of a scene. The other external implements an algorithm called the "A
    Contrario Method" of segmentation.
    It uses a statistical distance measure to find the most meaningful
    regions of an image by comparing the gradient of the background image
    with the gradient of the current frame. I've implemented most of the
    algorithm here but haven't done the final refinement step. Also, the
    code is very much beta and completely unoptimized.
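    The temporal-median idea fits in a few lines. What follows is a minimal sketch in plain Python, not the external's source: it assumes a frame is a list of rows of pixel values and that all frames share one size. As long as each pixel shows the background in more than half of the frames, the per-pixel median recovers the background value and rejects objects that pass through.

```python
from statistics import median


def temporal_median(frames):
    """Per-pixel median over time of a stack of equally sized frames.

    frames: list of T frames; each frame is a list of rows of pixel values.
    Returns a single frame whose pixel (y, x) is the median of that pixel
    across all T frames.
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median([frame[y][x] for frame in frames]) for x in range(width)]
        for y in range(height)
    ]


# Three 1x2 frames; an object (255) crosses the second pixel in one frame.
background = temporal_median([[[10, 0]], [[12, 255]], [[11, 1]]])
print(background)  # [[11, 1]]
```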
    In the download package, I've provided the new externals compiled for
    OSX as well as the patch and source code for the algorithm's external.
    There's also a schematic of the patch/source code and a PDF of the
    paper. If anyone wants a Windoze version, contact me and if I get
    bugged enough, I'll make one.
    Here's the link: http://www.mat.ucsb.edu/~whsmith/vision.html . The
    download link is at the bottom of the page.
    cheers,
    wes

    • Feb 02 2006 | 8:24 am
      looks promising
      please consider that windows port
    • Feb 02 2006 | 6:50 pm
      i was a bit quick to post.
      this subject is very close to my heart.
      a lot of times i come across papers like http://www.google.co.il/search?l&q=background+moving+segmentation but i never could figure out what level of mathematics is needed to translate most of those algorithms into machine language.
      please, if you can, tell me a bit about your level of math, and even better, the process of tackling such a task.
      hope it's not too much
      yair
    • Feb 02 2006 | 7:08 pm
      Hi Yair,
      I too am often frustrated by computer vision and graphics papers.
      Often (especially in journal papers), they leave so much out. It may
      not seem so when you read the paper, but when you go to implement it,
      there are many seemingly small decisions that are actually implicit
      assumptions of the paper. These can be really tricky to sort out.
      Something that really annoys me about these two fields is the lack
      of shared code. Things would be so much better if more people
      could get at the ideas by seeing their implementations.
      For this paper in particular, I was having quite a time figuring out
      the statistical measure of significance. At first, I was going off of
      the IEEE version of the paper. Fortuitously, I went to one of the
      authors' web pages, and he had an extended version of the paper with a
      simplified and approximated significance measure. This saved my ass
      and I was able to proceed.
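      The simplified measure from the extended paper isn't given in this thread, so it isn't reproduced here. For orientation, this is a sketch of the generic a contrario significance test, with assumptions labeled: a region of n pixels in which k show some property (for example, frame and background gradient orientations disagreeing), where p is the assumed probability of that property under the background model. The "number of false alarms" (NFA) is the number of tests times a binomial tail, and a region counts as meaningful when its NFA is at most epsilon (commonly 1).

```python
from math import comb


def binomial_tail(n, k, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def nfa(num_tests, n, k, p):
    """Number of false alarms: number of tests times the probability that
    at least k of n pixels show the property by chance under the
    background model."""
    return num_tests * binomial_tail(n, k, p)


def is_meaningful(num_tests, n, k, p, epsilon=1.0):
    """A region is epsilon-meaningful when its NFA is at most epsilon."""
    return nfa(num_tests, n, k, p) <= epsilon
```

      For example, with 1000 candidate regions and p = 0.5, a 20-pixel region where all 20 pixels show the property has an NFA of about 0.001 and is kept. A region with only 10 of 20 has an NFA in the hundreds and is rejected.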
      That said, my math skills are quite good, as I have a degree in
      electrical engineering. I'm mostly limited by obscure notation,
      missing details in papers, and, to some extent, a lack of thorough
      background in the field, although this is changing as I read and
      implement more. Basically, it requires banging your head against the
      screen for many months.
      As far as the maths are concerned, you should know a bit of numerical
      analysis and statistics. One thing that's really useful to know is
      how to take the gradient of an image.
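      Taking the gradient of an image can be sketched with central differences. This is a minimal Python example, not the xray code. It assumes a grayscale image stored as a list of rows, with y increasing downward. It returns the gradient magnitude and the orientation atan2(gy, gx), the orientation being what a gradient-direction comparison uses. For simplicity, border pixels are just left at 0.

```python
import math


def gradient(img):
    """Central-difference gradient of a grayscale image (list of rows).

    Returns (magnitude, angle) images, angle = atan2(gy, gx) in radians,
    with y increasing downward. Only interior pixels are computed; the
    one-pixel border is left at 0 for simplicity.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx)
    return mag, ang


# A left-to-right ramp has gradient magnitude 1 pointing along +x.
mag, ang = gradient([[0, 1, 2], [0, 1, 2], [0, 1, 2]])
print(mag[1][1], ang[1][1])  # 1.0 0.0
```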
      best,
      wes
    • Feb 03 2006 | 3:07 am
      PS....I didn't mention this in my original email, but I was quite lazy
      in handling the boundary conditions of the gradient function in the
      xray.jit.probsegment code. The way I did it is quite wrong, but I was
      just trying to get the algorithm working in the first place and
      didn't really care about the edge pixels too much.
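      One common way to handle the edge pixels is to keep central differences in the interior and switch to one-sided differences at the borders. That is the same scheme `numpy.gradient` uses by default. The sketch below is illustrative, not what xray.jit.probsegment does: it assumes a list-of-rows image at least 2 pixels on a side. Replicating the border pixel (clamp-to-edge) is another reasonable choice.

```python
def diff_1d(values):
    """Derivative of a 1-D sequence: central differences inside,
    forward difference at the start, backward difference at the end."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    d = [0.0] * n
    d[0] = float(values[1] - values[0])     # forward difference
    d[-1] = float(values[-1] - values[-2])  # backward difference
    for i in range(1, n - 1):
        d[i] = (values[i + 1] - values[i - 1]) / 2.0
    return d


def gradient_with_edges(img):
    """Return (gx, gy) for every pixel, edges included."""
    h, w = len(img), len(img[0])
    gx = [diff_1d(row) for row in img]
    cols = [diff_1d([img[y][x] for y in range(h)]) for x in range(w)]
    gy = [[cols[x][y] for x in range(w)] for y in range(h)]
    return gx, gy


gx, gy = gradient_with_edges([[0, 1, 2], [0, 1, 2]])
print(gx)  # [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(gy)  # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```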
      wes
    • Feb 03 2006 | 6:22 pm
      Hey,
      I just finished a spatial 3x3 median filter external for Jitter as
      well. I have run across this filter many times while reading CV
      journals. It often runs just after thresholding to remove small
      salt-and-pepper noise, and a closing operator is then often applied.
      It runs quite quickly.
      I also have a Hough transform (for line finding) for Jitter if anyone
      is interested in trying it out.
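      For reference, here is how a 3x3 spatial median filter works. This is a plain-Python sketch, not Christopher's external, and it assumes clamp-to-edge handling at the borders. Each output pixel is the middle value (the 5th of 9 sorted values) of its neighborhood. On a thresholded 0/1 mask, that removes isolated specks and fills one-pixel holes.

```python
def median3x3(img):
    """3x3 spatial median filter on a list-of-rows image.

    Out-of-range neighbors reuse the nearest valid pixel (clamp-to-edge).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out


# An isolated "salt" pixel in a thresholded mask is removed.
mask = [[0] * 5 for _ in range(5)]
mask[2][2] = 1
print(median3x3(mask) == [[0] * 5 for _ in range(5)])  # True
```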
      If anyone is interested in testing these, please let me know. More
      are on the way.
      Christopher
    • Feb 03 2006 | 6:33 pm
      I suspected from your previous emails that you might've made a median
      filter. The Hough transform filter sounds quite interesting. I
      implemented a really, really slow one as a patcher, using GL rendering
      to a matrix and accumulating the rendered curves. I love how the
      images from the Hough transform look. Can't wait to see what else
      you've got in store for us.
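      "Accumulating the rendered curves" is the core of the line Hough transform. Each edge point (x, y) votes for every (theta, rho) with rho = x cos(theta) + y sin(theta), which traces a sinusoid in the accumulator. Peaks then correspond to lines. Below is a minimal CPU sketch in Python, not either poster's implementation. It assumes points lie inside a width x height image and that theta is sampled over [0, pi) at a hypothetical resolution of 180 steps.

```python
import math


def hough_lines(points, width, height, n_theta=180):
    """Vote edge points into a (theta, rho) accumulator.

    Points must satisfy 0 <= x < width and 0 <= y < height, so that
    |rho| never exceeds diag. Returns (acc, diag); acc[t][r + diag]
    counts votes for theta = pi * t / n_theta and rho = r.
    """
    diag = int(math.ceil(math.hypot(width, height)))
    acc = [[0] * (2 * diag + 1) for _ in range(n_theta)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[t][rho + diag] += 1
    return acc, diag


def strongest_line(acc, diag):
    """Return (theta, rho, votes) of the first accumulator peak."""
    n_theta = len(acc)
    best_t, best_r = max(
        ((t, r) for t in range(n_theta) for r in range(len(acc[t]))),
        key=lambda tr: acc[tr[0]][tr[1]],
    )
    return math.pi * best_t / n_theta, best_r - diag, acc[best_t][best_r]


# Ten points on the vertical line x = 5 peak at theta = 0, rho = 5.
pts = [(5, y) for y in range(10)]
print(strongest_line(*hough_lines(pts, 10, 10)))  # (0.0, 5, 10)
```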
      wes
    • Feb 03 2006 | 6:46 pm
      Have you done any work with jit.gl.slab and computer vision? A
      median filter, dilate, erode, etc. were included for jit.gl.slab, which
      is a nice start. I've seen optical flow, correspondence, Sobel, Canny,
      and a host of other standard algorithms ported to the GPU using Cg,
      etc. (the OpenVIDIA project, for example). Some interesting
      implementations of computer vision algorithms have also been ported to
      Quartz Composer, which is an interesting piece of software.
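      A CPU reference helps when checking what the dilate and erode passes should produce, including the "closing" step mentioned earlier in the thread (dilate, then erode). The sketch below is plain Python, not the jit.gl.slab shaders. It assumes a binary 0/1 image, a 3x3 square neighborhood, and that out-of-range neighbors are simply ignored. Closing fills small gaps but leaves isolated pixels alone, which is why it pairs well with a median filter.

```python
def _morph(img, op):
    """Apply op (max or min) over each pixel's in-bounds 3x3 neighborhood."""
    h, w = len(img), len(img[0])
    return [
        [
            op(
                img[yy][xx]
                for yy in range(max(y - 1, 0), min(y + 2, h))
                for xx in range(max(x - 1, 0), min(x + 2, w))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def dilate(img):
    return _morph(img, max)


def erode(img):
    return _morph(img, min)


def close(img):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(img))


print(close([[1, 1, 0, 1, 1]]))  # [[1, 1, 1, 1, 1]]  (gap filled)
print(close([[0, 0, 1, 0, 0]]))  # [[0, 0, 1, 0, 0]]  (speck kept)
```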
    • Feb 03 2006 | 6:51 pm
      I have only implemented rudimentary convolution-type algorithms as
      pixel shaders. This is definitely an interesting way to go, although
      oftentimes I want to use the resulting data for control signals, which
      means bringing something back into software...not a very efficient
      thing right now. Plus, the data is usually floating point, so having
      floating-point textures more widely supported would be great as well.
      wes
    • Apr 14 2006 | 5:50 am
      Remember this? For those interested, I've updated the source to use a
      lookup table for calculating the arctangent used in finding the
      gradient of the video. It's a really crude piecewise-linear LUT, but
      it's accurate enough, and it gives about 3 fps better performance on
      320x240 video. The URL is
      http://www.mat.ucsb.edu/~whsmith/vision.html . The new source is
      linked at the bottom of the page. For now, only OSX is recompiled,
      so if you want Windows, download the OSX package and compile it with
      Cygwin or Visual Studio.
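      The post doesn't give the table size or layout. Here is one way a piecewise-linear arctangent LUT can work, as a Python sketch with assumptions labeled. It tabulates atan(r) at a hypothetical 17 evenly spaced points on [0, 1] and interpolates linearly between them. Any (x, y) is folded into the first octant using |x|, |y|, and the smaller/larger ratio, then the signs restore the full atan2 range. With 16 intervals, the standard interpolation error bound puts the error at roughly 3e-4 radians.

```python
import math

LUT_SIZE = 16  # hypothetical number of table intervals; more = more accurate
ATAN_LUT = [math.atan(i / LUT_SIZE) for i in range(LUT_SIZE + 1)]


def atan_lut(r):
    """Piecewise-linear approximation of atan(r) for 0 <= r <= 1."""
    pos = r * LUT_SIZE
    i = min(int(pos), LUT_SIZE - 1)
    frac = pos - i
    return ATAN_LUT[i] + frac * (ATAN_LUT[i + 1] - ATAN_LUT[i])


def atan2_lut(y, x):
    """Approximate math.atan2(y, x) using the table above."""
    ax, ay = abs(x), abs(y)
    if ax == 0 and ay == 0:
        return 0.0
    if ay <= ax:
        a = atan_lut(ay / ax)  # first octant: angle in [0, pi/4]
    else:
        a = math.pi / 2 - atan_lut(ax / ay)  # angle in (pi/4, pi/2]
    if x < 0:
        a = math.pi - a  # reflect into the left half-plane
    if y < 0:
        a = -a  # reflect into the lower half-plane
    return a


print(abs(atan2_lut(3, 4) - math.atan2(3, 4)) < 1e-3)  # True
```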
      best
      wes