[ann] Computer Vision Segmentation Patch for download
I’ve been working on a little computer vision patch implementing an
interesting paper I read a few weeks ago. Along the way, I made two new
externals. One calculates the median of an image. If given a 3D buffer,
it calculates the temporal median at each pixel, providing a nice and
simple way to acquire the background image of a scene. The other
external implements a segmentation algorithm called the "A Contrario
Method".
It uses a statistical distance measure to find the most meaningful
portions of an image by comparing the gradient of the background image
with that of a frame. I’ve implemented most of the algorithm here but
haven’t done the final refinement step. Also, the code is highly beta
and is thus very unoptimized.
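For anyone who wants to see the temporal median idea in code, here is a minimal sketch. It is plain Python rather than the external's actual C source, and the function name `temporal_median` and the tiny example frames are made up for illustration. It assumes the frames are equally sized grayscale images stored as lists of rows.

```python
from statistics import median

def temporal_median(frames):
    """Per-pixel median over time.

    frames: list of equally sized 2D lists (one grayscale image per frame).
    Returns a single 2D list holding the median of each pixel across frames.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(frame[y][x] for frame in frames) for x in range(width)]
            for y in range(height)]

# Three 2x2 frames; a bright "object" (200) crosses the top-left pixel once.
frames = [
    [[10, 20], [30, 40]],
    [[200, 21], [31, 41]],
    [[12, 22], [32, 42]],
]
background = temporal_median(frames)
```

Because the object covers the top-left pixel in only one of the three frames, the median ignores it and keeps the background value.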
In the download package, I’ve provided the new externals compiled for
OSX as well as the patch and source code for the algorithm’s external.
There’s also a schematic of the patch/source code and a PDF of the
paper. If anyone wants a Windoze version, contact me and if I get
bugged enough, I’ll make one.
Here’s the link: http://www.mat.ucsb.edu/~whsmith/vision.html . The
download link is at the bottom of the page.
please consider that windows port
i was a bit quick to post.
this subject is very close to my heart.
a lot of times i come across papers like http://www.google.co.il/search?l&q=background+moving+segmentation but i never could figure out what level of mathematics is needed to translate most of those algorithms to machine language.
please, if you can, tell me a bit about your level of math and, even better, the process of tackling such a task.
hope it's not too much
I too am often frustrated by computer vision and graphics papers.
Often (especially in journal papers), they leave so much out. It may
not seem so when you read the paper, but when you go to implement it,
there are many seemingly small decisions that are actually implicit
assumptions of the paper. These can be really tricky to sort out.
Something that really annoys me about the 2 fields above is the lack
of shared code. It would make things so much better as more people
could gain access to the ideas through seeing their implementation.
For this paper in particular, I was having quite a time figuring out
the statistical measure of significance. At first, I was going off of
the IEEE version of the paper. Fortuitously, I went to one of the
authors’ webpages and he had an extended version of the paper with a
simplified and approximated significance measure. This saved my ass
and I was able to proceed.
That said, my math skills are quite good as I have a degree in
electrical engineering. I’m mostly limited by obscure notation and
lack of details in a paper and somewhat a lack of thorough background
in the field although this is changing as I read and implement more.
Basically, it requires banging your head against the screen for many
As far as the maths are concerned, you should know a bit of numerical
analysis and statistics. One thing that’s really useful to know is
how to take the gradient of an image.
PS: I didn’t mention this in my original email, but I was quite lazy
in handling the boundary conditions of the gradient function in the
xray.jit.probsegment code. The way I did it is quite wrong, but I was
just trying to get the algorithm working in the first place and
didn’t really care about the edge pixels too much.
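As a rough illustration of taking an image gradient with one reasonable treatment of the edge pixels, here is a sketch in plain Python. It is not the xray.jit.probsegment code. It assumes central differences in the interior and falls back to one-sided differences at the borders by clamping the neighbour indices; the name `image_gradient` is hypothetical.

```python
import math

def image_gradient(img):
    """Return (magnitude, orientation) for a 2D grayscale image.

    Interior pixels use central differences. At the borders the neighbour
    index is clamped to the image, which turns the estimate into a
    one-sided difference instead of reading outside the image.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xl, xr = max(x - 1, 0), min(x + 1, w - 1)
            yu, yd = max(y - 1, 0), min(y + 1, h - 1)
            # Divide by the actual pixel spacing (2 inside, 1 at a border).
            gx = (img[y][xr] - img[y][xl]) / max(xr - xl, 1)
            gy = (img[yd][x] - img[yu][x]) / max(yd - yu, 1)
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx)
    return mag, ang

# A horizontal ramp: intensity rises by 1 per pixel from left to right.
ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
mag, ang = image_gradient(ramp)
```

On the ramp, every pixel (including the border ones) gets a magnitude of 1 and an orientation of 0, which is what you would want from a consistent boundary rule.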
I just finished a spatial 3×3 median filter external for Jitter as
well. I have run across this filter many times while reading CV
journals. It often runs just after thresholding to remove small
salt-and-pepper noise. A morphological close operator is then often
applied.
It runs quite quickly.
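For readers who have not met the filter before, a 3×3 spatial median can be sketched like this. This is plain Python for illustration only, not the Jitter external; the name `median3x3` and the choice to replicate edge pixels at the borders are assumptions of the sketch.

```python
from statistics import median

def median3x3(img):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    Neighbours that fall outside the image are replaced by the nearest
    edge pixel, so every window always holds exactly nine samples.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            window = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out

# A thresholded mask with a single isolated "salt" pixel.
mask = [
    [0, 0, 0, 0],
    [0, 255, 0, 0],
    [0, 0, 0, 0],
]
cleaned = median3x3(mask)
```

The lone white pixel is outvoted by its eight black neighbours and disappears, which is why the filter is popular right after thresholding.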
I also have a Hough transform (for line finding) for Jitter if anyone
is interested in trying it out.
If anyone is interested in testing these, please let me know. More
are on the way.
I suspected from your previous emails that you might’ve made a median
filter. The Hough transform filter sounds quite interesting. I
implemented a really really slow one as a patcher using GL render to a
matrix and accumulating the rendered curves. I love how the images
look from the Hough transform. Can’t wait to see what else you’ve got
in store for us.
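The "accumulating curves" idea is the heart of the Hough transform for lines: each edge point votes along a sinusoid in (rho, theta) space, and peaks in the accumulator correspond to lines. Here is a minimal plain-Python sketch under the usual parameterisation rho = x cos(theta) + y sin(theta). It is neither poster's implementation, and `hough_lines` and its parameters are hypothetical names.

```python
import math

def hough_lines(points, width, height, n_theta=180):
    """Vote in (rho, theta) space for each edge point (x, y).

    Returns (acc, max_rho). Row index is rho + max_rho, so rho covers
    [-max_rho, max_rho]; column t stands for theta = pi * t / n_theta.
    """
    max_rho = int(math.ceil(math.hypot(width, height)))
    acc = [[0] * n_theta for _ in range(2 * max_rho + 1)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[rho + max_rho][t] += 1  # one vote per point per angle
    return acc, max_rho

# Ten edge points on the vertical line x = 5.
points = [(5, y) for y in range(10)]
acc, max_rho = hough_lines(points, width=10, height=10)
votes_for_x_equals_5 = acc[5 + max_rho][0]  # rho = 5, theta = 0
```

All ten points agree on the cell for rho = 5, theta = 0, so that cell collects the maximum possible number of votes. Viewing `acc` as an image gives the familiar picture of overlapping sinusoids.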
Have you done any work with jit.gl.slab and computer vision? A
median filter, dilate, erode, etc. were included for jit.gl.slab. A
nice start. I’ve seen optical flow, correspondence, Sobel, Canny,
and a host of other standard algorithms ported to the GPU using Cg etc.
(the OpenVIDIA project, for example). Some interesting implementations
of computer vision algos have also been ported to Quartz Composer,
which is an interesting piece of software.
I have only implemented rudimentary convolution-type algos as pixel
shaders. This is definitely an interesting way to go, although
ofttimes I want to use the resulting data for control signals, which
means bringing something back into software…not a very efficient
thing right now. Plus, the data is usually floating point, so having
floating point textures more widely supported would be great as well.
Remember this? For those interested, I’ve updated the source to use a
lookup table for calculating the arctangent used in finding the
gradient of the video. It’s a really crude piecewise-linear LUT, but
it’s accurate enough and it gives about 3 fps better performance on
320×240 video. The earl is
http://www.mat.ucsb.edu/~whsmith/vision.html . The new source is
linked to at the bottom of the page. For now, only osx is recompiled,
so if you want windows, download the osx package and compile it with
cygwin or visual studio.
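For the curious, a piecewise-linear arctangent lookup table can be sketched roughly as follows. This is an illustrative Python version, not the updated C source: the table size `N`, the function names, and the use of octant symmetry to cover all four quadrants are assumptions of this sketch.

```python
import math

N = 64  # number of table segments on [0, 1]; an arbitrary choice here
TABLE = [math.atan(i / N) for i in range(N + 1)]

def atan_unit(r):
    """Piecewise-linear lookup of atan(r) for 0 <= r <= 1."""
    pos = r * N
    i = min(int(pos), N - 1)      # segment index, clamped so i + 1 is valid
    frac = pos - i                # position inside the segment
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])

def atan2_lut(y, x):
    """Approximate math.atan2(y, x) from the table plus symmetry."""
    ax, ay = abs(x), abs(y)
    if ax == 0 and ay == 0:
        return 0.0
    if ay <= ax:
        a = atan_unit(ay / ax)                 # angle below 45 degrees
    else:
        a = math.pi / 2 - atan_unit(ax / ay)   # mirror about 45 degrees
    if x < 0:
        a = math.pi - a                        # left half-plane
    return -a if y < 0 else a                  # lower half-plane

approx = atan2_lut(1.0, 2.0)
exact = math.atan2(1.0, 2.0)
```

With 64 segments the approximation stays well within 1e-4 radians of `math.atan2`, which is far finer than a gradient orientation usually needs. In Python the table brings no speed benefit; the point is only to show the lookup and interpolation, which is where a C implementation would save time.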