I'm experimenting with some pattern matching/clustering/sorting stuff, and for some of it I've been using Euclidean distance. I'm wondering what the best approach is when dealing with different dimensionalities. That is, when 2 arrays are different lengths, is it better to pad the shorter one (presumably with 0.0s), or should I find some way of truncating the longer one? Truncating seems like it would be somewhat arbitrary in removing information from the longer array, but padding the shorter one is also kind of arbitrary, in a way... If I think of it just in terms of 2D and 3D spaces, then padding seems reasonable, as it would be like imagining the 2D point to be at 0.0 on the z-axis of a 3D space, which is certainly arbitrary, but not particularly disagreeable.

Any thoughts? Or is there some better overall method I should be considering, in cases where the arrays to be compared are of different lengths?

Actually, if anyone has anything more to add, I'd still appreciate any thoughts.

My question before was really whether it was a better approach to "pad" the lower-dimensional array, or to truncate the higher-dimensional array. I understand that the value of any padding would be arbitrary; what's still not clear is whether truncating, or some form of dimension reduction, would be a better approach. The main reason I ask is that, in my experiments, the two approaches give really quite different results. Thinking about it now, I actually feel inclined to truncate dimensions on the higher-dimensional array, since, if I again use 2D and 3D spaces as an example, it seems to make more sense to reduce a 3D point to its 2D projection than to give an arbitrary z position to a 2D point. Yes? No?
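
To make the difference concrete, here is a minimal sketch in Python (chosen only for brevity; it is not my actual mxj code). The pitch lists are made-up MIDI values and the function names are hypothetical:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pad_distance(a, b, fill=0.0):
    """Pad the shorter sequence with `fill`, then measure."""
    n = max(len(a), len(b))
    a = list(a) + [fill] * (n - len(a))
    b = list(b) + [fill] * (n - len(b))
    return euclidean(a, b)

def truncate_distance(a, b):
    """Drop the trailing values of the longer sequence, then measure."""
    n = min(len(a), len(b))
    return euclidean(a[:n], b[:n])

short = [60, 62, 64]            # hypothetical pitch list
long_ = [60, 62, 64, 65, 67]    # same opening, two extra notes

padded = pad_distance(short, long_)          # the padded zeros dominate
truncated = truncate_distance(short, long_)  # the extra notes are ignored
print(padded, truncated)
```

With values like MIDI pitches, zero padding treats each missing note as pitch 0, so the padded distance is huge, while truncation reports the two lists as identical. Neither answer is obviously right, which is really the heart of my question.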

(I'd imagine that, for the geometrically-inclined, this is a bit like understanding why the first black key above C is sometimes a C# and sometimes a Db...)

thanks in advance for any further thoughts,

It really depends on how your dimensions are organized.  That is, is 
this a case of x,y or more a case of dimension 1 = x, dimension 2 = y?  
Are they on the same scale?  (and what are you using distance for?)

There are many different ways of calculating distance.  The problem with 
Euclidean distance is that it doesn't take scale into account.  If x 
ranges from 0 to 10 and y ranges from 1 to 2, then moving all the way 
from the maximum to the minimum will be a significantly greater 
distance for x than if you were to do the same in the y dimension.
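
Here's a rough Python sketch of that effect, using the same ranges.  The 
min-max rescaling at the end is just the simplest fix I could show in a 
few lines (it is not Mahalanobis distance), and the names are made up 
for the example:

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rescale(point, ranges):
    """Map each coordinate to 0..1 using its (min, max) range."""
    return [(v - lo) / (hi - lo) for v, (lo, hi) in zip(point, ranges)]

ranges = [(0.0, 10.0), (1.0, 2.0)]   # x spans 0..10, y spans 1..2

origin = [0.0, 1.0]
x_extreme = [10.0, 1.0]   # full swing in x only
y_extreme = [0.0, 2.0]    # full swing in y only

# Raw distances: the x swing counts ten times as much as the y swing.
raw_x = euclidean(origin, x_extreme)
raw_y = euclidean(origin, y_extreme)

# After rescaling, a full swing in either dimension counts the same.
norm_x = euclidean(rescale(origin, ranges), rescale(x_extreme, ranges))
norm_y = euclidean(rescale(origin, ranges), rescale(y_extreme, ranges))
print(raw_x, raw_y, norm_x, norm_y)
```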

Mahalanobis distance does take this into account, but will take more 
math chops, as you'll need to calculate a covariance matrix.  Here are 
some links with info:

http://en.wikipedia.org/wiki/Mahalanobis_distance
http://en.wikipedia.org/wiki/Covariance
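
For a feel of what's involved, here's a minimal sketch for the 
two-dimensional case only, in plain Python with the 2x2 matrix inverse 
written out by hand.  The data points and function names are invented 
for the example; for more dimensions you'd want a proper linear 
algebra library.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance_2d(points):
    """Sample covariance matrix [[sxx, sxy], [sxy, syy]] of 2-D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    n = len(points) - 1
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return [[sxx, sxy], [sxy, syy]]

def mahalanobis_2d(a, b, cov):
    """sqrt((a-b)^T * inverse(cov) * (a-b)) for 2-D points."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = a[0] - b[0], a[1] - b[1]
    # inverse(cov) = 1/det * [[syy, -sxy], [-sxy, sxx]]
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)

# Invented data: x spans 0..10, y spans 1..2, no correlation.
data = [(0.0, 1.0), (10.0, 2.0), (0.0, 2.0), (10.0, 1.0)]
cov = covariance_2d(data)

d_x = mahalanobis_2d((0.0, 1.0), (10.0, 1.0), cov)  # full swing in x
d_y = mahalanobis_2d((0.0, 1.0), (0.0, 2.0), cov)   # full swing in y
print(d_x, d_y)
```

With this data the two distances come out equal, which is the point: 
each dimension is measured relative to its own spread.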

Also, if you'd rather not code it in Max (which I think would be a 
very good idea), you could use mxj Java code to do it.  Here's a link to 
a page with some Java statistical objects that will calculate 
covariance matrices as well as Mahalanobis distances:

http://www.mhsatman.com/downloads.htm

I would not recommend padding your values because it's only going to 
skew your data.  If you have full data for only two dimensions, then 
I'd use those two dimensions.

From my limited experience with Music Information Retrieval: in my 
project I took an array with 178 variables per entry and used Principal 
Component Analysis to figure out which were the most important elements 
in terms of usefulness as classifiers, and then calculated Mahalanobis 
distance using the 8 most important dimensions.  The catch with 8 
dimensions is that there's no good way to visualize it all at once, 
but it definitely worked well.  This was in MATLAB, though.

If you'd like some more background information on Music Information 
Retrieval, the course notes for the class I took taught by Juan Bello 
are here:
http://homepages.nyu.edu/~jb2843/Teaching.html

Actually, if any coders would be interested in writing a Matlab to Max 
object, that'd be pretty cool...

I've been working for a while now on a recombinance-based composition system, sort of quasi-Cope, but geared more toward realtime, interactive composition. I'm finding that my model has a strong tendency to replicate the source works in the database, rather than generate variations, so I'm re-thinking some of the basic stuff for selecting and combining the linear materials - motives, themes, and so on. I want a system that can do an okay job on its own (actually, if you know Cope's SPEAC stuff, I want it to settle into a sort of eternal "E", Extension, if left to its own devices), but which really benefits from being "steered" through the musical form by the user.

So, what I imagined doing in my latest design, was to create a somewhat smooth space containing all the linear material (I've parsed everything in the database into melodic "chunks" called VoiceSegments) in which I can move by step or leap "away from" the original setting, and maintain a somewhat predictable degree of continuity (or discontinuity) with the original.

The idea is that if VoiceSegment 100 is the original for a given setting, then I could use 99 or 101 and get a closely-related alternative VoiceSegment, whereas 50 would only show a distant connection to the original, if any at all (and 150 would also be distant, though in a different way). 

So, basically I'm trying to sort my VoiceSegments according to similarity. I have a sinking feeling that a SOM is going to be the best way to do this, but I'm really fuzzy on how to build a SOM in java (basically *all* of this is in 2 mxj objects, with a good number of classes loaded by each), and as I understand it, while SOMs are good at revealing similarities, they tend to have rather abrupt boundaries in the way they group input, and thus won't necessarily offer the smoothest transitions through *all* the material. If I'm wrong on this, and SOMs sound like the best approach to you, let me know! ;-)

Anyway, I've narrowed my attributes down to 9, the first two of which are a pitch list and an ED list (delta times). I'm trying to use Euclidean distance to find the "proximity" of two pitch lists, or ED lists. In combination with the other 7 attributes, I'm hoping this will give me enough info to do a reasonable sort of all my melodic material. That's a really broad-stroke description, but you probably get the idea...

Quote: peter.mcculloch@gmail.com wrote on Fri, 29 June 2007 17:46
----------------------------------------------------
> It really depends on how your dimensions are organized.  That is, is 
> this a case of x,y or more a case of dimension 1 = x, dimension 2 = y?  
> Are they on the same scale?  (and what are you using distance for?)
>
> There's many different ways of calculating distance.  The problem with 
> Euclidean distance is that it doesn't take scale into account.  If x 
> has a min and max range of 0, 10 and y has a min and max range of 1,2 
> moving all the way from the maximum to the minimum will be a 
> significantly greater distance for x than if you were to do the same in 
> the y dimension.
>
> Mahalanobis distance does take this into account, but will take more 
> math chops, as you'll need to calculate a covariance matrix.  Here's 
> some links with info:
>
> http://en.wikipedia.org/wiki/Mahalanobis_distance
> http://en.wikipedia.org/wiki/Covariance
>
> Also, if you don't want to code it in Max (which I think would be a 
> very good idea) you could use mxj java code to do it.  Here's a link to 
> a page with some Java statistical objects that will calculate 
> covariance matrices as well as Mahalanobis distances.
>
> http://www.mhsatman.com/downloads.htm
>
> I would not recommend padding your values because it's only going to 
> skew your data.  If you have full data for only two dimensions, then 
> I'd use those two dimensions.
>
>  From my limited experience with Music Information Retrieval, in my 
> project I took an array with 178 variables per entry and used Principal 
> Component Analysis to figure out which were the most important elements 
> in terms of usefulness as classifiers, and then calculated Mahalanobis 
> distance using the 8 most important dimensions.  The catch with 8 
> dimensions being that there's no good way to visualize it all at once, 
> but it definitely worked well.  This was in MATLAB, though.
>
> If you'd like some more background information on Music Information 
> Retrieval, the course notes for the class I took taught by Juan Bello 
> are here:
> http://homepages.nyu.edu/~jb2843/Teaching.html
>
> Actually, if any coders would be interested in writing a Matlab to Max 
> object, that'd be pretty cool...
>
>
> Peter McCulloch
> www.petermcculloch.com
>
>
----------------------------------------------------

There's been some work that might be helpful for you.  It's MATLAB 
code, but might give some ideas about analysis.
http://www.jyu.fi/musica/miditoolbox/

Since you are working in Java, and you are working with musical data, I 
can't recommend purchasing JMSL highly enough, if you haven't already.  
These phrases could be stored in musicShapes and played back 
very easily.  I'm working on this type of stuff in JMSL using a MySQL 
database, so let me know if you purchase it and I can send you some 
analysis code for musicShapes.

Similarity cuts across a lot of dimensions; things can be similar in 
terms of rhythm, pitch, contour, density, register, dynamic, 
articulation, etc.  By having different types of similarity, you should 
get significantly more interesting output from the system.  For 
instance, find a phrase that is similar in terms of rhythm, pitch, and 
contour, but not register.  PCA will help you pick the most 
distinctive parameters.

I would consider looking at statistical properties in addition to your 
sequential approach; these will be particularly effective in finding 
patterns that have similar content but dissimilar ordering (e.g. an 
arpeggio up vs. an arpeggio down).  Chroma vectors could be very 
effective, as they're octave-equivalent and easily 
transposable/invertible.

For instance, a chromatic scale of quarter notes from C to E followed 
by a half note B and then a dotted half-note on C would yield a chroma 
vector (total duration per pitch class in quarter notes, starting from C) of
4 1 1 1 1 0 0 0 0 0 0 2
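
A minimal Python sketch of that calculation, assuming notes are given 
as (MIDI pitch, duration in quarter notes) pairs.  The octaves chosen 
for the B and the final C are arbitrary, and the function names are 
just for illustration:

```python
def chroma_vector(notes):
    """Sum note durations per pitch class (index 0 = C, 11 = B)."""
    chroma = [0] * 12
    for pitch, duration in notes:
        chroma[pitch % 12] += duration
    return chroma

def transpose(chroma, semitones):
    """Transposing a phrase simply rotates its chroma vector."""
    k = semitones % 12
    return chroma[-k:] + chroma[:-k] if k else list(chroma)

# (MIDI pitch, duration in quarter notes)
phrase = [(60, 1), (61, 1), (62, 1), (63, 1), (64, 1),  # C up to E
          (71, 2),                                      # half note B
          (72, 3)]                                      # dotted half C

vector = chroma_vector(phrase)
print(vector)                 # matches the vector above
print(transpose(vector, 2))   # the same phrase a whole tone higher
```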

Some statistical properties that might be interesting to look at:

mean, variance, (standard deviation around mean, standard deviation 
around median, kurtosis, skew) for:
	pitch
	onset times
	release times
	duration
	velocity
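
As a sketch, these can all be computed in one pass per attribute.  
This version uses population (divide-by-n) moments and non-excess 
kurtosis, which are my choices for the example rather than the only 
reasonable definitions; `describe` is a made-up name.

```python
import math
import statistics

def describe(values):
    """Summary statistics for one attribute of a phrase."""
    n = len(values)
    mean = statistics.mean(values)
    median = statistics.median(values)
    var = sum((v - mean) ** 2 for v in values) / n   # population variance
    sd_mean = math.sqrt(var)
    sd_median = math.sqrt(sum((v - median) ** 2 for v in values) / n)
    # Standardized 3rd and 4th moments; reported as 0.0 for constant input.
    skew = sum((v - mean) ** 3 for v in values) / n / sd_mean ** 3 if var else 0.0
    kurt = sum((v - mean) ** 4 for v in values) / n / var ** 2 if var else 0.0
    return {"mean": mean, "variance": var, "sd_mean": sd_mean,
            "sd_median": sd_median, "skew": skew, "kurtosis": kurt}

pitches = [60, 62, 64, 66, 68]   # invented whole-tone fragment
stats = describe(pitches)
print(stats)
```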

Also, properties such as (number of unique contour values / number of 
notes) can be interesting.  A repeated arpeggio 60 71 63 60 71 63 will 
have a ratio of 0.5 ( count(1 2 3) --> 3 / 6 ) whereas 60 64 63 67 66 
65 will have a ratio of 1 (count(1 3 2 6 5 4) --> 6 / 6).  More 
repetitions will drive the ratio even lower.
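
In Python, a sketch of that ratio might look like this, taking the 
contour value of a note to be the rank of its pitch among the distinct 
pitches in the phrase (function names are made up):

```python
def contour(pitches):
    """Replace each pitch by its rank among the distinct pitches (1 = lowest)."""
    rank = {p: i + 1 for i, p in enumerate(sorted(set(pitches)))}
    return [rank[p] for p in pitches]

def contour_ratio(pitches):
    """Number of unique contour values divided by number of notes."""
    return len(set(contour(pitches))) / len(pitches)

arpeggio = [60, 71, 63, 60, 71, 63]
line = [60, 64, 63, 67, 66, 65]

print(contour(arpeggio), contour_ratio(arpeggio))
print(contour(line), contour_ratio(line))
```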

The other great advantage of statistical time-invariant properties is 
that they make your search stage significantly faster, since you're 
just comparing single pre-derived numbers.

http://www.cnmat.berkeley.edu/MAX/downloads/files/OSX-CFM/matlabcommunicate_1.1.2.sit

On Jun 29, 2007, at 12:55 PM, Peter McCulloch wrote:

> Hi J.,
>
> There's been some work that might be helpful for you.  It's MATLAB  
> code, but might give some ideas about analysis.
> http://www.jyu.fi/musica/miditoolbox/
>
> Since you are working in Java, and you are working with musical  
> data, I can't highly enough recommend purchasing JMSL if you  
> haven't already.  These phrases could be very easily stored in  
> musicShapes and played very easily.  I'm working on this type of  
> stuff in JMSL using a MySql database, so let me know if you  
> purchase it and I can send you some analysis code for musicShapes.
>
> Similarity cuts across a lot of dimensions; things can be similar  
> in terms of rhythm, pitch, contour, density, register, dynamic,  
> articulation, etc.  By having different types of similarity, you  
> should get significantly more interesting output from the system.   
> For instance, find a phrase that is similar in terms of rhythm,  
> pitch, and contour, but not register.  PCA will help you pick the  
> most unique parameters.
>
>  I would consider looking at statistical properties in addition to  
> your sequential approach; these will be particularly effective in  
> finding patterns that have similar content but dissimilar  
> ordering.  (e.g. an arpeggio up vs an arpeggio down)  Chroma  
> vectors could be very effective, as they're octave-equivalent and  
> easily transposable/invertable.
>
> For instance, a chromatic scale of quarter notes from C to E  
> followed by a half note B and then a dotted half-note on C would  
> yield a chroma vector of
> 4 1 1 1 1 0 0 0 0 0 0 2
>
> Some statistical properties that might be interesting to look at:
>
> mean, variance, (standard deviation around mean, standard deviation  
> around median, kurtosis, skew) for:
> 	pitch
> 	onset times
> 	release times
> 	duration
> 	velocity
>
> Also, properties such as (number of unique contour values / number  
> of notes) can be interesting.  A repeated arpeggio 60 71 63 60 71  
> 63 will have a ratio of 0.5 ( count(1 2 3) --> 3 / 6 ) whereas 60  
> 64 63 67 66 65 will have a ratio of 1 (count(1 3 2 6 5 4) --> 6 /  
> 6).  More repetitions will drive the ratio even lower.
>
> The other great advantage of statistical time-invariant properties  
> is that they make your search stage significantly faster, since  
> you're just comparing single pre-derived numbers.
>
>
> Peter McCulloch
>
> www.petermcculloch.com
>

--
barry threw
Media Art and Technology
http://www.barrythrew.com
me(at)barrythrew(dot)com
857-544-3967

And I know not if, save in this, such gift be allowed to man,
That out of three sounds he frame, not a fourth sound, but a star.
-Robert Browning

You know, I looked at JMSL a while ago, but I was only looking for notation stuff at the time, so I didn't make much use of it. As it stands, I've built all of this stuff very much along Cope's ideas, laid out in "Computer Models of Musical Creativity". So it's not too practical for me to totally switch gears into JMSL's objects. My VoiceSegment object is a Java class, with quite a bit of detailed data, and references to many elements of the musical structure of the analysed material, so I can't see myself dropping the model any time soon.

However, I will definitely look into some of the properties you mention, as they will probably help me generate greater variety, with appropriate restrictions. Cope's approach is tied in with voice leading on its lowest levels, so my data structure also works on this basic foundation. And in this regard, my current code actually works very well. I can add a number of different works to the database, and the system will navigate convincing transitions from one source work to another. The problem is that all of the *vertical* material is always from the same source work, which means I'm basically only getting a sort of medley out of it... which is obviously not what I'm after! ;-)

I've been working with similarity measured by distance between pitch vectors, rhythm vectors (ED lists), total duration (in ticks), periodicity, kinesis, interval sum and mean interval, tessitura, and cardinality. It seems to me that, between those properties, I should be able to get a pretty decent measure of similarity. The one thing that I feel is missing is some sense of *where* the rhythmic activity is focused... I'm not sure how to express that, but perhaps your suggestions will point me in the right directions.

You've given me a lot to look into, so I'll chew on this for a couple of days, then get back to you.

Quote: peter.mcculloch@gmail.com wrote on Fri, 29 June 2007 20:55
----------------------------------------------------
> Hi J.,
>
> There's been some work that might be helpful for you.  It's MATLAB 
> code, but might give some ideas about analysis.
> http://www.jyu.fi/musica/miditoolbox/
>
> Since you are working in Java, and you are working with musical data, I 
> can't highly enough recommend purchasing JMSL if you haven't already.  
> These phrases could be very easily stored in musicShapes and played 
> very easily.  I'm working on this type of stuff in JMSL using a MySql 
> database, so let me know if you purchase it and I can send you some 
> analysis code for musicShapes.
>
> Similarity cuts across a lot of dimensions; things can be similar in 
> terms of rhythm, pitch, contour, density, register, dynamic, 
> articulation, etc.  By having different types of similarity, you should 
> get significantly more interesting output from the system.  For 
> instance, find a phrase that is similar in terms of rhythm, pitch, and 
> contour, but not register.  PCA will help you pick the most unique 
> parameters.
>
>   I would consider looking at statistical properties in addition to your 
> sequential approach; these will be particularly effective in finding 
> patterns that have similar content but dissimilar ordering.  (e.g. an 
> arpeggio up vs an arpeggio down)  Chroma vectors could be very 
> effective, as they're octave-equivalent and easily 
> transposable/invertable.
>
> For instance, a chromatic scale of quarter notes from C to E followed 
> by a half note B and then a dotted half-note on C would yield a chroma 
> vector of
> 4 1 1 1 1 0 0 0 0 0 0 2
>
> Some statistical properties that might be interesting to look at:
>
> mean, variance, (standard deviation around mean, standard deviation 
> around median, kurtosis, skew) for:
> 	pitch
> 	onset times
> 	release times
> 	duration
> 	velocity
>
> Also, properties such as (number of unique contour values / number of 
> notes) can be interesting.  A repeated arpeggio 60 71 63 60 71 63 will 
> have a ratio of 0.5 ( count(1 2 3) --> 3 / 6 ) whereas 60 64 63 67 66 
> 65 will have a ratio of 1 (count(1 3 2 6 5 4) --> 6 / 6).  More 
> repetitions will drive the ratio even lower.
>
> The other great advantage of statistical time-invariant properties is 
> that they make your search stage significantly faster, since you're 
> just comparing single pre-derived numbers.
>
>
> Peter McCulloch
>
> www.petermcculloch.com
>
>
----------------------------------------------------

I should mention, just to be clear, that the limitations in my code not drawing on enough variety in the source works are due to problems in my own design, not anything in Cope. It's part of the hybrid/realtime aspect of my model, which is admittedly still in its early stages. I approached certain things very differently than Cope, and this is where my system is getting into some trouble - a realtime system being glued into an essentially non-realtime system... I'll work it out, though. Eventually.

Typically when you embed a lower-dimensional space in a higher one, the
padded values are 0.  For 2D, this would be the XY plane, which passes
through the origin of the 3D space.  If you were to truncate, you
would be orthogonally projecting 3D points onto the XY plane.
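To make the two options concrete, here is a small Java sketch of Euclidean distance with zero-padding versus truncation. The class and method names are hypothetical, and it assumes the arrays share their leading dimensions (element i of one array means the same thing as element i of the other), which may not hold for every feature layout.

```java
import java.util.Arrays;

class DistanceSketch {
    /** Plain Euclidean distance; both arrays must have the same length. */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Zero-pad the shorter array: embeds it in the higher-dimensional space. */
    static double paddedDistance(double[] a, double[] b) {
        int n = Math.max(a.length, b.length);
        // Arrays.copyOf fills any added trailing elements with 0.0
        return euclidean(Arrays.copyOf(a, n), Arrays.copyOf(b, n));
    }

    /** Truncate the longer array: orthogonal projection onto the shared axes. */
    static double truncatedDistance(double[] a, double[] b) {
        int n = Math.min(a.length, b.length);
        return euclidean(Arrays.copyOf(a, n), Arrays.copyOf(b, n));
    }

    public static void main(String[] args) {
        double[] p2 = {0.0, 0.0};
        double[] p3 = {3.0, 0.0, 4.0};
        System.out.println(paddedDistance(p2, p3));    // 5.0
        System.out.println(truncatedDistance(p2, p3)); // 3.0
    }
}
```

The squared padded distance equals the squared truncated distance plus the sum of squares of the extra coordinates, so padding can never give a smaller value than truncating. When the extra coordinates are large, the two measures will rank neighbours quite differently.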

On 6/29/07, jbmaxwell  wrote:
>
> Actually, if anyone has anything more to add, I'd still appreciate any thoughts.
>
> My question before was really whether it was a better approach to "pad" the lower-dimensional array, or to truncate the higher-dimensional array. I understand that the value of any padding would be arbitrary, however, what's not clear still is whether truncating, or some form of dimension reduction would be a better approach. The main reason I ask is because, in experimenting with the results, they are really quite different. Thinking about it now, I actually kind of feel inclined to truncate dimensions on the higher-dimensional array, since, if I again use 2D and 3D spaces as an example, it seems to makes more sense to reduce a 3D point to its 2D projection than to give an arbitrary z position for a 2D point. Yes? No?
> (I'd imagine that, for the geometrically-inclined, this is a bit like understanding why the first black key above C is sometimes a C# and sometimes a Db...)
>
> thanks in advance for any further thoughts,
>
> J.
>


Okay, I've had some time to look over this message in greater detail...

Yes, this could be handy. I'll implement a chroma vector method in my VoiceSegment class - should be pretty easy, and could provide some valuable info! The one thing I'm not sure about is how to deal with less 'square' durations - I'm assuming there's nothing wrong with using doubles for the actual values, and thus representing duration more precisely?
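One possible shape for such a method, as a hedged Java sketch: it assumes the chroma vector is a 12-bin pitch-class profile in which each note adds its duration to the bin for (MIDI pitch mod 12). The class name, method names, and parameters are hypothetical and are not taken from the actual VoiceSegment class. Using double durations means irregular note lengths need no quantizing.

```java
class ChromaSketch {
    /** 12-bin pitch-class profile; each note adds its duration to bin (pitch mod 12). */
    static double[] chroma(int[] midiPitches, double[] durations) {
        if (midiPitches.length != durations.length) {
            throw new IllegalArgumentException("one duration per pitch is required");
        }
        double[] bins = new double[12];
        for (int i = 0; i < midiPitches.length; i++) {
            // floorMod keeps the index in 0..11 even for negative pitch values
            bins[Math.floorMod(midiPitches[i], 12)] += durations[i];
        }
        return bins;
    }

    /** Scale the bins to sum to 1.0 so segments of different total length compare fairly. */
    static double[] normalize(double[] bins) {
        double total = 0.0;
        for (double b : bins) {
            total += b;
        }
        double[] out = bins.clone();
        if (total == 0.0) {
            return out; // empty segment: leave all bins at zero
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= total;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] pitches = {60, 64, 67, 72};          // C4, E4, G4, C5
        double[] durations = {1.0, 0.5, 0.25, 1.5};
        double[] c = chroma(pitches, durations);
        System.out.println(c[0]); // 2.5  (both Cs)
        System.out.println(c[4]); // 0.5  (E)
        System.out.println(c[7]); // 0.25 (G)
    }
}
```

The result always has 12 elements regardless of how many notes the segment contains, so two chroma vectors can be compared with Euclidean distance without padding or truncating.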

I've found some open source java code for getting statistical info from double[]s, so I'll implement some of the above properties right away. I've actually previously done some work with these in Max, using the free versions of the Litter objects (is it lp.stacey...??), which I found quite effective, though I think I'll get more use from them now, as the overall design of my app provides a better context for taking advantage of the info.
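For reference, the properties in question can be sketched directly in Java as below. This is not the open source code mentioned above. It assumes population formulas (divide by n, not n - 1) and reports plain kurtosis, not excess kurtosis, and libraries often differ on both conventions. All names are hypothetical.

```java
class StatsSketch {
    static double mean(double[] x) {
        if (x.length == 0) {
            throw new IllegalArgumentException("empty array");
        }
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** k-th central moment: the average of (x - mean)^k. */
    static double centralMoment(double[] x, int k) {
        double m = mean(x);
        double sum = 0.0;
        for (double v : x) {
            sum += Math.pow(v - m, k);
        }
        return sum / x.length;
    }

    /** Population variance (second central moment). */
    static double variance(double[] x) {
        return centralMoment(x, 2);
    }

    static double stdDev(double[] x) {
        return Math.sqrt(variance(x));
    }

    /** Skewness m3 / m2^1.5; NaN when every value is identical (zero variance). */
    static double skewness(double[] x) {
        double m2 = centralMoment(x, 2);
        return centralMoment(x, 3) / (m2 * Math.sqrt(m2));
    }

    /** Kurtosis m4 / m2^2 (about 3 for normally distributed data); NaN for zero variance. */
    static double kurtosis(double[] x) {
        double m2 = centralMoment(x, 2);
        return centralMoment(x, 4) / (m2 * m2);
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(mean(data));     // 5.0
        System.out.println(variance(data)); // 4.0
        System.out.println(stdDev(data));   // 2.0
        System.out.println(skewness(data)); // 0.65625
        System.out.println(kurtosis(data)); // 2.78125
    }
}
```

Each method reduces a double[] of any length to one number, so a fixed set of these values per property (pitch, onset times, durations, and so on) gives every segment a feature vector of the same length.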
