Subpixel Motion Tracking: Methods, Accuracy, and Application to Video of Collapsing Buildings

Mick, OWE, planning to experiment with video-recorded real falls?
First, I have to get out of calibration wonderland. I fell down this rabbit hole yesterday. Do you realize how hard it is to make a perfect circle? I didn't. This is as close as I can get:



But, if you do anything to it, and I mean anything at all (including just a resize/resample), it's not perfect anymore. Inkscape made this; I've yet to find anything that can honestly resize it. GIMP does the best, but it's not quite as good as direct subpixel output from Inkscape (which itself is not perfect). Move the 1000px-radius circle one pixel to the side and reduce by 10x, expecting a nice symmetrical anti-aliased edge, and NO.
 
What would you expect? It's not going to be symmetrical unless it falls on a 5-pixel boundary in x and y.
 
If I respond too quickly, I'll probably say something stupid.

All I'll say at this point is that some apps do a better job of the same thing, so while I might not expect true perfection, I'm hoping for something that doesn't put more error into generating the target than the method that attempts to track it. As you know, the method I'm using is "perfect" in the sense that it will produce an objective weighted mean location, so any asymmetry shows up accordingly. Depending on what I use to transform an image, I can end up with many times the theoretical measurement error simply from a munged figure.

Since some apps are better than others, I know a large proportion of the issues lie in the painting algorithms. For instance, I discovered last night that ImageMagick doesn't draw circles where you tell it to. There was an error in my originally posted code: I forgot to offset the coordinates by (0.5, 0.5) to center on the pixel. After correcting that, I discovered that all of the circles I was drawing were still off by 0.05px in both dimensions. It's ImageMagick. That's too damn much error at the outset for a calibration exercise!

Are you saying I should start with an odd pixel diameter?
 
In this sequence, the 1000px circle is moved right one pixel at a time until it's gone 10px, and then it's shifted downward one at a time until it's moved 10 down. The images are reduced 10x and then measured. Ideally, the circle's center will start at (200,200) and end up at (201,201) with independent motion in each direction in 0.1px increments.
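(For anyone who wants to reproduce this sort of sequence without any particular editor's resampler in the loop, here's a rough sketch of drawing the antialiased disk directly by supersampling - closer in spirit to direct subpixel drawing at final size, which I come back to below, than to the 10x-reduction pipeline. The sizes and filenames are placeholders, and this is not the actual tool chain used for the numbers that follow.)

Code:
import numpy as np
from PIL import Image

def disk_frame(cx, cy, radius, size, ss=8):
    """Render an antialiased white disk on black by supersampling.
    Coordinates follow the (0.5, 0.5)-at-pixel-centre convention.
    Illustrative sketch only."""
    coords = (np.arange(size * ss) + 0.5) / ss           # subpixel sample centres
    xx, yy = np.meshgrid(coords, coords)
    inside = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
    cover = inside.reshape(size, ss, size, ss).mean(axis=(1, 3))  # per-pixel coverage
    return (cover * 255).round().astype(np.uint8)

# e.g. 21 frames of a 100 px radius disk moving right 0.1 px per frame
for f in range(21):
    frame = disk_frame(200.0 + 0.1 * f, 200.0, 100.0, 400)
    Image.fromarray(frame).save(f"disk_{f:02d}.png")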

First, IrfanView Lanczos resample:

Code:
f,x,y
0,199.707105531914,200.292973951949
1,199.807007745355,200.292889163487
2,199.906859925671,200.292883672963
3,200.006580827411,200.292905055813
4,200.10636782631,200.292946258729
5,200.206246866803,200.293011430027
6,200.306147264837,200.292972480204
7,200.405856167037,200.292893405971
8,200.505908464426,200.292955780583
9,200.60574777589,200.293110760019
10,200.705655638327,200.292919286814
11,200.705525856475,200.193023953865
12,200.705636122192,200.09314818716
13,200.705589304553,199.993493464299
14,200.705553027094,199.89365891906
15,200.705582442364,199.79366477708
16,200.705592241964,199.693923118596
17,200.705576710213,199.594061531995
18,200.705584114068,199.494082655072
19,200.705482780241,199.394197544664
20,200.705634510123,199.294299399441

What a mess. It does move 0.1px with error generally less than 0.001px - yay - except it starts life a whopping ~0.3px off center. Is that a necessary consequence of the pixel geometry? No; let's see how well GIMP does. I only did the first five points, but you can see the striking difference:

Code:
0,199.999994243642,200.000021774049
1,200.100804109822,200.000016455607
2,200.200572652314,200.000015954629
3,200.300181466651,199.999969405188
4,200.399503322632,199.999955891776
5,200.500532551324,199.999958832381

One might think my method has a 0.3px tracking-error offset until seeing that. But notice that even GIMP can't keep from influencing the other coordinate while shifting in one dimension. Is that crosstalk necessary? No. Let's see what happens when I use Inkscape's rendering engine to do the same thing with direct subpixel drawing at image size 400x400, moving the circle 0.1px at a time:


Code:
0,200.000024624969,200.0
1,200.100433838503,200.0
2,200.200138719,200.0
3,200.299840906881,200.0
4,200.39966725915,200.0

Y coordinate rock-steady at exactly 200px, which is what I expected from the two above. See what I'm talking about now? Disparity and error in rendering.
 
I was assuming 1000 pixel diameter. And if you are reducing this 10x, then you need to be either on the edge of a 10-pixel block, or in the middle, for it to remain symmetrical (assuming it starts symmetrical).

I'm a little unsure of what you are trying to do. You know what your center-finding algorithm does, so what are you trying to prove? The circle drawing programs are designed to look nice, not produce perfect circles. So you might be starting with something a bit off. If you want to force pixel symmetry, then mirror a quadrant.
 
You probably need to use a simpler filter when resizing down. But this all seems rather academic, as you already know the limits. Are you trying to determine something? Is there something in dispute here?
 
I was assuming 1000 pixel diameter.
Sorry, I keep referring to it as a 1000px circle, but that is the radius.

And if you are reducing this 10x, then you need to be either on the edge of a 10-pixel block, or in the middle, for it to remain symmetrical (assuming it starts symmetrical).
Or just draw subpixel motion directly in Inkscape, which does what I expect to within a high degree of accuracy.

I'm a little unsure of what you are trying to do.
So am I at this point.

You know what your center-finding algorithm does, so what are you trying to prove?
I'm just trying to make good knowns to use for examples of what happens when various types of noise and error are introduced, and show exactly how resolution changes with dynamic range and total pixel count. A sensitivity analysis, not really proving anything.

The circle drawing programs are designed to look nice, not produce perfect circles. So you might be starting with something a bit off. If you want to force pixel symmetry, then mirror a quadrant.
Brilliant!

What is clear is that running my method on femr2's example shows EXACTLY how out-of-round his test example is, not how well the method works at measuring it. Which I knew anyway, on some level.
 
I am reading and looking at pictures with interest, and almost entertained.

<link to Goodfellas extract snipped; you know which one it is if you've seen the movie>

... that measuring multi-pixels tends to improve that (though not clear yet by how much - is that a 1/n or 1/ln(n) or 1/sqrt(n) factor).
This is one of the things I wanted to hit on with demonstrations starting from well-characterized knowns. I don't have the chops - or maybe I do but am too lazy - to proceed with this analytically*, so concrete examples are the way to go. I thought that would be easier than it is. So far, the error in simply drawing the test object is larger than some of the effects I want to demonstrate.

That, in itself, says quite a bit.

One of the things it says is that sub-pixel measurements should not be the long pole in the tent when it comes to suspect factors. That's probably beating a dead horse with most everyone following the discussion, but it's amazing how many fresh horses I've had to kill over the years. Like, every time the subject comes up. Not every time does the general audience come away as believers; it's about 50-50, and that includes some otherwise sharp people.

The people who grasp the concepts immediately don't usually take long to start questioning the need, especially given the lousy quality of most of the source videos relative to this alleged accuracy. They also question the objective: why bother? Well, we got here by way of questioning the validity of alleged over-g findings, and one of my points has been that the tracing methods employed by achimspok and femr2 (now also Mick) are the best shot at determining this. It's making the most of the information available. We've already seen how club-footed some of NIST's methods were; why use those?

One of my other points, however, is that this goal might be out of reach, period. That SE and similar programs aren't capable of resolving this from the source material. The greatest strength of these methods is in multisampling, such that quasi-static motion like the camera tripod drift is easily resolved. Grainy, noisy video is not a problem if true net motion is some small distance over a very long time. With rapid target motion, you get one shot at acquiring it in that position.


* A one pixel object has 256 possible intensity states, as we've discussed. Adding one more pixel gives 256*256 states, and so on... 256^N states. Once scaled to the working metric, this translates to greater and greater filling of the number line. That's the easy part but irrelevant since only a small fraction of the states are accessible due to a literally infinite number of constraints which can be imposed by real world examples. Totally context dependent.
 
What follows will be pedantic to at least Mick and perhaps many more, but hopefully not for everyone reading. The purpose is to demonstrate the accuracy of the method I'm using and show how large numbers of pixels can potentially give greater resolution than one part in 256.

The test example uses the 1000-pixel-radius white circle posted above. The sequence consists of three frames, the first being the posted image. This image was created in Inkscape and rendered by its engine with antialiasing. My method places it as perfectly centered at coordinates (2000, 2000). The subsequent frames were made by editing this image in GIMP at the pixel level.

The second frame differs by one pixel's intensity; the third differs by another pixel's intensity. The increment is as small as possible: the chosen pixels go between intensities 1 and 0. The first pixel to change is the rightmost in the top row, from (1,1,1) to (0,0,0); then a corresponding pixel in the lower left transitions from black to (1,1,1). This would be like the object shifting infinitesimally to the left and down.

The tracking:
Code:
0,2000.00000088487,2000.0,3145792
1,2000.00000083058,2000.00000124743,3145791
2,2000.00000075133,2000.00000249361,3145792

Discrimination as low as 0.00000005 delta value. This is why I call the method perfect. It's as close to perfect as one can get. It's not very good for general purpose tracking in the real world, but it is exact to the degree of floating point accuracy. It relies on a set of assumptions about the invariance of the projected luminosity, but then so does pretty much every naive tracking algorithm, I would think.

While very unwieldy and many times unusable in real world problems, this method is the reference standard for calibration tests. For an example such as the one femr2 presented, if two methods are to be compared, how does one decide which is more accurate? By comparing both of them to this.

The other objective, showing how numbers matter: we know if the circle was 1 pixel in size, the smallest transition detected would be 1/256. The last column in the data above is count of non-black pixels, very large. Goes down by one and then back up.
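(For concreteness, here's a rough sketch of the weighted-mean machinery that produces rows in that f,x,y,count format - not the exact script I ran, and the frame filenames are placeholders:)

Code:
import numpy as np
from PIL import Image

def luminous_mean(path):
    """Luminosity-weighted mean position of all pixels in an image,
    plus the non-black pixel count.  Sketch of the naive weighted-mean
    idea only.  Uses the (0.5, 0.5)-at-pixel-centre convention."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    ys, xs = np.indices(img.shape)
    w = img.sum()
    x_mean = ((xs + 0.5) * img).sum() / w
    y_mean = ((ys + 0.5) * img).sum() / w
    return x_mean, y_mean, int(np.count_nonzero(img))

for f, path in enumerate(["frame0.png", "frame1.png", "frame2.png"]):
    x, y, n = luminous_mean(path)
    print(f"{f},{x:.12f},{y:.12f},{n}")

As a rough sanity check on the magnitude: dropping one intensity step (1/255 of full scale) at about 1000 px from the centre should shift the mean by roughly 1000 / (255 * pi * 1000^2) ≈ 1.2e-6 px, which is about the size of the jump in the y column above.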
 
OWE, you're a funny guy.


What follows will be pedantic to at least Mick and perhaps many more, but hopefully not for everyone reading. The purpose is to demonstrate the accuracy of the method I'm using and show how large numbers of pixels can potentially give greater resolution than one part in 256.

Not to me. Thanks.



Discrimination as low as 0.00000005 delta value.

Wow. That is way over the top.

While very unwieldy and many times unusable in real world problems, this method is the reference standard for calibration tests. For an example such as the one femr2 presented, if two methods are to be compared, how does one decide which is more accurate? By comparing both of them to this.

Nice. That applies to every tracing technique. A standard test in this direction is the way to go.

How accuracy depends on or varies with velocity would be one more calibration test.


Application to real video is where a tracing technique can also be put to the test. I recommend the Sauret WTC1 antenna and NW corner washer for that. That is the biggest test of displacement tracing on the WTC towers. What tracing technique can detect displacement back to the camera shake when compared to a static point?

>>>>>>>>>


People may be asking what the applications are to WTC behavior and debunking - or 'truthering'. As an example, I can debunk the NIST collapse initiation scenario for WTC1 (and probably WTC2) using these tools and mappings of sequenced, detailed observations. And NIST invented the most accurate clocks in existence! From the Wikipedia entry 'atomic clock':

External source:

In March 2008, physicists at NIST described a quantum logic clock based on individual ions of beryllium and aluminium. This clock was compared to NIST's mercury ion clock. These were the most accurate clocks that had been constructed, with neither clock gaining nor losing time at a rate that would exceed a second in over a billion years.[30] In February 2010, NIST physicists described a second, enhanced version of the quantum logic clock based on individual ions of magnesium and aluminium. Considered the world's most precise clock, it offers more than twice the precision of the original.


They are good with clocks. They did a poor job with the WTC1 initiation movement and initiation mechanism, and anyone can verify this using the tools available, the ones being developed here, and access to the visual record.
 
What follows will be pedantic to at least Mick and perhaps many more, but hopefully not for everyone reading.
You haven't lost me - yet. And you, OWE, are aware of my focus of interests - more towards:
A) numeric/quantified results which are valid and sufficiently accurate for the pragmatic purpose facing me. AND
B) Application of such quantified results where they are needed to answer real event questions - usually about the WTC 9/11 collapses.

(It's a derail here, but I find far more errors occur in the qualified analyses - as basic as defining WTF we are talking about - specifying the problem. It still astonishes me - a 1941-vintage, 1965-slide-rule-era engineer - how many "quantifier" types - engineers especially - rush into the FEA and calcs before they know what they are measuring. One T Sz must set the record for always getting the starting premises wrong. But the number of debunker-side types who blindly chase the truthers down the wrong path is more than those who stop and think before they engage the calculator/FEA. I'm sure it wasn't as bad in the slide rule era, but that is a derail too far.)

I respect - admire? - those who are interested in pursuing such detail - it is vital to a "manager's perspective" person like me that accurate, reliable technical staff work be available. Hence my long-term relationship with femr2 on several forums. I long ago came to accept his work at face value and have never been let down. I don't need to do the work less adequately myself.

Remaining details fully understood and appreciated:
The purpose is to demonstrate the accuracy of the method I'm using and show how large numbers of pixels can potentially give greater resolution than one part in 256......It relies on a set of assumptions about the invariance of the projected luminosity, but then so does pretty much every naive tracking algorithm, I would think.

While very unwieldy and many times unusable in real world problems,...
:rolleyes:

...this method is the reference standard for calibration tests. For an example such as the one femr2 presented, if two methods are to be compared, how does one decide which is more accurate? By comparing both of them to this.

The other objective, showing how numbers matter: we know if the circle was 1 pixel in size, the smallest transition detected would be 1/256. The last column in the data above is count of non-black pixels, very large. Goes down by one and then back up.
 
With a couple of tweaks, good enough for the antenna dishes ;)
The serious reality is that, in a lot of situations, "near enough" is more than "good enough".

When I entered internet debate it was to explain the "progression" stage of Twin Towers collapses - where I was on a niche forum and I was on my lonesome - the only engineer among a lot of biological scientists.

Overall, at that time there were very few voices speaking out against the tide of Bazantian misapplications. And it has not changed a lot. In recent days I've copped several personal attacks from out-of-their-depth debunkers grossly offended that I dared to disagree with them and to criticise Bazant. And the saddest part, from my perspective, is the rest of the mob not being game to speak up in support.

The point of relevance here is:
Once you (I, we) get the mechanism - the "qualified" bit - right,

it only needs order-of-magnitude ballpark figures to get the "quantification" right.

Because the available energies were orders of magnitude more than needed for collapse. And when you have 50, or at least 30, times more than you need for failure, it matters little whether it is actually 49 or 29 or even 15 or even 5 - though at less than that I would have had to get more serious.

Then - a couple of years later circa 2009-2010 - I learned about some other folk calling it ROOSD and the rest is history. I occasionally regret not having a catchy "brand" acronym. I'm an engineer - not a marketer.
 
While very unwieldy and many times unusable in real world problems, this method is the reference standard for calibration tests. For an example such as the one femr2 presented, if two methods are to be compared, how does one decide which is more accurate? By comparing both of them to this.

I disagree. Your method is simply tracking the average position of the pixels in the region of an image, weighted by luminosity. This works great for a nice blob on a featureless background, but it's not really relevant when a larger object is moving - especially if there are variations in lighting, and debris, smoke and suchlike.

The more complex methods, like those in After Effects (and presumably SynthEyes), work by forming a subpixel-resolution grid of peaks and valleys of various functions of the target object, and then searching for the best match of that grid within a search area relative to where it was in the previous frame.


Source: http://www-cvr.ai.uiuc.edu/~seth/ResPages/pdfs/NicHut02.pdf

This works well for real-world tracking; however, it is not necessarily the best way for your test case.
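(To make that concrete, here's a toy sketch of the general patch-matching idea - integer-pixel normalized cross-correlation followed by a parabolic subpixel refinement. It's only illustrative of the class of method, not what AE or SynthEyes actually implement, and the function and variable names are my own:)

Code:
import cv2
import numpy as np

def match_subpixel(frame, template):
    """Toy patch matcher.  'frame' and 'template' are grayscale numpy
    arrays (uint8 or float32).  Returns the subpixel (x, y) of the
    top-left corner of the best template placement."""
    res = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, (px, py) = cv2.minMaxLoc(res)   # integer location of best match

    def refine(a, b, c):
        # vertex of the parabola through three samples around the peak
        denom = a - 2 * b + c
        return 0.0 if denom == 0 else 0.5 * (a - c) / denom

    dx = refine(*res[py, px - 1:px + 2]) if 0 < px < res.shape[1] - 1 else 0.0
    dy = refine(*res[py - 1:py + 2, px]) if 0 < py < res.shape[0] - 1 else 0.0
    return px + dx, py + dy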
 
I disagree.
Note my qualifier "For an example such as the one femr2 presented...". That might make it seem worthless, but a) femr2 did throw such a thing down as an example for tracking accuracy, and b) there are real-world objects to track which correspond better to this example than does the wrench head above. SE and AE ought to be able to track simple targets reliably, too.

Your method is simply tracking the average position of the pixels in the region of an image, weighted by luminosity.
Yes, assuming invariances as mentioned above, it's very precise.

This works great for a nice blob on a featureless background,...
There are often ways to get the background out, leaving only the blob.

...but it's not really relevant when a larger object is moving...
I agree up to a point. If the object moves enough to change its appearance, there can be corresponding shifts in the calculated mean. Excluding significant motion blur, it's simply another measurement, as accurate as a static one. You'll only get one before the object moves on, but SE and AE face the same situation. For the case of very slow motion on a restricted set of targets, I don't know that this would fare badly against those programs; I don't believe it would.

...especially if there are variations in lighting, and debris, smoke and suchlike.
Well, you got me there, I have to agree. That's why I included that qualifier, and why I pick my targets very carefully. No doubt the intelligence behind the SE/AE algorithms allows them to discern position where a simple method like this will be all over the place or have no hope of tracking in a meaningful way. But how much uncertainty would there be in such a measurement?

Even so, things like global illumination change are not much of a problem. The transition is, so being in the shadow of swirling smoke is a problem for a small spot. Less so when you have several small spots to average. Debris I avoid. Unless it's a dark blob being tracked against a blue sky...
 
But, let me concede the point. I don't disagree that SE/AE will do better on a large variety of real world cases and handles complexity much better. Where does that leave us?

What, instead, would serve as a benchmark, if simple tests are not acceptable? We'll get really precise-looking measurements and a theoretical assurance of best fit on a complex scene, but how to know what the actual accuracy is?

So far, we've seen that neither SE nor AE does as well on femr2's example as my (totally naked, no-enhancements-whatsoever) routine. Not only is this the simplest sort of track imaginable, we know what the answer is supposed to be. We've seen that AE is quite sensitive in the simple test to operator input. Why is this so? It has to find the object. What could be simpler to find than a blob on a black background? We may assume that it's not so in more complex scenes, but that would currently be nothing more than an assumption. How do we know?

It does suggest that a real-world heavy ball drop is the way to go. Then we'll try a vibrating tuning fork and see if I beat AE again.
 
What, instead, would serve as a benchmark, if simple tests are not acceptable? We'll get really precise-looking measurements and a theoretical assurance of best fit on a complex scene, but how to know what the actual accuracy is?
$AU0.02.
A benchmark has to be replicable and I think that implies simple. Or a compound of several simple features summed. Isn't one of the issues "How do you clearly measure something through fog?" When the answer IMO is "you cannot rely on one measurement..."

...ignore me if I've totally lost the plot...

....in the same fog???
 
The benchmark could be several objects, starting from a dead-simple disk on a uniform background, continuing with something like a cross, then perhaps an "E"-shaped thing; have them rotate. Etc.
And keeping in mind the errors that graphics programs make in creating the images. Phew!

Then go on with heavy ball. If only you could measure its size and motion with floating point accuracy... ^^

Anyway, I am more than convinced now that the subpixel methods have the potential to measure 1/200 pixel or better, given a good multi-pixel target. I suppose they thus can be expected to measure noise on that level on real-world video - much from the artifacts of the several steps of image processing done before the final video output.
AND it appears to me that, since we'll never see any video without some steps of processing, this noise tends to be worse than 1/100 pixel, and easily reaches 1/10 px - this is judging from your early tries at re-creating circular motion.

In short: the methods themselves are good enough (and near enough) - their inherent error margin is at least an order of magnitude smaller than the artifact noise in the video signal.
 
A defined unchanging object moving against a variable background could be a classic benchmark.

Although here we are generally interested in falling buildings, the tracking of which has two cases:

1) Tracking the edge or corner, where you do get a variable background.


2) Tracking a feature within the building, where the background does not change


In both instances you have to deal with the deformation of the building, lighting changes, and dust and debris, along with any partial foreground occlusion.


AE manages to see through the dust up to a point in both cases.



Here a poor choice of initial point causes a jump to an adjacent window:


I think artificial test cases are always going to diverge at some point from real-world test cases, as the algorithms in use in the real world contain heuristics based on assumptions about sources of noise and distortion that are not present in the artificial cases, so it's constantly trying to correct for something that isn't there.
 
Attached is a relatively clean real-world test case of a heavy keg falling, taken from this video:


AE tracked the corner fine until this point:


And a single interior feature up to here:


And two features account for rotation:
 

Attachments

  • Keg Drop.mov (4.3 MB)
But, let me concede the point. I don't disagree that SE/AE will do better on a large variety of real world cases and handles complexity much better. Where does that leave us?

What, instead, would serve as a benchmark, if simple tests are not acceptable? We'll get really precise-looking measurements and a theoretical assurance of best fit on a complex scene, but how to know what the actual accuracy is?

That is why there is no single benchmark. A small series of tests is necessary, and I think the tests to use have already been mentioned in different posts. Both issues must be addressed in the test series:

1) If you do not test the tool on movement you know beforehand, you will not know the limits of the tool or if the tool is giving a false sense of accuracy.

2) If you do not test it in real world conditions, you do not know how variable conditions affect the tool.


So Mick and OWE are both right. A small series of test conditions will not be an undue burden. The whole point is to gain confidence in the tool and not be fooled by the results: to remove doubt. To the degree there is doubt as we progress, we'll simply introduce a new test to challenge the doubts as they arise.

>>>>>>>>>>

Remember, almost all applications of the tool wrt the WTC towers are toward detecting displacement only. When going for velocity, acceleration, or 'jolt' detection, we'll probably fit known functions to the displacement data and differentiate the functions, along with using moving-point averages, so accurate detection of displacement is the true test of the tool.
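(A quick sketch of the fit-and-differentiate idea, with made-up placeholder data; the actual functions fitted would depend on the case:)

Code:
import numpy as np

# placeholder displacement samples: t in seconds, y in pixels
t = np.array([0.00, 0.04, 0.08, 0.12, 0.16, 0.20])
y = np.array([0.00, 0.02, 0.09, 0.20, 0.36, 0.55])

# fit a known functional form (here a quadratic) to the displacement,
# then differentiate the fitted function rather than the noisy samples
coeffs = np.polyfit(t, y, 2)
vel = np.polyder(coeffs, 1)      # velocity polynomial
acc = np.polyder(coeffs, 2)      # acceleration polynomial (constant here)

print("velocity at t = 0.1 s:", np.polyval(vel, 0.1), "px/s")
print("acceleration:", np.polyval(acc, 0.1), "px/s^2")

# a moving-point average is the other smoothing option mentioned
y_smooth = np.convolve(y, np.ones(3) / 3, mode="valid")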

The application is not just to falling buildings, but to creep movement as well - conditions of slight displacement relative to static points. Also, how displacement-detection accuracy depends on velocity is important to know when working with faster-moving objects.
 
Interesting. Good stuff.

Before I jump in with both feet, some scattered thoughts:

1) Thus far, I've utilized a mean location based only on grayscale luminosity. This is for several reasons: easy to do, easy to grasp, and "perfect" for the contrived examples we started with. I've also operated on entire images, since there's only one object of interest in the scene. If I'm going to start doing real-world measurements, there's no way I'm going to stay restricted to that specific scheme.

Luminosity is just one way to weight location. It was the appropriate way to deal with a grayscale image with a light blob, so that's what I did. My opinion is that simple, well-defined benchmarks are the best way to establish performance metrics, and I stand by that. It just so happens AE can't seem to do that so well (I still wonder why). So the luminous mean is all I've been talking about until now, but that's not the only trick in the bag. No, I'm not changing the rules, just moving from a specific case to the general one.

The real rule is weighting pixel location by confidence. For the prior examples, confidence is given by intensity. If that doesn't work for a given case, there are other choices. There's intensity in each channel, proximity to a color or set of colors, equal weighting, non-linear scaling, etc. Weights can be drawn from multiple sources, including the bounding box of the object as the most primitive element.
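(To make the generalization concrete, here's a toy version of the same mean-location machinery with the weight function swapped out - confidence defined by proximity to a reference colour rather than by grayscale intensity. The reference colour and falloff scale are arbitrary placeholders, and this isn't my actual code:)

Code:
import numpy as np
from PIL import Image

def confidence_mean(path, ref_rgb, scale=60.0):
    """Weighted mean position where 'confidence' is closeness to a
    reference colour.  Sketch only; ref_rgb and scale are placeholders."""
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.float64)
    dist = np.linalg.norm(rgb - np.asarray(ref_rgb, dtype=np.float64), axis=2)
    w = np.exp(-(dist / scale) ** 2)        # confidence falls off with colour distance
    ys, xs = np.indices(w.shape)
    total = w.sum()
    return ((xs + 0.5) * w).sum() / total, ((ys + 0.5) * w).sum() / total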

The other thing is feature isolation. Obviously I'm not going to be doing whole images most of the time; it will be cropped to an area of interest at the very least. Once again, if anyone thinks that's as far as I can go, or will go - wrong. This is not entirely independent of the first rule, in that inclusion/exclusion of pixels is really a nonzero/zero confidence level, although practically I'm referring to preprocessing to eliminate pixels from consideration before later stages are executed.

Any technique to find and select pixels is open to me; if it can be expressed in rules, and even if a lot of manual work is required to provide assistance and coddling to the routines, it's fair game. Two things come to mind:

- At some point of complexity, it may begin to resemble what AE does more closely. Yes and no. It will never be AE's method, since I'm not matching contours on a grid, but it's not just luminous mean, either.
- The high degree of customization and myriad paths open to give differing results means that this is a somewhat subjective process. Okay. It is documentable and repeatable, but the exact methodology may differ from case to case. Whereas, the only subjectivity with AE is the initial placement of the tracking box (which we've already seen can make a big difference). I restricted the method to naive luminous center for the simple examples - which I still feel are appropriate - and that is as objective and transparent as it can get.

All of the "subjectivity" of the "intelligent" processing of AE/SE is rolled opaquely into the code by the coders. I am the coder, and my methods will be transparent.

2) What is a feature? Mick breaks it down into edge/corner and a "feature" within the building. That's fine. I prefer to think of the categories as

a) blob: isolated object, generally convex
b) edge: steep color gradient making a line
c) corner: intersection of two edges
d) compound: object composed of any mix of the previous three more atomic elements
e) complex: pattern or shape which can't be decomposed into first three elements

Each of our definitions seems to be biased by the tools we use. For AE, it's point-click-run. I need to consider what topology is involved in the feature in order to select a method for processing.

Item (a) is my preferred target because it's easiest to isolate and provides x,y coordinates directly. Only if I can't find a blob will I consider other categories. That's not very restrictive at all. I claim that most tracking can be reduced to blobs. I went through a CDI video earlier and many of the CD'd buildings had viable blob targets. Windows can be blobs! And much else.

Item (b) - We haven't touched on using the 1D version of my method for edges, but it works exceedingly well (ref. replication of NIST fine motion). It's very easy to apply, but post-processing can be extra work because it only measures the crossing point in every pixel slice, not a particular point on an edge (that's a blob!). But I don't think AE can do that, either. For mixed motion direction, I have to run a pass in both dimensions. Then I have the whole edge as a collection of points, which can be regressed to a line or fitted to a path. (A rough sketch of the per-slice idea appears below, after the category rundown.)

Item (c) doesn't really exist as a feature to me. I mention it only for... I don't know why. It really belongs in (d); it's an intersection of two lines.

Item (d) - Since the object can be decomposed into more primitive elements, it may be possible to do multiple passes of 1D slices to grab points, but I think I've only done that once, too much hassle. Mostly I'll pass on these if they can't be treated as a blob. Probably these are good targets for AE/SE.

Item (e) I won't touch. This is undoubtedly where AE/SE will shine. There is the matter of how much you can trust the accuracy, since there's no way to cross check a complex target, but AI tracking is all there is to do. Best tool for the job.

Blobs are quite plentiful.
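(Here's the rough sketch of the item (b) per-slice idea promised above: for each row, find the subpixel column where the intensity crosses a threshold by interpolating between the two straddling pixels. My actual routine differs in detail; this is just the shape of it.)

Code:
import numpy as np

def edge_crossings(gray, thresh=127.5):
    """For each row of a grayscale array, return the subpixel column where
    the intensity first crosses 'thresh'.  Sketch of the 1D-slice idea only."""
    points = []
    for y, row in enumerate(np.asarray(gray, dtype=np.float64)):
        above = row >= thresh
        idx = np.flatnonzero(above[1:] != above[:-1])     # where the row crosses
        if idx.size == 0:
            continue
        i = idx[0]
        x = i + (thresh - row[i]) / (row[i + 1] - row[i])  # linear interpolation
        points.append((x + 0.5, y + 0.5))                  # pixel-centre convention
    return np.array(points)

# the resulting (x, y) points can then be regressed to a line or fitted to a path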


3) My way is a lot of work. That's why I don't do much of it, and I don't plan on doing a whole lot here. I will never quibble with the fact that AE/SE are:

- much easier to use
- handle a much greater range of features
- should be more accurate on complex objects
- have a measure of objectivity and foundation in peer-reviewed research

No one should reach for my ways as a first resort, but then that was never my claim. My claim was that my method (specifically luminous mean) was the appropriate benchmark for accuracy in examples like femr2's and, in that situation, it clearly WAS. Arguing something else is something else. Mick's point that it isn't a fair test of the total capability of AE, or its accuracy with respect to luminous mean in real world contexts, is absolutely true. But that isn't what I said, and I suspect we wouldn't have veered in this direction had AE done a better job with the simple example.

Since we have veered in this direction, I'm game for doing more experimentation with the caveat of #1 - the definition of confidence for weighting is wide open - but my way is a lot of work. Therefore, I hope that explanations will suffice in as many cases as possible, in lieu of actually doing it and presenting.
 
But that isn't what I said, and I suspect we wouldn't have veered in this direction had AE done a better job with the simple example.

And I'm not convinced it can't do a better job; it may well be user error on my part - I only tried twice, and the second attempt was about 10x as good as the first.
 
Feature Isolation (part 1)

In the simple examples, no feature isolation was done. The target was the whole image, because there was a non-black blob on a black background. Real targets never conform to this ideal. Not only is the intensity differential between target and background smaller, there's variability over time and other possible commonalities between desired and undesired pixels.

My methods require feature isolation from the background. The more clean and distinct the separation, the better the result. To a certain extent, this must also be true of the more sophisticated tracking apps. I've seen that femr2 couldn't get SE to latch on to just any arbitrary area, and in one of Mick's examples AE decided that another window was a better match than the original one. This makes sense; there must be a way to distinguish a feature or it's not really a feature at all.

As the ability to discriminate between feature and not-feature diminishes, the confidence level must necessarily decrease, regardless of the method used for matching and placement. For any tracking application, there is a range of contexts in which tracking is reliable, and these constitute "good targets". My good targets will never be as numerous as what AE/SE affords. Conversely, it's possible for a custom solution to acquire a target the canned apps can't. I know of at least one case, where SE couldn't acquire the WTC1 roofline but I was able to draw a pixel line.

So, blah, blah, what does this boil down to? Examples will come shortly but the executive summary is:

Feature isolation is the most important part of the process for me. If the pixels selected to represent the feature are chosen well, simply doing an unweighted sum of positions will give a subpixel mean geometric location. This is good enough! Right? Especially if it's a lot of pixels.
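(As a toy illustration of that summary - isolate the feature with whatever rule works, here just a crude threshold standing in for the real preprocessing, and take the plain unweighted mean of the selected pixel positions:)

Code:
import numpy as np
from PIL import Image

def isolated_mean(path, thresh=80):
    """Unweighted mean position of the pixels kept by a crude threshold
    mask.  The threshold is a placeholder for whatever isolation rule
    actually fits the scene."""
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    ys, xs = np.nonzero(gray > thresh)          # the selected feature pixels
    if xs.size == 0:
        return None                             # nothing isolated
    return xs.mean() + 0.5, ys.mean() + 0.5     # pixel-centre convention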
 
And I'm not convinced it can't do a better job; it may well be user error on my part - I only tried twice, and the second attempt was about 10x as good as the first.
It probably can do better. May I venture a guess as to why it behaved as it did, given the clue of operator sensitivity?

It does such a fantastic job of tracking patterns that it's looking for the best match of its training set, which is a very bumpy contour and nothing else. It doesn't know anything about disks; it's looking for this complex globular-cluster miasma, but the conceptual objective in our minds is "find the center of the circle". So it's very accurately giving you what it thinks you want - namely, the best-fit mean location of a complicated object, which is not the center of the circle. The closer the tracking point is to the "center of the circle" (purely an abstraction for any given frame), the better it does, but there is no center.

Having it look for corners or patterns means performance will come more in line with expectation. A question I have is whether it does better with little blobs in a complex scene than it does with just a little blob by itself. It might.
 
Please understand that I've never questioned the value and accuracy of the AE class tools. Until I took up the issue of prefiltering with femr2/Major_Tom and provided a less jittery curve with raw data than the big boys, my only opinion was that they were better in every way except for a very few more esoteric acquisitions. I would not have expected your attempt with AE to be as sensitive to your placement as it was, but I think I may have hit on why above.*

Which means that in this one specific situation, my way is better, because we DID want the center of the circle. That is, more or less, the objective of any small-blob tracking. No one's interested in whether it rotated a little, because it's one step away from an abstract point. Likewise, extreme accuracy doesn't really matter; this would be tracking a blob over some distance with noise, and good enough is good enough. As I stated for using equal weights instead of luminosity, center of patch is good enough.



* and this may also explain why femr2 found it necessary to prefilter and upscale ;)
 
I do enjoy discussing and working with this subject; that's how I got so sucked in before. The process is more interesting than the data obtained. Here I go again, already with a backlog of stuff to present. It will have to dribble in as time permits.

Just realized that my method is the ultimate pre-filtering, haha! SE/AE are so good they perform better on small blobs when smoothed, but mine is inherently smoothed.
 
It probably can do better. May I venture a guess as to why it behaved as it did, given the clue of operator sensitivity?

It does such a fantastic job of tracking patterns that it's looking for the best match of its training set, which is a very bumpy contour and nothing else. It doesn't know anything about disks; it's looking for this complex globular-cluster miasma, but the conceptual objective in our minds is "find the center of the circle". So it's very accurately giving you what it thinks you want - namely, the best-fit mean location of a complicated object, which is not the center of the circle. The closer the tracking point is to the "center of the circle" (purely an abstraction for any given frame), the better it does, but there is no center.

Having it look for corners or patterns means performance will come more in line with expectation. A question I have is whether it does better with little blobs in a complex scene than it does with just a little blob by itself. It might.

Relatively speaking it will, as just averaging will not work in a complex scene.

One can easily see why the feature matching approach will not work that well in this contrived case. The first frame is:


The second is:


The actual position is encoded in the average of all the pixels, but AE is trying to match the second image to the first image, which it does reasonably well, but not as well as the simple average.
 
Just lost a post. First time here. Ah, it was long-winded rot anyway. The gist was that I think the question of why femr2 filtered is answered. He was interested in maximum accuracy for blobs and needed to mask the detail from the awesome pattern matching algorithms, blobs being essentially point objects. Makes perfect sense. You don't want to filter complex objects, but complex scenes are not an issue because cropping eliminates that.

femr2 would zealously crop to the object of interest. If that was a viable blob, he'd get the best results by having SE ignore the extreme detail, which is all artifact. My method intrinsically minimizes random variation about the mean for blobs. Little blobs are a subset of viable targets but are hugely represented in general tasks. femr2 indeed did due diligence and describes an optimum method for small blob tracking.
 
Just lost a post. First time here.

Copy to clipboard before hitting the 'post reply' button, as a habit. Trust in the technology is bad in this case. You'll lose more if you don't.


A bit of irony there in you losing a post. Your posting sequence is great because you do not trust the prepackaged software, which is good. Then you trust the forum prepackaged software. Funny.


The issue here is proprietary software. It leaves you guessing because you cannot examine the source code directly. We can guess. We have access to tutorials. But in the world of ownership and hidden source code, that is as close as you'll be allowed to get.

Is your Windows or Apple OS spying on you? Hope not, but you aren't allowed to know for sure without access to the OS source code. Like that. You don't really own your OS. It owns you.


Your custom method is open source, so you know exactly what it is doing. You can adjust it in specific cases. We can't. That is one reason these tests are so interesting. You are testing the faith placed in prepackaged software, and its limits. For that reason your contributions are essential. I'll be keeping notes.
 
But does it actually do anything in anything that's real-world?
Yes, it's done several real world measurements. One set was used in a paper David Benson prepared in a submission to JEM. Another set was used by Charles Beck in a paper you'll find on Arxiv. Frank Greening did informal analysis using several datasets I've produced. You've seen one; it was as good as AE. I replicated NIST's measurement of precollapse sway. I could've done that section of their report using these methods, and it would've been better.

You are finding the average point in a region. How does that track anything real?
It does. Read my posts.
 
Just lost a post. First time here. Ah, it was long-winded rot anyway.
Copy to clipboard before hitting the 'post reply' button, as a habit. Trust in the technology is bad in this case. You'll lose more if you don't...
I use the "Lazarus" (as in "raised from the dead") add in with Firefox. Has saved many of my most brilliant compositions from loss due to congenital stupidity of operator.....:rolleyes:
 
But does it actually do anything in anything that's real-world? You are finding the average point in a region. How does that track anything real?


That is part of the testing sequence. To the degree that it doesn't, we'll see his method run into problems compared to others. It will all come out in the wash.
 
It does. Read my posts.

Perhaps an example? I'm interested because I'm not sure of the precise practical application.

It would seem to me that it can only track a feature over a background if:

1) The total weight of the background is unchanged (realistically meaning it's flat, or finely textured)
2) The total weight of the feature is unchanged (meaning it's always within the analyzed region)

(Where "weight" is the sum of the pixel values)
 