A review of the state of play as seen by A. S. Bennett, 2008 Feb 13.
Nasty Mathematics
The Proof of the Pudding
Pictures
Photometric Conclusions
Positions
Thoughts
The standard TASS Pipeline has been subject to repeated tests against alternative methods of analysis. In spite of claims to the contrary (especially those by the present author,) no other method has provided better photometry. The user is warned again that the accuracy of magnitude measurement, determined by confusion errors, is a good deal worse than the short term internal scatter of the measurements. The astrometry of the Pipeline measurements could be improved by about a factor of two if somebody thought this was worth the effort.
If the sky and the PSF of the telescope remain constant, the confusion noise for a particular star remains constant. Unfortunately, the PSF depends on the position on the image, the focus of the telescope and on the seeing and hence varies from one observation to another. The sky also varies: objects, man made and otherwise, move around as well as the variation in brightness of stars which have formed the main object of TASS investigations. In the short term, it may be reasonable to assume that the confusion error remains constant and that variations reflect real variations in star brightness. In the long term, this simply isn't so and the errors in magnitude determination would be expected to come much closer to the estimated confusion noise than to the smaller random measuring noise. In this note, I reanalyze an old TASS data set that was taken in only a few hours on one night so some of the sources of variability are minimized. It turns out that the internal scatter of the magnitude determinations is close to that estimated from the measuring noise apart from a noise floor afflicting the brighter stars. This error is much lower than my estimated confusion error. The purpose of this note is to point out that precision is not the same thing as accuracy: the relatively low internal scatter observed on one night does not imply the same low scatter between observations made under different conditions with different seeing and different focus.
Properly, the confusing sources ought to be characterized by the variance contributed by the photons (or joules if you prefer; I like photons) per second in unit bandwidth and per unit solid angle, thus avoiding the parameters of the specific measuring instrument. I propose to be as lazy as most astronomers and start with ADU2 within the passband per pixel. Worse, I will scale the ADU measurement by the measured standard deviation so that I can compare the effect of confusion directly with the measuring noise. To convert my figures to something independent of the TASS MKIV instrument, you need to know the value I used for the standard deviation, the gain, the quantum efficiency and the exposure time to convert to photons per second. You need the spectral response to convert that to unit bandwidth and you need the CCD pixel size and telescope focal length to convert to unit solid angle. The PSF comes into it too. It's a good thing I intend to use the results only for internal comparisons of MKIV data as I am too lazy to do all that.
The obvious starting point is the observed histogram. I show my results for the Orion image
HIRA2604797, which is a rather crowded image. I have subtracted off my version of the
background and scaled the pixel values by my estimate of sigma, the measurement standard
deviation which is obtained from the probable error. The corresponding Gaussian distribution is
shown in brown and the measurements in red. The stars obviously contribute several percent to
the cumulative histogram.
The variance distribution is shown in the next diagram. The two lobes on either side of zero - they
ought to be of equal size - are the Gaussian noise and the remaining stuff at positive values is the
stars. The variance of confusing stars can be found by integrating the area under the curve up to
the value that one chooses to analyze as individual stars ... this is clearly rather arbitrary!
Another way to estimate the confusion variance is to fit a power law distribution to the star data.
This can be done using a log-log plot. The data (red) is well fitted
by the sum of the Gaussian noise (brown) and a power law (blue) k(p/σ)-ν
Now: what does it mean for the confusion variance to be, say, 0.34σ2? Unfortunately it is more complicated than just multiplying the estimated error variance for a star by 1.34. Confusion errors don't work in the same way as random measurement errors or photon noise because the confusing stars are seen through the telescope's PSF and that means that the errors are correlated from pixel to pixel. The observed variance N2 at a pixel may be modeled as arising from a distribution of independent stars, one per pixel, with variance n2. For the simple case of a Gaussian PSF with scale factor R, the mathematics shown below gives:
n2 = N24πR2
Note that this doesn't mean that n increases with the size of the PSF: it is the other way around - the measured N decreases as the size of the PSF increases while n depends on the pixel scale, not on the PSF.
This ditribution of confusing sources is then processed with a window, for example a matched Gaussian with the same scale factor. The mathematics of this can also be found below. The final response to both measuring noise and confusion is a variance for the star measurement of
σ24πR2 + n22πR2
This obviously does increase with the size of the PSF. Plugging in some numbers, say R = 1.36 pixels appropriate for image HIRA2604797 (FWHM = 3.2 pixels) and a measured confusion variance N2 = k2σ2 (yes - that's not the same k as above. There are only so many letters) gives
n2 = 23.2 k2σ2
and total variance
23.2 σ2(1 + 11.6 k2)
So a confusion variance of 0.34 σ 2 increases the variance of the star measurement not by a factor of 1.34 but by a factor of 4.9. Ouch! And the confusion variance increases with the brightness of the star you are trying to measure. The confusion error is already potentially dominant for the faintest sources ...
The simplest PSF to assume for modelling purposes is a circular Gaussian
a = 1/(2πR2)exp(-r2/(2R2))
This is very much a simplification of the actual PSF of the MKIV telescope.
Modelling the confusing stars as a distribution of independent stars, one per pixel, with variance n2, the observed variance N2 is given by
N2 = n22π integral(ra2) dr which is, mercifully, a standard integral!
= n2/(4πR2)
Note that in taking N2 and n2 as mean square values, no assumption has been made that their distributions are Gaussian but only that the second moments exist and are finite
Analysis consists of convolving the data with a window. The TASS pipeline uses a circular aperture, which I will deal with below, but the circular Gaussian window gives the simplest mathematics. Consider a circular Gaussian b of radius L:
b = C.exp(-r2/(2L2))
A source of amplitude S then gives a signal Sa which is observed through the window b as Sa*b where * denotes convolution. The calibration condition gives:
C = 1 + R2/L2 another mercifully simple result! Good things, Gaussians.
The noise variance is in general
σ22π integral(rb2) dr + n22π integral(r(a*b)2)dr where * denotes convolution.
Which, in the case of the Gaussian window gives variance
σ2πR2(R2/L2 + 2 + L2/R2) + n2πR2(1+ L2/R2)
If there is no confusion error, n2 = 0, the optimum case is Lopt = R. This is the classic least-squares case for independent random errors. The optimum variance = σ24πR2
Writing N2 = k2σ2 or n2 = 4πR2k2σ2, the optimum variance occurs at a smaller value of L given by Lopt2 = R2/(1 + 4πR2k2)1/2
For example, k2 = 0.34 and R = 1.36 gives Lopt = 0.58R and an optimum variance about 3.9 times the confusion free optimum. Continuing to use L = R is, of course, worse, giving a variance 4.9 times the confusion free optimum, as quoted above. This value of Lopt, less than 0.8 pixels, is rather small. It is hardly surprising that my first attempts at brute force optimization using my trusty spreadsheet caused the optimizer to blow up: it was attempting to converge to a delta function. These are general results:
1) For a crowded image, measurement error is dominated by confusion.
2) The optimum window size is small.
Here b = C, r < L, and zero for r > L. Straightforward integration gives the constant C = 1/(1 - exp(-L2/(2R2))).
The convolution a*b can be written:
a*b = C/R2exp(-r2/2R2) integral 0 to L(u exp(-u2/2R2) I0(ru/R2)) du
where I0 is one of the Bessel functions. The corresponding infinite integral is in my Integrals for
Dummies book but this finite one isn't so I was reduced to doing it numerically.
The confusion free noise variance is σ2πL2 /(1 - exp(-L2/(2R2)))2
Ignoring confusion, this gives an optimum near L = 1.6R that is 22.8% worse than the Gaussian case ... just 11% worse in terms of standard deviation. The Pipeline uses, I think, L = 4 pixels = 2.94R, giving variance 2.23 times the optimum Gaussian value.
To evaluate the confusion term, I was forced back to my trusty spreadsheet. For the k2 = 0.34, R = 1.36 case considered above, the optimum L is around 1.07R and the optimum variance about 4.1 times worse than the Gaussian confusion free case. This is only five percent worse than the corresponding result for the optimum Gaussian window. The actual window used by the Pipeline comes out 13.1 times worse: more than 3 times the optimum.
I believed, rightly or wrongly, that the problem of large scatter with the Gaussian window and the related model PSFs that I used in my least squares fitting exercise arose from the window not going to zero beyond a well defined finite radius. On the other hand, I felt that the circular aperture was giving problems with adjacent sources because it went to zero discontinuously. The window that I tried is
b = C(1 - (r/L)2)2 for r < L and zero for r > L.
This goes to zero with zero slope at r = L. However, the curvature is discontinuous at r = L. I chose this window over the related window b = C(1 - (r/L)2) because the latter has discontinuous slope at r = L. Integration, writing u = ½L2/R2 gives C = 1/(1 - 2/u + 2(1 - exp(-u))/u2).
The spatial spectrum, B = 2π integral(rbJ0(2πkr)) dr of this window is proportional to J3(2πkL)/(2πkL)3. The spatial spectrum of the circular aperture is proportional to J1(2πkL)/(2πkL) and that of the alternative window b = C(1 - (r/L)2) is proportional to J22(2πkL)/(2πkL)2 so these obviously form a related family. The spatial spectrum is useful when, as in this case, the convolution is tedious to calculate directly; one may instead compute the spatial spectrum of the convolution as the product of the two spatial spectra. One is then driven back to numerical computation only for the final stage of evaluation - the final integral does not appear to be in my book of Integrals for Dummies.
The confusion free optimum is somewhere around L = 2.5R. The optimum is only a bit over 1% worse than the Gaussian case. For the k2 = 0.34, R = 1.36 case considered above, the optimum L is around 1.5R and the optimum variance 3.9 times worse than the confusion free Gaussian case. This is actually better, in the next decimal place, than the Gaussian window's performance in the presence of confusion. Cutting off the response to the tail of the PSF can indeed help reduce confusion errors.
The Orion set, which formed part of DS24, is probably still the only TASS data set with the same star observed systematically in different locations on the image. This gives the analysis routines a good work out! I analyzed the Orion set a long time ago using my PSF fitting code. The results of my photometric analysis were written up as TN92: Michael Richmond processed the same data and wrote it up as TN89. I have now reprocessed the set using a fixed aperture based on the "Bennett ad-hoc" window and least-squares fitting so I now have three sets which I will call P for the standard TASS Pipeline data downloaded from the database, T for my original TN92 analysis, which used Tom's Flat, and N for my new analysis.
I used the same photometry code as for TN92 but added preprocessing to select matched pairs of measurements. The process consisted of:
Picking out the data from the three sets and converting to a common format in 3 files.
Extracting matching pairs N-P and T-P of Tycho2 stars to 2 files, analyzed separately. This eliminates stars fainter than the Tycho2 limit,
Extracting sets having a sufficient number of pairs: this has the effect of confining the analysis to the central region of the 7x7 grid of images of the Orion set where all stars are seen in many images, providing good spatial coverage.
Cleaning the data set by removing flagged measurements. This could be done selectively but the final decision was to eliminate all measurements with any kind of flag.
For the T-P set there were:| 2156 | Total stars |
| 853 | With enough pairs of V & I measurements and matching Pipeline measurements |
| 573 | After cleaning flagged measurements etc |
| comprising | 15450 paired observations |
Set N goes a little deeper and should have better error flagging. For the N-P set there were:
| 2157 | Total stars |
| 848 | With enough pairs of V & I measurements and matching Pipeline measurements |
| 618 | After cleaning flagged measurements etc |
| comprising | 16115 paired observations |
The remaining root-mean-square internal scatter after subtracting 4th order global fits and 2nd order
fits to each image as described in TN92 was
| V | I | ||
| T-P: | T | 0.0275 | 0.0194 |
| P | 0.0173 | 0.0082 | |
| N-P: | N | 0.0189 | 0.0104 |
| P | 0.0190 | 0.0099 |
So my old set T is clearly the worst on all points. My new set N is a tiny bit better in V-band but worse than the TASS Pipeline in I-band. Overall, the TASS Pipeline wins by a hair.
The differences between the two sets of Pipeline measurements result from Set N going deeper and thus having more faint stars.
The Pipeline measurements in the database have had quite large colour corrections applied. The internal scatter measurements, however, are insensitive to these colour corrections. I redid the entire analysis including 1st order colour corrections and the effect was, as expected, negligible.
Enough talk: time for the pictures.
Internal rms scatter plotted against I magnitude for the Pipeline results (black) and for the Set T PSF fitting analysis (red.) The blue lines shows the estimated measuring noise without the confusion error as calculated above except for including the source photon noise. The green lines shows the total, with the confusion error. Heavy lines are for the Pipeline data and the lighter lines are, supposedly, for data set T. Apart from the well known and, I think, still unexplained noise floor for bright stars, the black Pipeline results are close to my estimate with just a few outliers. My red Set T analysis is much worse, with a poorer noise floor and far more outliers. If you look very closely, there is just a hint that a fraction of the sources around magnitude 10.5 (ignoring the outliers) have lower rms scatter in the Set T results but not as low as the expected thin lined curve.
That the scatter mostly lies well below my estimate of confusion error tells us that the sky and the
telescope didn't change much during the few hours the observations were made.
The Set N results show an even worse noise floor for the bright stars. Apart from the outliers, there is no sign of any reduction in scatter compared to the Pipeline at any magnitude, in spite of my theoretical prediction shown as the thin blue line. On the other hand, the outliers are very much better than for Set T above and no worse than for the Pipeline results: I have made some progress in five year's work, even if half of it (the noise floor) is progress backwards!
While my analysis shows that confusion is the dominant source of measurement error for the crowded Orion images, this does not show in the results presented here. This is only a demonstration of the difference between precision and accuracy. The precision is determined by the measuring noise, ignoring confusion, because in the short term neither the sky nor the telescope change much. In the long term, the confusion error should be a much better estimator of accuracy and the errors will be much larger than the short term estimate. This is, of course, observed in the full TASS data set.
My proposed new window does about as well as the standard circular aperture used by the TASS Pipeline apart from a larger noise floor. According to my theoretical error estimates, it should do a good deal better. I don't know why it doesn't. I don't know why there is a noise floor and I don't know why Set N has a higher noise floor than the Pipeline results. A possible contributor to the noise floor is as follows. The Pipeline analysis, using a circular window which is, according to my analysis, far too large, is insensitive to PSF variations. So a possible source of the problem is the variation of the PSF across the image. The Pipeline is insensitive to the variation and produces results which agree closely with my theoretical estimates around I magnitude 10.5. Set T, which modelled the variation incompletely - I never really got my fitting program to do a good consistent job - suffered from larger than theoretical scatter and a higher noise floor. Set N, not modelling the PSF variation at all suffers the highest noise floor. If this explains the increased noise floor, it suggests that PSF modelling must be done better than anything I have achieved so far in order to demonstrate any practical advantage over aperture photometry for any system which, like the TASS telescopes (and most wide field systems) has significant variation of PSF across the image.
A large part of the reduction in outliers in Set N as compared to Set T arises from improved error flagging. Changes between the two sets include better detection of bad pixels, adding pixels with wild flat-fielding factors to hot and noisy pixels: more systematic treatment of saturated stars and of things like aeroplane streaks and generally more ruthless discarding of questionable data. To arrive at the 16,115 pairs finally analysed, well over 3,000 pairs present in the Pipeline data set were eliminated from Set N on the basis of error and warning flags. This resulted in a major reduction in outliers. Less ruthless rejection (results not shown here) produced nearly as many outliers as there are in Set T. This suggests that the Pipeline code could do a better job of flagging doubtful measurements since if those 3000 measurements had been eliminated by the Pipeline, they would not have been there to be counted by my code!
Here, I threw in the whole data set, error flags and all, rejecting only those measurements in the Bennett sets that were in only one colour: well over twice as many observations in each case as those that survived in the photometry. There are large differences between the three analyses as will be seen in the following tables. The section labelled "sd from pe" is just pe/0.6745, the standard deviation estimated from the probable error assuming a Gaussian distribution. I put this in to show that the distributions are quite far from Gaussian. The TASS Pipeline gives only V positions. Pity! The I positions are clearly better. Using only V positions presumably dates back to the bad old days when the I images had wider PSF than the V.
| Probable error | ||||
| VRA | VDEIRA | IDE | ||
| TASS Pipeline | 0.618 | 0.590 | ||
| Bennett T | 0.311 | 0.313 | 0.120 | 0.111 |
| Bennett N | 0.486 | 0.474 | 0.154 | 0.137 |
| sd from pe | ||||
| VRA | VDEIRA | IDE | ||
| TASS Pipeline | 0.916 | 0.874 | ||
| Bennett T | 0.461 | 0.464 | 0.177 | 0.165 |
| Bennett N | 0.720 | 0.703 | 0.228 | 0.204 |
| Standard Deviation | ||||
| VRA | VDEIRA | IDE | ||
| TASS Pipeline | 1.419 | 1.335 | ||
| Bennett T | 0.740 | 0.720 | 0.370 | 0.289 |
| Bennett N | 0.930 | 0.911 | 0.412 | 0.331 |
The order is now completely different from the photometry results above! The old Bennett T analysis is the clear winner. This analysis attempted to follow the variation in PSF across the image. As shown in my old note on undersampling, the sytematic error in 2-D least squares position measurement in the noise free case vanishes if the assumed PSF matches the actual one.
The new Bennett N analysis which uses the same PSF across the whole image and expressly ignores the halo, thus not matching the outer parts of the PSF at all, comes next.
The TASS Pipeline positions are much worse than either of the Bennett analyses. This is not a new result: I pointed it out back in TN73 (hidden away: I append a copy) and even earlier, in 2000, in the note on undersampling. 2-D least squares fitting is simply a superior method of estimating position compared to the 1-D histogram method used in the Pipeline.
Better, that is, with the existing MKIV. It would, of course, be nice to have lenses that came closer to their design specs, faster image readout, more perfect pointing accuracy and go-to capability. Maybe for the MKV.
By far the largest (known) source of short term measuring noise is the sky background. Those TASS MKIVs that have been set up in darker sites ought to show much lower scatter for the fainter sources. Is this so? I have not seen any technical information on this on the list. Meanwhile, in existing bright sky sites the theoretical minimum sky background noise variance is directly proportional to the area of the PSF so the sharpest possible focus is desirable. This improvement has been concealed in existing analyses by the use of relatively large circular apertures for the photometry. I suspect that little attention has been paid to getting the best out of the existing lenses since Tom did all that work rebuilding them 'way back when. By sheer dogged persistence, he coaxed respectable performance out of them that was stupendously good compared to their disastrous performance as delivered but this improved performance is still a long way from that originally designed in. How difficult would it be to replace the lenses? I suspect that compared to the rest of the system the lenses look pretty cheap!
The sharpest possible focus also reduces the effect of confusion. Unless, of course, you continue to use the same large circular aperture for your photometry.
The noise floor is a tantalising problem. The Pipeline data in the TASS database, measured with a large circular aperture, shows a consistently better noise floor (here and in, for example TN94) than my analysis using the supposedly superior least squares fitting. The Pipeline method is the least affected by PSF variations and my Set N the most. This correlates with the increase in noise floor. This correlation strongly suggests that the noise floor is a measure of the degree to which the analysis fails to eliminate the effect of PSF variation. If the noise floor depends on the processing, it cannot be entirely an inherent feature of the data and one might manage to do better. If it turns out that what we need is better estimates of the error as a function of position within the image, the current "patches" observing strategy is a severe limitation as it provides an absolute minimum duplication of measurements of the same star at different image positions. The Orion set that I analyse here is, I think, the only TASS data set that systematically provided a grid of observations and this set is stretched to the limit to provide the 4th order spatial fits I used here.
In any case, 2-D least squares fitting gives positions that are clearly more accurate than 1-D histograms. This is, I think, something that could be incorporated in the Pipeline though it would be a lot of work. My code does a simultaneous fit of position and amplitude using the Simplex/Amoeba search and it is not obvious that this could be used just for position ... one could always throw away the amplitude and replace it with the aperture sum ... maybe an error flag could usefully be set if the least squares amplitude differed too much from the aperture sum. Lots of things to explore!