I just realised the code for my algorithm was slowed by a lot of unnecessary work for this purpose (like creating error maps), but you seem to have things in hand with this 
I really want us to get the matlab compiler . . .
I don't know anything about the MATLAB compiler but I assume it must be pretty hardcore (given that MATLAB's been around for ages) so if you can persuade the powers that be to get a copy it'll provide quite a performance boost. It's been a while since I last looked at interpreters but from my BASIC days I'd imagine you could get a speed increase of around a hundred times (without knowledge of the entire program an interpreter just can't make some optimisations and it still has to do the lexical analysis - ouch).
But I digress, the best optimisations aren't made by throwing better hardware or new technology at a problem but by doing less. Which is to say by changing or refining the algorithm (I know, I'm lecturing but bear with me). The change that'd give the biggest performance win to the star blurring code would be to only run the X and Y loops over the area around the star to which the Gaussian distribution contributes significantly, i.e. where the computed brightness is greater than the cutoff.
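A rough C++ sketch of the restricted loops, for what it's worth. The Image struct and the names blurStar and cutoffRadius are made up for illustration, and it assumes the brightness falls off as brightness * exp(-r^2 / (2 * sigma^2)), which may not match the real star blurring code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical image type: a row-major grid of brightness values.
struct Image {
    int width, height;
    std::vector<float> pixels;
    Image(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0.0f) {}
    float& at(int x, int y) {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Distance (in pixels) beyond which the assumed Gaussian falls below cutoff.
// Solving brightness * exp(-r^2 / (2 sigma^2)) = cutoff for r.
int cutoffRadius(float brightness, float sigma, float cutoff) {
    assert(sigma > 0.0f && cutoff > 0.0f);
    if (brightness <= cutoff) return 0;
    const float r = sigma * std::sqrt(2.0f * std::log(brightness / cutoff));
    return static_cast<int>(std::ceil(r));
}

// Add one blurred star, looping only over the box that can exceed the cutoff.
void blurStar(Image& img, int sx, int sy, float brightness, float sigma,
              float cutoff) {
    const int r = cutoffRadius(brightness, sigma, cutoff);
    const int x0 = std::max(0, sx - r), x1 = std::min(img.width - 1, sx + r);
    const int y0 = std::max(0, sy - r), y1 = std::min(img.height - 1, sy + r);
    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            const float d2 =
                static_cast<float>((x - sx) * (x - sx) + (y - sy) * (y - sy));
            const float v = brightness * std::exp(-d2 / (2.0f * sigma * sigma));
            if (v > cutoff) img.at(x, y) += v;  // skip the negligible corners
        }
    }
}
```

With sigma = 2 and a cutoff of 1% of the star's brightness the box is only 15 pixels across, so the work per star no longer depends on the size of the image.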
A different (and more complicated but much faster) optimisation is to pre-compute the Gaussian for a single point. Then plot the stars, but instead of drawing a point for each star draw the pre-computed Gaussian image.
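Something along these lines, as a sketch only. Stamp, makeGaussianStamp and drawStamp are hypothetical names, the stamp is for a point of brightness 1, and the radius is passed in by hand rather than derived from a cutoff.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical image type: a row-major grid of brightness values.
struct Image {
    int width, height;
    std::vector<float> pixels;
    Image(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0.0f) {}
    float& at(int x, int y) {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Pre-computed Gaussian for a single unit-brightness point.
// Offsets dx, dy run from -radius to +radius.
struct Stamp {
    int radius;
    std::vector<float> values;
    float at(int dx, int dy) const {
        const int side = 2 * radius + 1;
        return values[static_cast<std::size_t>(dy + radius) * side + (dx + radius)];
    }
};

// The expensive exp() calls happen once here, not once per star.
Stamp makeGaussianStamp(float sigma, int radius) {
    assert(sigma > 0.0f && radius >= 0);
    Stamp s{radius, {}};
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx)
            s.values.push_back(std::exp(
                -static_cast<float>(dx * dx + dy * dy) / (2.0f * sigma * sigma)));
    return s;
}

// Draw the stamp centred on (cx, cy), clipping at the image edges.
void drawStamp(Image& img, const Stamp& s, int cx, int cy) {
    for (int dy = -s.radius; dy <= s.radius; ++dy) {
        for (int dx = -s.radius; dx <= s.radius; ++dx) {
            const int x = cx + dx, y = cy + dy;
            if (x < 0 || y < 0 || x >= img.width || y >= img.height) continue;
            img.at(x, y) += s.at(dx, dy);
        }
    }
}
```

Plotting a star then costs one addition per stamp pixel, and overlapping stars simply add together.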
This can be extended to an image where the positions of the stars aren't known - where all that's available is the image with bright points plotted on it. In this case loop over X and Y: if the current point is black, do nothing, but if the current point is bright then draw the pre-computed Gaussian image.
This can be extended even further to an image with points of varied brightness by multiplying the pre-computed Gaussian image by the brightness of the point. This isn't a good approximation of the Gaussian for a dim point but this can be 'fixed' by pre-computing a range of Gaussian blurred images from points of different brightnesses.
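A sketch of the scan-and-stamp version, under a few assumptions: blurBrightPoints and the Image struct are hypothetical names, "black" is taken to mean a brightness at or below a threshold, and only a single unit-brightness stamp is used (no range of stamps for dim points). It writes into a separate output image so the scan never mistakes an already blurred pixel for a star.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical image type: a row-major grid of brightness values.
struct Image {
    int width, height;
    std::vector<float> pixels;
    Image(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0.0f) {}
    float& at(int x, int y) {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
    float at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

Image blurBrightPoints(const Image& src, float sigma, int radius,
                       float threshold) {
    assert(sigma > 0.0f && radius >= 0);
    const int side = 2 * radius + 1;

    // Pre-compute the Gaussian for a single point of brightness 1.
    std::vector<float> stamp;
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx)
            stamp.push_back(std::exp(
                -static_cast<float>(dx * dx + dy * dy) / (2.0f * sigma * sigma)));

    Image out(src.width, src.height);
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            const float b = src.at(x, y);
            if (b <= threshold) continue;  // black point: do nothing
            // Bright point: draw the stamp scaled by the point's brightness.
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    const int px = x + dx, py = y + dy;
                    if (px < 0 || py < 0 || px >= src.width || py >= src.height)
                        continue;
                    const std::size_t i =
                        static_cast<std::size_t>(dy + radius) * side + (dx + radius);
                    out.at(px, py) += b * stamp[i];
                }
            }
        }
    }
    return out;
}
```

The cost scales with the number of bright points times the stamp area, so it only pays off while most of the image is black.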
Another thing you might like to try is kind of averaging the images you get for the different smoothing scales, so you can have broad large scale things that still have a fair bit of detail to them. It might also be possible to recapture the dustiness my algorithm was giving by randomly pushing the values around somehow after the smoothing (multiplying pixels by a random number between 0.9 and 1.1, say) - it'd have to be done with at least 500x500 resolution I imagine.
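Both ideas are cheap to try. A minimal C++ sketch, assuming each smoothing scale has already been rendered to a flat buffer of the same size; the Pixels alias and the names averageScales and jitter are made up for illustration. Note the jitter treats every pixel independently, so whether the result looks dusty or just grainy is an open question.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical flat buffer of brightness values, one per pixel.
using Pixels = std::vector<float>;

// Plain (unweighted) per-pixel mean of the images from each smoothing scale.
Pixels averageScales(const std::vector<Pixels>& scales) {
    assert(!scales.empty());
    Pixels result(scales.front().size(), 0.0f);
    for (const Pixels& img : scales) {
        assert(img.size() == result.size());
        for (std::size_t i = 0; i < img.size(); ++i) result[i] += img[i];
    }
    for (float& v : result) v /= static_cast<float>(scales.size());
    return result;
}

// Multiply every pixel by its own random factor between lo and hi.
void jitter(Pixels& pixels, std::mt19937& rng, float lo = 0.9f,
            float hi = 1.1f) {
    std::uniform_real_distribution<float> factor(lo, hi);
    for (float& v : pixels) v *= factor(rng);
}
```

Seeding the std::mt19937 with a fixed value keeps the pattern repeatable between runs, which makes comparing results easier.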
I'm really hoping that the test images I posted look crap (i.e. not dusty) because I didn't include the star spoke generation code - I haven't checked but I strongly suspect that's it. Unfortunately (and again this is just a suspicion) I think that multiplying pixels by a random number will make it look grainy rather than dusty. I'll give it a bash though. And on that subject I haven't got any further on implementing your code in C++ because (for better or worse) I got sucked back into the actual code of SC Redux. I was avoiding a couple of particularly nasty problems but I have them solved now so I'm on a roll again.
If you'd like the optimised C++ code (ie: if it'd be useful to your studies or some'at) I'll quite happily work on it sooner rather than later as I'm going to convert it at some point.