From my own point of view, and speaking for myself only:
I have been building speakers for a really long time now. I own all of the equipment I need (crossover design software, calibrated microphones, etc.), and I have the cabinetry and finishing skills.
The equipment required quite a large cash outlay, but it has paid for itself over the years. When I purchase tools, I try to buy the best I can afford because I know they will last a long time, most probably longer than it takes for them to pay for themselves. Even if they do not, I can usually get spare parts, so a repair costs less than a new tool.
My current situation, then, is that each set of speakers I build these days requires an investment in drivers, crossover components, wood, finishing, consumables, and time only. This enables me to build "cheaper" than others who first need to invest in tools.
Hypothetically speaking, if I had to build a stereo pair of speakers on a R 25k budget (for myself, thus not counting time), I am confident I would be able to blow a R 100k set out of the water. They would look the part, sound the part, and weigh the part.
Now, if I didn't have the tools and equipment, the input cost might double to around R 50k, which still gives a 2:1 return compared with the R 100k set.
Another thing to consider is that there is some "black magic" in designing a good speaker. You can use the best of everything, do everything right down to the last speck of polish on the plinth, and the speaker could still sound bad. For me, this final piece of the puzzle takes quite a bit of time: it is the "voicing" of the speaker. I have been known to try up to 6 different crossover implementations, and countless combinations of resistor padding and cabinet damping, to get this right. In my opinion, this is the line that separates a passable speaker from a really good one.
When you do get everything to work together, it is as if the sound "snaps into place". This usually happens quite unexpectedly: I will be sitting on the carpet with the crossover in front of me, playing around with crocodile clip leads, resistors and caps all around me, while listening to my reference tunes. All of a sudden I clip a lead onto the network and everything comes alive: there is coherency between the drivers, the imaging is spot-on, the midrange is well defined and clean; everything is just right.
The other side of the coin is that you can do amazing things without using the best of the best. I have come to realize that the voicing and crossover are probably the most important aspects of a well-designed speaker. My Litil kit speaker is a case in point: the drivers are small, don't cost much, and are not all that impressive on paper. The implementation, however, is quite startling.
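For readers wondering what "resistor padding" can look like in numbers: one common form is an L-pad, a series resistor plus a resistor in parallel with the driver, which lowers the driver's level while the load seen by the crossover stays at roughly the nominal impedance. The minimal Python sketch below computes starting values for a given attenuation. It assumes the driver behaves as a pure resistance equal to its nominal impedance (a real driver does not), and the function name `l_pad` is hypothetical, chosen only for illustration.

```python
def l_pad(attenuation_db: float, nominal_ohms: float) -> tuple[float, float]:
    """Return (series, parallel) resistor values in ohms for an L-pad.

    Assumption: the driver is modelled as a pure resistance equal to
    nominal_ohms, so the result is only a starting point for voicing.
    """
    if attenuation_db <= 0:
        raise ValueError("attenuation_db must be positive")
    k = 10 ** (-attenuation_db / 20)          # target voltage ratio across the driver
    r_series = nominal_ohms * (1 - k)         # resistor in series with the driver
    r_parallel = nominal_ohms * k / (1 - k)   # resistor across the driver terminals
    return r_series, r_parallel


if __name__ == "__main__":
    # Print starting values for a few typical pad levels on an 8 ohm tweeter.
    for db in (1, 2, 3, 4, 6):
        rs, rp = l_pad(db, 8.0)
        print(f"{db} dB on 8 ohm: series {rs:.2f} ohm, parallel {rp:.2f} ohm")
```

Because a real driver's impedance changes with frequency, values like these would still need to be adjusted by measurement and by ear, which is exactly the kind of trial-and-error described above.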
Regards,
Ian.