A while back (before the financial crisis), there was a lot of discussion of income inequality on MR, much of which cited the paper Income Inequality in the United States: 1913-1998. Having finally read it (annual updates are available from Prof. Saez's website), I was still trying to grasp how the authors made their calculations. It would have been great if they had made their raw data and computations available.
My first impression had been that they had access to all the individual IRS tax returns, which would have made this the definitive work on income inequality. But they did not; they made inferences from aggregate data, which leaves room for quibbles.
"Using the published information on composition of income by brackets and a simple linear interpolation method, we decompose the amount of income for each fractile into five components: salaries and wages, dividends, interest income, rents and royalties, and business income. We use the same methodology to compute top wage shares using published tables classifying tax returns by size of salaries and wages."
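To make the quoted method a little more concrete, here is a minimal Python sketch of what linear interpolation of income composition could look like. It is my guess at the general idea, not the authors' code. The bracket bounds and component shares are made-up numbers, the function names are hypothetical, and applying the shares at a single "reference income" to a whole fractile is a simplifying assumption of mine. The assumed setup: the published tables give, for each income bracket, the share of income coming from each of the five components. Shares at an income level between two bracket bounds are read off a straight line connecting the two brackets, and a fractile's total income is then split using those interpolated shares.

```python
from bisect import bisect_right

COMPONENTS = ["wages", "dividends", "interest", "rents", "business"]

# HYPOTHETICAL tabulation, made up for illustration only:
# lower bound of each income bracket (dollars) and the share of total
# income in that bracket coming from each component (each row sums to 1).
BRACKET_LOWER = [50_000, 100_000, 200_000, 500_000, 1_000_000]
BRACKET_SHARES = [
    [0.80, 0.03, 0.05, 0.02, 0.10],
    [0.72, 0.05, 0.05, 0.03, 0.15],
    [0.60, 0.08, 0.06, 0.04, 0.22],
    [0.48, 0.14, 0.07, 0.05, 0.26],
    [0.35, 0.22, 0.08, 0.05, 0.30],
]


def interpolate_shares(income):
    """Linearly interpolate component shares at a given income level.

    Below the first or above the last tabulated bound, the nearest
    bracket's shares are used unchanged (an assumption, not extrapolation).
    """
    if income <= BRACKET_LOWER[0]:
        return list(BRACKET_SHARES[0])
    if income >= BRACKET_LOWER[-1]:
        return list(BRACKET_SHARES[-1])
    # Index of the bracket bound at or just below `income`.
    i = bisect_right(BRACKET_LOWER, income) - 1
    x0, x1 = BRACKET_LOWER[i], BRACKET_LOWER[i + 1]
    w = (income - x0) / (x1 - x0)  # position between the two bounds, 0..1
    return [(1 - w) * a + w * b
            for a, b in zip(BRACKET_SHARES[i], BRACKET_SHARES[i + 1])]


def decompose(total_income, reference_income):
    """Split a fractile's total income into components.

    HYPOTHETICAL simplification: uses the interpolated shares at one
    reference income (e.g. the fractile's threshold or average income).
    """
    shares = interpolate_shares(reference_income)
    return {c: total_income * s for c, s in zip(COMPONENTS, shares)}


if __name__ == "__main__":
    # Made-up fractile: $1 billion of total income, reference income $300,000.
    for name, amount in decompose(1_000_000_000, 300_000).items():
        print(f"{name:>10}: {amount:,.0f}")
```

Because each row of shares sums to one, any straight-line blend of two rows also sums to one, so the components always add back up to the fractile's total. That is the appeal of the method. The catch is that nothing guarantees composition actually varies linearly between bracket bounds.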
It is sometimes said that regression is what we do when we don't know what else to do. The same could be said of linear interpolation.