Thursday, November 17, 2011

Stata Code Snippets: Summarize Statistics by Category

I often calculate summary statistics by category in SAS, such as the annual average of a variable by security type. Here are some code snippets for doing the same in Stata.

The command to use is collapse. For example, the following code calculates the average mpg and average weight separately for foreign and domestic cars in the auto dataset that ships with Stata.

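    * load the example dataset that ships with Stata
    sysuse auto
    * one row per value of foreign, holding the group means
    collapse (mean) mpg weight, by(foreign)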
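
collapse can also compute several statistics in one pass. When the same variable is used more than once, each result needs its own name, which is what the newname=varname form is for. Here is a small sketch; the names mean_mpg, sd_mpg and n_cars are my own choices for illustration, not anything Stata requires.

    sysuse auto, clear
    * mean, standard deviation and non-missing count of mpg per group
    collapse (mean) mean_mpg=mpg (sd) sd_mpg=mpg (count) n_cars=mpg, by(foreign)
    list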

Unlike SAS, which lets you write the summary data to a separate output dataset, Stata's collapse replaces the dataset in memory with the summary data.
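
If you want the original data back afterwards, one common workaround is to wrap the collapse in preserve and restore. The sketch below assumes you also want to keep the summary on disk; the file name auto_means is just a placeholder I made up.

    sysuse auto, clear
    * take a snapshot of the data in memory
    preserve
    collapse (mean) mpg weight, by(foreign)
    * optional: write the summary out (auto_means is a hypothetical file name)
    save auto_means, replace
    * bring back the original 74-car dataset
    restore

After restore, the full auto data is in memory again and the summary lives in auto_means.dta.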

To make a prettier table with summary data, use the tabdisp command:

    tabdisp foreign, cell(mpg weight) format(%9.2f)

If you run the code, you will find that, as expected, the foreign cars in this late-1970s data are lighter on average and get higher mpg than the domestic ones.