Wednesday, September 7, 2016

Formatting regression results from Stata to Excel -- Escaping the Wildcard Character [Data]


Most statistical packages uses "*" to represent statistical significance. The following code:
sysuse auto
reg mpg weight if foreign == 0
est store foreign
reg mpg weight if foreign == 1
est store domestic
est table foreign domestic, star(.1 .05 .01) stats(N r2) var(32)
est drop *

generates the following output:
------------------------------------------------------------------
                       Variable |    foreign        domestic     
---------------------------------+--------------------------------
                         weight | -.00597508***   -.01042596***  
                          _cons |  39.646965***    48.918297***  
---------------------------------+--------------------------------
                              N |         52              22     
                             r2 |  .76727555       .46628963     
------------------------------------------------------------------

Pasting the result lines into Excel, we have:

We can split the text by "space" to end up with 4 columns of data (my laptop only has Chinese Excel, but the dialog boxes looks the same and the options are located in the same places for the most part):

After splitting the lines by space, we have:

For illustration, I'm going to change the 4 cells to have different number of "*". Now, to generate a version without the ***, you can do the following:

To conditional format depending on the number of "*", we need to escape the "*" character because "*" by default is a special character that means all characters. So if you just use *, all cells are formatted:

If you use "~*", Excel now conditional formats all cells containing "*".

If you do "~*~*", Excel now conditional formats all cells containing two "*"s.