Tuesday, December 20, 2011

Stata Code Snippets: Finding Duplicates in Observations

Here are some Stata code snippets for checking for duplicates in your panel data. (I had to figure out how to do this because my panel dataset has duplicates, so I cannot run tsset on it.) The command to use is duplicates.

To list any duplicate observations, run:

duplicates list

For the duplication issue I have with tsset, I specify the two relevant panel variables:

duplicates list termtype dateIndex
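
Before listing every offending row, it can help to see how many there are. A minimal sketch using the same two variables: duplicates report prints a summary table of how many observations appear once, twice, and so on, along with the surplus in each group.

```stata
* Summarize duplicates in terms of the panel and time variables
duplicates report termtype dateIndex
```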

Now, to create a variable isdup that marks the duplicates (it holds the number of other copies of each observation, so 0 means unique):

duplicates tag termtype dateIndex, gen(isdup)
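
To check what was tagged before editing anything, something like the following should work. It is only a sketch and assumes isdup was created by the command above.

```stata
* Count observations by number of duplicates (0 = unique)
tabulate isdup

* Show the duplicated groups together, one block per termtype/dateIndex pair
sort termtype dateIndex
list termtype dateIndex isdup if isdup > 0, sepby(termtype dateIndex)
```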

In my case, I just manually remove the duplicates using the graphical editor:

edit if isdup

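Or, to drop them with a command (the force option is required when a variable list is given; Stata keeps the first observation in each duplicate group and drops the rest):

duplicates drop termtype dateIndex, force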
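
Once the duplicates are gone, a quick check before retrying tsset might look like this. It is a sketch that assumes termtype is numeric; if it is a string variable, it would need to be converted first (for example with encode).

```stata
* isid exits with an error if the two variables do not uniquely
* identify the observations (add the missok option if either has missing values)
isid termtype dateIndex

* Declare the panel structure: panel variable first, then time variable
tsset termtype dateIndex
```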

An example:

sysuse auto
keep mpg rep78
duplicates list
duplicates list rep78
duplicates tag rep78, gen(isdup)
list if isdup
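
The same example can be carried through to the drop step. This is a sketch on the auto dataset only; rep78 has missing values there, so isid needs the missok option.

```stata
sysuse auto, clear
keep mpg rep78

* How many observations share each value of rep78?
duplicates report rep78

* Keep the first observation for each value of rep78, drop the rest
duplicates drop rep78, force

* Confirm rep78 now uniquely identifies the remaining observations
isid rep78, missok
```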