# Help with: Benford's Law Second Digit Test (have .ado for First Digit Test)

#### Stockholm

##### New Member
Hi all,

I'd like to perform Benford tests on a dataset using STATA and would much appreciate any help with this.

To perform the 'basic' Benford test on first digits I found a nice .ado which will do exactly this for me (link).

Does anyone have any input how I could extend the analysis to other digits, most importantly the second digit in each observation? For reference, Wikipedia has a quick briefing of the calculation (link). I imagine it should be possible to create an updated .ado which will perform the second digit test without too much pain, based on the firstdigit ado. Any help would be great.

Thanks

Last edited:

#### Dragan

##### Super Moderator
Hi all,

Does anyone have any input how I could extend the analysis to other digits, most importantly the second digit in each observation?
I have done this using the bootstrap. More specifically, we know what the mean, variance, skew, and kurtosis are for the second digits ---assuming the data follow Benford's Law. They are mean=4.1873, variance=8.25381, skew=0.133114, and kurtosis=-1.20839. Simply bootstrap these descriptive statistics and obtain confidence intervals. I've done this using S+ ---Stata, however.

#### bukharin

##### RoboStataRaptor
This is a tweak of Nick Cox's -firstdigit-. I have not tested it extensively but it seems to work. It would be worth validating it if you have an example from elsewhere. You could save this as seconddigit.ado and put it in your personal ADO folder (if in Windows, C:\ado\personal - otherwise run -sysdir- and Stata will tell you)

Code:
* seconddigit
* slight tweak of Nick Cox's -firstdigit- command from SSC
program seconddigit, rclass byable(recall)
version 9.1
syntax varlist(numeric) [if] [in] ///
[, BY(varname) ALLobs MISSing PERcent ]

quietly {
// what observations to use
if "allobs'" != "" marksample touse, novarlist
else marksample touse

if "by'" != "" & "missing'" == "" markout touse' by', strok

count if touse'
if r(N) == 0 error 2000

// variable(s) or group(s)
local nv : word count varlist'

// by option
if "by'" != "" {
if nv' > 1 {
di as err ///
"by() cannot be combined with nv' variables"
exit 198
}

tempname stub
separate varlist' if touse', ///
by(by') g(stub') veryshortlabel missing'
local varlist "r(varlist)'"
local nv : word count varlist'
}

// display preparation and initiation
local nam = 0
foreach v of local varlist {
if "by'" != "" {
local nam = ///
max(nam', length(": variable label v''"'))
}
else local nam = max(nam', length("v'"))
}
local col = min(28, nam') + 1

local txt ///
"       n   chi-sq.  P-value   digit   observed   expected"
noi di _n as txt "{col col'}txt'"

local line = col' - 1 + length("txt'")
noi di "{hline line'}"

tempvar thisuse
gen byte thisuse' = .
local ofmt = cond("percent'" != "", "%11.2f", "%11.0f")

// loop over variables (-separate- results if -by()-)
foreach v of local varlist {
local which = ///
cond("by'" != "", ": variable label v''"', "v'")
local which = abbrev("which'"', col' - 1)
replace thisuse' = touse' & v' < .

mata : sd_work("v'", "thisuse'", "percent'")

noi di as text "which'{col col'}"'        "sp'" ///
as res %8.0f r(N)                            ///
as res %10.2f r(chisq)                       ///
as res %9.4f r(p)                            ///
as res "       0" ofmt' r(obs0) %11.2f r(exp0)
return scalar obs0 = r(obs0)
return scalar exp0 = r(exp0)

noi forval i = 1/9 {
di as res "{col col'}sp'{space 34}i'"     ///
ofmt' r(obsi') %11.2f r(expi')
return scalar obsi' = r(obsi')
return scalar expi' = r(expi')
}

return scalar p = r(p)
return scalar chisq = r(chisq)
return scalar N = r(N)

local --nv
if nv' noi di " "
}
}
end

mata :

void sd_work(string scalar varname,
string scalar tousename,
string scalar percent)
{
real colvector y, obs, exp
real scalar n, i, chisq
string scalar name

y = st_data(., varname, tousename)
n = rows(y)
exp = obs = J(10, 1, 0)

y = strtoreal(substr(strofreal(y), 2, 1))

for (i = 0; i <= 9; i++) {
j = i+1
obs[j] = colsum(y :== i)
exp[j] = n * (log10(1 + 1/(10+i)) + log10(1 + 1/(20+i)) + ///
log10(1 + 1/(30+i)) + log10(1 + 1/(40+i)) + log10(1 + 1/(50+i)) ///
+ log10(1 + 1/(60+i)) + log10(1 + 1/(70+i)) + log10(1 + 1/(80+i)) ///
+ log10(1 + 1/(90+i)))
name = "r(obs" + strofreal(i) + ")"
st_numscalar(name,
percent == "" ? obs[j] : 100 * obs[j] / n)
name = "r(exp" + strofreal(i) + ")"
st_numscalar(name,
percent == "" ? exp[j] : 100 * (log10(1 + 1/(10+i)) + log10(1 + 1/(20+i)) + ///
log10(1 + 1/(30+i)) + log10(1 + 1/(40+i)) + log10(1 + 1/(50+i)) ///
+ log10(1 + 1/(60+i)) + log10(1 + 1/(70+i)) + log10(1 + 1/(80+i)) ///
+ log10(1 + 1/(90+i))))
}

chisq = colsum(((obs - exp):^2) :/ exp)
st_numscalar("r(p)", chi2tail(9, chisq))
st_numscalar("r(chisq)", chisq)
st_numscalar("r(N)", n)
}

end

#### bukharin

##### RoboStataRaptor
It would be worth validating it
I am not sure what to do about numbers <10 - do they get counted in the denominator? Probably not. The above program does count them, so you'd need to run it as:
seconddigit myvar if myvar>9

#### Stockholm

##### New Member
Thank you both for your replies, they're both helpful. I'll test the .ado tomorrow and report back with the results. I believe I'll have the chance to test it against a previous example. Cheers!

#### Stockholm

##### New Member
Hi again all,

Firstly, thanks a lot for the help with the code. I've been trying it out and it looks very interesting. One question though. When running the second digit test I noticed that the sum of the obsereved digits do not match the sample size (i.e. the 'n' is different from the total I get if I add up the obsereved digits 0-9 in the 'Observed' column...). Any help on explaining this discreprancy, or with 'debugging' the code would be much appreciated. My initial thoughts are that it has something to do with when the variable 'y' is defined, but I'm not sure...

Thanks

#### bukharin

##### RoboStataRaptor
Yes, this is what I was getting at above when I mentioned problems with the denominator.

Try replacing line 63:
Code:
replace thisuse' = touse' & v' < .
with
Code:
replace thisuse' = touse' & !missing(substr(string(v'), 2, 1))