EBCIC: Exact Binomial Confidence Interval Calculator
These programs are mainly for researchers, developers, and designers who calculate Binomial Confidence Intervals.
EBCIC calculates binomial intervals exactly, i.e. by implementing the Clopper-Pearson interval [CP34] without simplifying mathematical equations, since such simplification may degrade the intervals for certain combinations of parameters. EBCIC can also show graphs for comparing exact intervals with approximated ones.
Run the following initial cells:
# Run this cell, if `ebcic` package has not been installed yet:
%pip install ebcic
import ebcic
from ebcic import *
Installation
When using the PyPI ebcic package:
pip install ebcic
When using the GitHub ebcic repo:
git clone https://github.com/KazKobara/ebcic.git
cd ebcic
Command-line help
Check the version and options:
python -m ebcic -h
Cf. the examples below.
Install the ebcic package according to this page.
NOTE: If you manage the edited file with git, save it as a MATLAB code file (*.m) before committing (or commit the live code file (*.mlx) to git LFS (Large File Storage)), since live code files (*.mlx) are not git friendly. If necessary, also save it as a *.html file to check its appearance.
To print exact intervals as text.
python -m ebcic -k 0 -n 100 -c 95 -u
- For k=0 or k=n, give the -c option the one-sided confidence percentage.
- v0.0.4 or newer returns the same value as the above result when the --rej-perc-lower (-r) option is set to the percentage of the lower rejection area of the assumed population, as follows:
python -m ebcic -k 0 -n 100 --rej-perc-lower 5 -u
python -m ebcic -k 1 -n 100 -c 95 -lu
- For 0<k<n, give the -c option the two-sided confidence percentage.
- v0.0.4 or newer returns the same value as the above result when both the --rej-perc-lower (-r) and --rej-perc-upper (-s) options are set to equally divided percentages of the rejection area of the assumed population, as follows:
python -m ebcic -k 1 -n 100 -r 2.5 -s 2.5 -lu
python -m ebcic -k 1 -n 100 -r 5 -u
- For v0.0.4 and newer, set the --rej-perc-lower (-r) option to the percentage of the lower rejection area of the assumed population.
- For v0.0.3 and older and 0<k<n, give the -c option 2*s-100 as follows, where s is the one-sided confidence percentage (in this case, s=95 and 2*s-100=2*95-100=90).
python -m ebcic -k 1 -n 100 -c 90 -u
Giving -c 90 is the same as giving --alpha 0.1 (or -a 0.1).
python -m ebcic -k 1 -n 100 --alpha 0.1 -u
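The relations between the option values used above can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated in this document; the helper names (alpha_from_confi_perc, two_sided_perc_from_one_sided, equal_rejection_split) are hypothetical and are not part of the ebcic package.

```python
def alpha_from_confi_perc(confi_perc):
    """Convert a confidence percentage (e.g. 90) to alpha (e.g. 0.1)."""
    return 1.0 - confi_perc / 100.0


def two_sided_perc_from_one_sided(s):
    """Value to give to -c for 0<k<n when s is the one-sided percentage."""
    return 2.0 * s - 100.0


def equal_rejection_split(confi_perc):
    """Split the rejection area of a two-sided interval equally (in percent)."""
    half = (100.0 - confi_perc) / 2.0
    return half, half


print(alpha_from_confi_perc(90))          # alpha for -c 90
print(two_sided_perc_from_one_sided(95))  # -c value for one-sided 95 %
print(equal_rejection_split(95))          # (-r, -s) values for -c 95
```

For example, a one-sided percentage of s=95 maps to -c 90, and -c 95 corresponds to -r 2.5 -s 2.5.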
python -m ebcic -k 99 -n 100 -s 5 -l
- For v0.0.4 and newer, set the --rej-perc-upper (-s) option to the percentage of the upper rejection area of the assumed population.
- For v0.0.3 and older and 0<k<n, the equivalent value is obtained in the same way as in the previous example, using the -c or -a option as follows:
python -m ebcic -k 99 -n 100 -c 90 -l
python -m ebcic -k 99 -n 100 -a 0.1 -l
Edit the following parameters, k, n, and confi_perc (or rej_perc_lower and rej_perc_upper), and then run the cell.
print_interval(Params(
k=1, # Number of errors
n=501255, # Number of trials
confi_perc=99.0 # Confidence percentage
))
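where confi_perc is set as follows:
- For k=0 or k=n: the one-sided confidence percentage.
- For 0<k<n: the two-sided confidence percentage.
- For 0<k<n and to get a one-sided confidence interval: 2*s-100 (as with the -c option), where s is the one-sided confidence percentage.
Result: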
===== Exact interval of p with 99.0 [%] two-sided (or 99.5 [%] one-sided) confidence =====
Upper : 1.482295806e-05
Lower : 9.99998e-09
Width : 1.481295808e-05
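As a cross-check of the result above, the following is a minimal sketch of how a Clopper-Pearson interval can be computed with only the Python standard library. It is not EBCIC's implementation: the function names (binom_cdf, clopper_pearson), the log-space summation, and the plain bisection are assumptions chosen for illustration, and the sketch is only practical for small k because the sum has k+1 terms.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing terms in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def _bisect_decreasing(g, target, iters=200):
    """Find p in [0, 1] with g(p) == target, for g decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if g(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(k, n, rej_perc_lower, rej_perc_upper):
    """Return (lower, upper) for the given rejection areas in percent."""
    rej_lower = rej_perc_lower / 100.0
    rej_upper = rej_perc_upper / 100.0
    # Lower bound: P(X >= k | p) = rej_upper, i.e. P(X <= k-1 | p) = 1 - rej_upper.
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1.0 - rej_upper)
    # Upper bound: P(X <= k | p) = rej_lower.
    upper = 1.0 if k == n else _bisect_decreasing(
        lambda p: binom_cdf(k, n, p), rej_lower)
    return lower, upper


lower, upper = clopper_pearson(1, 501255, 0.5, 0.5)
print(f"Upper : {upper:.10g}")
print(f"Lower : {lower:.6g}")
```

With k=1, n=501255, and 0.5 % in each rejection area (99 % two-sided), the printed bounds should agree with the Upper and Lower values shown above up to floating-point error.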
For v0.0.4 and newer, instead of confi_perc or alpha, Params() can set the confidence with either or both of rej_perc_lower and rej_perc_upper as a percentage in 0 <= x < 50 (or either or both of rej_lower and rej_upper as a ratio in 0 <= x < 0.5). The methods of Params() are also available.
Params(
k=1, # Number of errors
n=501255, # Number of trials
# Rejection area in percentage
rej_perc_lower=0.5, # Lower rejection area (to get upper interval)
rej_perc_upper=0.5 # Upper rejection area (to get lower interval)
).print_interval()
Note that it uses the lower rejection area to get the upper confidence interval and vice versa.
Result:
===== Exact interval of p with rejection area of lower 0.5 [%] and upper 0.5 [%] =====
Upper : 1.482295806e-05
Lower : 9.99998e-09
Width : 1.481295808e-05
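For k=1 the lower bound has a closed form, which gives a quick way to see how the upper rejection area determines the lower bound: it is the p that solves 1 - (1 - p)**n = rej_upper. The sketch below evaluates that expression; the function name lower_bound_k1 is hypothetical and is not part of the ebcic package.

```python
import math


def lower_bound_k1(n, rej_perc_upper):
    """Lower Clopper-Pearson bound for k=1, from the upper rejection area."""
    rej_upper = rej_perc_upper / 100.0
    # Solve 1 - (1 - p)**n = rej_upper for p.
    # log1p/expm1 keep precision because p is very small here.
    return -math.expm1(math.log1p(-rej_upper) / n)


print(f"Lower : {lower_bound_k1(501255, 0.5):.6g}")
```

For n=501255 and rej_perc_upper=0.5, the value should match the Lower line of the result above.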
This program can show not only the typical 95% and 99% confidence lines but also any confidence percentage lines.
Python Interpreter or Jupyter cell to run:
interval_graph(GraProps(
# Set the range of k with k_*
k_start=1, # >= 0
k_end=1, # >= k_start
k_step=1, # >= 1
# Edit the list of confidence percentages to depict, [confi_perc, ...],
# for two-sided of 0<k<n where 0 < confi_perc < 100, or
# for one-sided of k=0 or k=n.
# NOTE For one-sided of 0<k<n, set
# confi_perc=(2 * confi_perc_for_one_sided - 100)
# where 50 < confi_perc_for_one_sided < 100
# (though both lower and upper intervals are shown).
confi_perc_list=[90, 95, 99, 99.9, 99.99],
# Lines to depict
line_list=[
'with_exact',
'with_line_kn', # Line of k/n
],
# savefig=True, # uncomment on Python Interpreter
# fig_file_name='intervals.png',
))
Result:
If figures or links are not shown appropriately, visit the github.io page or the GitHub page.
Python Interpreter or Jupyter cell to run:
interval_graph(GraProps(
k_start=0, # >= 0
k_end=5, # >= k_start
line_list=['with_exact'],
# savefig=True, # uncomment on Python Interpreter
# fig_file_name='intervals.png',
))
Result:
Python Interpreter or Jupyter cell to run:
interval_graph(GraProps(
k_start=0, # >= 0
k_end=0, # >= k_start
log_n_end=3, # max(n) = k_end*10**log_n_end
line_list=[
'with_exact',
'with_rule_of_la', # rule of -ln(alpha)
# available only for k=0 and k=n
#'with_normal', # not available for k=0 and k=n
'with_wilson',
'with_wilson_cc',
'with_beta_approx',
],
# savefig=True, # uncomment on Python Interpreter
# fig_file_name='intervals.png',
))
where interval names to be added in the line_list and their conditions are as follows:
Interval name (after ‘with_’) | Explanation | Condition |
---|---|---|
exact | Implementation of Clopper-Pearson interval [CP34] without approximation. | |
rule_of_la | ‘Rule of -ln(a)’ or ‘Rule of -log_e(alpha)’; Generalization of the ‘Rule of three’ [Lou81,HL83,JL97,Way00,ISO/IEC19795-1] that is for k=0 and alpha=0.05 (95% confidence percentage), to other confidence percentages than 95% and k=n. | k=0 or k=n |
wilson | Wilson score interval [Wil27]. | |
wilson_cc | Wilson score interval with continuity correction [New98]. | |
beta_approx | Approximated interval using beta distribution. | |
normal | Normal approximation interval or Wald confidence interval. | 0<k<n |
Result:
As you can see from the following figure, ‘rule of -ln(a)’ for large n and ‘beta_approx’ are good approximations for k=0.
For k=0, interval_graph() of EBCIC v0.0.3 and newer displays only one-sided upper intervals, since the lower intervals must be 0 (though some approximations, such as ‘Wilson cc’, output values other than 0).
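To illustrate why the ‘rule of -ln(a)’ works well for k=0 and large n, the sketch below compares it with the exact one-sided upper bound, which for k=0 solves (1 - p)**n = alpha. It assumes the rule takes the usual form -ln(alpha)/n (which reduces to roughly 3/n for alpha=0.05); the function names are hypothetical and this is not EBCIC's code.

```python
import math


def exact_upper_k0(n, alpha):
    """Exact one-sided upper bound for k=0: 1 - alpha**(1/n)."""
    return -math.expm1(math.log(alpha) / n)


def rule_of_la_upper_k0(n, alpha):
    """Approximate upper bound for k=0 by the rule of -ln(alpha)."""
    return -math.log(alpha) / n


alpha = 0.05  # 95 % one-sided confidence
for n in (10, 100, 1000):
    exact = exact_upper_k0(n, alpha)
    approx = rule_of_la_upper_k0(n, alpha)
    rel_err = (approx - exact) / exact
    print(f"n={n:5d} exact={exact:.6g} rule={approx:.6g} rel_err={rel_err:.3%}")
```

The relative error shrinks as n grows, and the rule never falls below the exact bound because x >= 1 - exp(-x) for every x >= 0.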
k=1
Python Interpreter or Jupyter cell to run:
interval_graph(GraProps(
k_start=1, # >= 0
k_end=1, # >= k_start
line_list=[
'with_line_kn',
# 'with_rule_of_la', # available only for k=0
'with_exact',
'with_normal',
'with_wilson',
'with_wilson_cc',
'with_beta_approx',
],
# savefig=True, # uncomment on Python Interpreter
# fig_file_name='intervals.png',
))
Result:
As you can see from the following figures and as warned in many papers such as [BLC01], normal-approximation intervals are not good approximations for small k.
The upper intervals of the other approximations look tight.
The approximation using beta distribution looks tight where the confidence interval for k=n=1 is one-sided.
k=10
Python Interpreter or Jupyter cell to run:
interval_graph(GraProps(
k_start=10, # >= 0
k_end=10, # >= k_start
log_n_end=2, # max(n) = k_end*10**log_n_end
line_list=[
'with_exact',
'with_normal',
'with_wilson',
'with_wilson_cc',
'with_beta_approx',
],
# savefig=True, # uncomment on Python Interpreter
# fig_file_name='intervals.png',
))
Result:
For k=10, ‘normal’ still does not provide a good approximation.
k=100
Python Interpreter or Jupyter cell to run:
interval_graph(GraProps(
k_start=100, # >= 0
k_end=100, # >= k_start
log_n_end=2, # max(n) = k_end*10**log_n_end
line_list=[
'with_exact',
'with_normal',
'with_wilson',
'with_wilson_cc',
'with_beta_approx',
],
# savefig=True, # uncomment on Python Interpreter
# fig_file_name='intervals.png',
))
Result:
At least for k=100 and the confidence percentage confi_perc=99.0, all these approximations look tight.
Download
git clone https://github.com/KazKobara/ebcic.git
Open the following file with your browser (after replacing <path to the downloaded ebcic> appropriately):
file://<path to the downloaded ebcic>/docs/_build/index.html
For WSL Ubuntu-20.04, replace <username> and <path to the downloaded ebcic> appropriately:
file://wsl%24/Ubuntu-20.04/home/<username>/<path to the downloaded ebcic>/docs/_build/index.html
[CP34] Clopper, C. and Pearson, E.S. “The use of confidence or fiducial limits illustrated in the case of the binomial,” Biometrika. 26 (4): pp.404-413, 1934
[Lou81] Louis, T.A. “Confidence intervals for a binomial parameter after observing no successes,” The American Statistician, 35(3), p.154, 1981
[HL83] Hanley, J.A. and Lippman-Hand, A. “If nothing goes wrong, is everything all right? Interpreting zero numerators,” Journal of the American Medical Association, 249(13), pp.1743-1745, 1983
[JL97] Jovanovic, B.D. and Levy, P.S. “A look at the rule of three,” The American Statistician, 51(2), pp.137-139, 1997
[Way00] Wayman, J.L. “Technical testing and evaluation of biometric identification devices,” Biometrics: Personal identification in networked society, edited by A.K. Jain, et al., Kluwer, pp.345-368, 2000
[ISO/IEC19795-1] ISO/IEC 19795-1, “Information technology-Biometric performance testing and reporting-Part 1: Principles and framework”
[New98] Newcombe, R.G. “Two-sided confidence intervals for the single proportion: comparison of seven methods,” Statistics in Medicine. 17 (8): pp.857-872, 1998
[Wil27] Wilson, E.B. “Probable inference, the law of succession, and statistical inference,” Journal of the American Statistical Association. 22 (158): pp.209-212, 1927
[BLC01] Brown, L.D., Cai, T.T. and DasGupta, A. “Interval Estimation for a Binomial Proportion,” Statistical Science. 16 (2): pp. 101-133, 2001
When you use or publish the confidence interval obtained with the software, please refer to the software name, version, platform, and so on, so that readers can verify the correctness and reproducibility of the interval with the input parameters.
An example of the reference is:
"The confidence interval is obtained by EBCIC X.X.X on Python 3."
where X.X.X is the version of EBCIC.
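One possible way to fill in the version automatically is to read it from the installed package metadata. The sketch below uses only the standard library (importlib.metadata and platform); the helper name reference_sentence is hypothetical, and it falls back to the X.X.X placeholder when the ebcic package is not installed.

```python
import platform
from importlib import metadata


def reference_sentence(package="ebcic"):
    """Build the reference sentence from the installed package version."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "X.X.X"  # placeholder when the package is not installed
    return (f"The confidence interval is obtained by EBCIC {version} "
            f"on Python {platform.python_version()}.")


print(reference_sentence())
```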
The initial software is based on results obtained from a project, JPNP16007, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
Copyright (c) 2020-2022 National Institute of Advanced Industrial Science and Technology (AIST)