How to Plan and Extract a Classical Variable Sample with ADA
Introduction
ADA (Audit Data Assistant) allows you to plan and extract classical variable (or just variable) samples. Also called Stratified Random Sampling (SRS), Classical Variable Sampling (CVS), like Monetary Unit Sampling (MUS), is used to project monetary misstatement for a population based on a sample. CVS is designed for high error rate populations and is usually not the best choice for low error rate populations, where the error occurrence rate is less than 10%. MUS is typically a better choice in low error rate circumstances. CVS can still be used there, and may produce narrower confidence intervals than MUS, but it requires very large sample sizes. Unlike Attribute Sampling, where an all-or-nothing yes/no is supplied when auditing the sample, monetary misstatement sampling like CVS takes the degree of error (the amount of the error difference) into account when projecting results. ADA’s Classical Variable Sampling Planning and Evaluation routines implement the methodologies for variable sampling laid out in Statistical Auditing (1977) by Donald M. Roberts, with the aim of rendering them as accurately and faithfully as possible.
CVS requires larger sample sizes than MUS because variable sampling seeks to measure how much something (in this case monetary misstatement error) varies. It does this by calculating standard deviations that accurately measure this variability. The only way to do this with a sample is to observe a lot of errors. To help the user ensure this occurs, ADA includes an Error Assurance component within the CVS module. With ADA’s CVS Sample Planning module, the user can plan a statistically valid sample size by setting the desired precision and error assurance parameters. The module will stratify the population for the sample using the standard cumulative square root of the frequency method and allocate the sample size to the strata using optimum Neyman allocation. The module can then be used to extract a sample. The sample is extracted from the population file, which is, or should be, the file open in ADA when the module is executed. With optimum Neyman allocation, larger valued audit units are more likely to be selected for the sample than smaller audit units, but this effect is not nearly as great as it is with the PPS sampling used with MUS.
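To make the stratification step more concrete, here is a minimal Python sketch of the textbook cumulative square root of the frequency rule. It is an illustration only, not ADA's code: the function name, the use of equal-width bins and the default of 50 bins are all assumptions made for this example.

```python
import math

def cum_sqrt_f_boundaries(values, n_strata, n_bins=50):
    """Return n_strata - 1 interior stratum boundaries (illustrative only).

    Textbook cumulative sqrt(f) rule: bin the values, take the square root
    of each bin frequency, accumulate, and cut the running total into
    n_strata equal pieces. n_bins is an assumed tuning parameter.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; nothing to stratify")
    width = (hi - lo) / n_bins

    # Frequency of values in each equal-width bin.
    freq = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # max value goes in last bin
        freq[idx] += 1

    # Running total of sqrt(frequency) across the bins.
    cum = []
    running = 0.0
    for f in freq:
        running += math.sqrt(f)
        cum.append(running)

    # Place a boundary at the upper edge of the first bin that reaches
    # each equal share of the total.
    step = running / n_strata
    boundaries = []
    for k in range(1, n_strata):
        idx = next(i for i, c in enumerate(cum) if c >= k * step)
        boundaries.append(lo + (idx + 1) * width)
    return boundaries
```

For example, cum_sqrt_f_boundaries(list(range(1, 101)), 3, n_bins=10) returns two boundaries that cut the values 1 to 100 into three strata. The boundaries ADA reports may differ because its binning choices are not described here.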
If you plan to extract a sample, the best practice is to open the population file first in ADA and then run Classical Variable > Plan and Extract. If a population file is not already open, ADA will prompt you to choose one. Then, after entering the General Planning and Precision and Error Assurance Planning inputs, click Plan to populate the Stratification and Sample Size Allocation section and the Design Issues and Information section. Once satisfied with the resulting stratification and sample size allocations (if not, change the inputs and click Plan again), click Extract to obtain the sample.
Note: If, after clicking Plan, the user changes the inputs or chooses a new column and then clicks Extract without clicking Plan first, then the program will replan the sample size and restratify based on the new inputs and immediately extract the sample. Thus, it is a best practice to click Plan after changing any inputs. That way there are no surprises.
After extracting a sample, the user is expected to export the sample and the high/low values file(s) (if applicable) to Excel. Then, after auditing the sample and high/low values file(s) for correctness, the user will import the audited file(s) back into ADA and perform CVS Evaluation.
Formats Supported by CVS Sample Planning
ADA’s CVS Sample Planning utility allows you to sample from Parquet data. Parquet is the format ADA uses for storing data files.
Using the Dialog
The dialog box consists of four sections: General Planning, Precision and Error Assurance Planning, Stratification and Sample Size Allocation, and Design Issues and Information. The General Planning inputs consist of column choice, high/low value handling, number of strata, the random seed number, sample minimums and the output filename for the sample. In the Precision Planning section, the user inputs the confidence level and the materiality/tolerable error to plan one sample size. In the Error Assurance section, the user inputs their best guess of the population error rate and the minimum number of errors to plan a second sample size. The larger of the two sizes is used and allocated to the strata. The last two sections are populated by clicking Plan. The Stratification and Sample Size Allocation section breaks down the stratification and reports the statistics for each stratum. The Design Issues and Information section reports whether getting the minimum errors (Error Assurance) or meeting the precision objectives (Precision Planning) is responsible for the overall sample size. It also discusses other issues that could pose a risk for the sample.
General Planning. General inputs for the sampling plan.
Recorded Amount Values to Use. In this section, select the numeric column to use from the dropdown list. Choose the Positive, Negative or All option. The Positive option subsets the provided data down to only the values greater than zero to define the population items. The Negative option defines the population to be the subset of items less than zero. The All option includes all items, regardless of sign.
Total Population Value. The value of the sampled population. It updates whenever the Recorded Amount Values to Use selection or the High and Low Value handling changes, because the Total Population Value excludes any items that qualify as High or Low value items. If using the Tolerable Error Percent option, the percentage is applied to this Total Population Value number.
Sample High Values 100%. Check the box to exclude high values from the sample and have them extracted to a _HV file to be audited 100%. Once checked, supply the threshold value.
Sample Low Values 100%. Check the box to exclude low values from the sample and have them extracted to a _LV file to be audited 100%. Once checked, supply the threshold value. This is especially useful for populations with negative amounts.
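The General Planning choices above can be pictured with a short Python sketch. It is a hypothetical illustration, not ADA's implementation: the function name is invented, and whether an item exactly equal to a threshold counts as a high or low value is an assumption (the sketch treats it as one).

```python
def split_population(amounts, option="All", high_cutoff=None, low_cutoff=None):
    """Split recorded amounts into (sampled, high_values, low_values).

    option: "Positive", "Negative" or "All", mirroring the dialog choices.
    Assumption: items at or beyond a cutoff are treated as high/low values.
    """
    keep = {
        "Positive": lambda a: a > 0,
        "Negative": lambda a: a < 0,
        "All": lambda a: True,
    }[option]

    sampled, high_values, low_values = [], [], []
    for a in amounts:
        if not keep(a):
            continue  # item is outside the defined population
        if high_cutoff is not None and a >= high_cutoff:
            high_values.append(a)   # would go to the _HV file, audited 100%
        elif low_cutoff is not None and a <= low_cutoff:
            low_values.append(a)    # would go to the _LV file, audited 100%
        else:
            sampled.append(a)       # remains in the sampled population
    return sampled, high_values, low_values


amounts = [500.0, -2000.0, 75.0, 12000.0, 0.0, 310.0]
sampled, hv, lv = split_population(amounts, "All", high_cutoff=10000, low_cutoff=-1000)
total_population_value = sum(sampled)  # excludes the high and low value items
```

In this made-up data set, the 12,000.00 item is set aside as a high value, the -2,000.00 item as a low value, and the Total Population Value is computed from the four remaining items only.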
Number of Strata. Defines the number of strata that will be defined after one clicks Plan. By changing this number and clicking Plan again, one can see the effect of different strata numbers.
Random Seed Number. The seed number for starting the random number generator. Using the same Random Seed Number along with the same population data file inputs will allow the user to replicate a prior sample.
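The idea behind the seed can be shown with a small Python sketch. This uses Python's own random number generator, which is an assumption for illustration; it will not reproduce samples drawn by ADA, whose generator is not described here.

```python
import random

def draw_sample(record_ids, sample_size, seed):
    """Draw a simple random sample without replacement, repeatable by seed."""
    rng = random.Random(seed)  # private generator, so other code cannot disturb it
    return rng.sample(record_ids, sample_size)


record_ids = list(range(1, 1001))       # hypothetical population of 1,000 records
first = draw_sample(record_ids, 25, seed=12345)
second = draw_sample(record_ids, 25, seed=12345)
# Same seed plus same population (in the same order) gives the same sample.
```

Note that repeatability depends on the population file being unchanged, including the order of its records, which is why the seed alone is not enough to replicate a prior sample.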
Minimum Overall Sample Size. This setting ensures the total sample size will be at least as large as this amount.
Minimum Sample Size Per Stratum. This setting ensures that each stratum in the sample will be at least as large as this amount.
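As a rough picture of how an overall sample size might be spread across strata, here is a Python sketch of textbook Neyman allocation with the two minimums applied. The function name, the rounding up, and the order in which the minimums and the population cap are applied are assumptions; ADA's exact handling may differ.

```python
import math

def neyman_allocate(pop_sizes, std_devs, total_n, min_overall=0, min_per_stratum=0):
    """Allocate a sample to strata in proportion to N_h * S_h (illustrative).

    pop_sizes: item count N_h per stratum
    std_devs:  standard deviation S_h per stratum
    """
    n = max(total_n, min_overall)            # honor the overall minimum
    weights = [N * s for N, s in zip(pop_sizes, std_devs)]
    total_weight = sum(weights)

    allocation = []
    for N, w in zip(pop_sizes, weights):
        if total_weight > 0:
            share = n * w / total_weight
        else:
            share = n / len(pop_sizes)       # no variability anywhere: split evenly
        n_h = max(math.ceil(share), min_per_stratum)  # honor the per-stratum minimum
        allocation.append(min(n_h, N))       # cannot sample more items than exist
    return allocation
```

With hypothetical strata of 600, 300 and 100 items and standard deviations of 10, 40 and 200, neyman_allocate([600, 300, 100], [10, 40, 200], 76) gives 12, 24 and 40 items. The stratum with the fewest items receives the most sample because its values vary the most.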
Output Filename. Name your sample data file. It will be saved in the project folder with a .parquet extension. Here, the output file will be named Demo CVS Sample.parquet.
Precision and Error Assurance Planning. This has two subsections: 1) Precision Planning and 2) Error Assurance.
Precision Planning. The subsection for defining the confidence level and desired precision.
Confidence Level. Defines the Beta risk for the sampling plan, which is the risk of not finding material error with the sample when it exists in the population. The Beta risk is the complement of the Confidence Level (i.e. 100% – C.L.%). The confidence level defines the expected likelihood that the conclusions drawn with the sample will be accurate.
Two-Sided. Plans the sample size with the expectation that the sample will be evaluated using the Two-Sided option, which creates upper and lower confidence interval limits. The Beta risk is split equally between the upper and lower sides of the confidence interval.
Upper. Plans the sample size with the expectation that the sample will be evaluated using the Upper option, which creates a one-sided upper confidence interval limit. The Beta risk is assigned completely to the upper side of the confidence interval.
Lower. Plans the sample size with the expectation that the sample will be evaluated using the Lower option, which creates a one-sided lower confidence interval limit. The Beta risk is assigned completely to the lower side of the confidence interval.
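The effect of splitting or not splitting the Beta risk can be sketched in Python using the standard normal distribution. Using the normal distribution here is an assumption for illustration; this document does not state which distribution ADA uses for its reliability factors.

```python
from statistics import NormalDist

def z_value(confidence_level, sided="two"):
    """Normal critical value for a confidence level given as a fraction.

    sided: "two" splits the Beta risk across both tails;
           "upper" or "lower" puts all of it in one tail.
    """
    beta = 1.0 - confidence_level            # Beta risk is the complement of the C.L.
    tail = beta / 2 if sided == "two" else beta
    return NormalDist().inv_cdf(1.0 - tail)


z_two = z_value(0.95, "two")      # about 1.96
z_upper = z_value(0.95, "upper")  # about 1.645
```

Because the one-sided factor is smaller than the two-sided factor at the same confidence level, a one-sided plan generally calls for a smaller sample than a two-sided plan with otherwise identical inputs.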
Tolerable Error. The user’s entry of how much error they are willing to tolerate in the population before making recommendations for substantive changes to controls. This is usually called the auditor’s materiality or performance materiality and represents an upper bound. This amount is set as the desired precision for planning the sample size. The desired precision is the desired distance between the point estimate (in the middle) and an upper or lower limit above or below the point estimate. This makes the sample precise enough to test the materiality threshold. If the upper or lower limit of the confidence interval is less than the desired precision/materiality when the sample is evaluated, then the error in the population is considered immaterial. Otherwise, the error is material.
Percent. The user can enter the error rate as a percentage of the total population value. Please note that the Percent entry calculates based on the population value excluding high and low value items (if defined). The amount achieved by multiplying this percent by the Total Population Value is the value used in the sample size calculation formula. If one wishes to set the materiality/desired precision based on the population value with high and low values included, then one can easily calculate that by hand and enter the value using the Amount option.
Amount. The user can enter the total amount of dollar error in the population to set the materiality threshold.
For a population with a total value of $1,000,000, a tolerable error of 10% corresponds to $100,000 of error in the population. Entering either 10 under Percent or 100,000 under Amount yields the same result.
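The arithmetic behind the Percent and Amount options is simple enough to check by hand, as this Python sketch shows. The high value total used in the second calculation is a made-up figure for illustration.

```python
def tolerable_error_amount(population_value, percent):
    """Convert a tolerable error percentage into a monetary amount."""
    return population_value * percent / 100.0


# Percent option: applied to the Total Population Value, which excludes
# any high and low value items.
sampled_value = 1_000_000
amount_from_percent = tolerable_error_amount(sampled_value, 10)        # 100000.0

# To base materiality on the population including high/low value items,
# compute the amount by hand and enter it using the Amount option.
high_value_total = 250_000   # hypothetical total of the items in the _HV file
amount_full_basis = tolerable_error_amount(sampled_value + high_value_total, 10)  # 125000.0
```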
Error Assurance. This section applies attribute sampling error assurance calculations to calculate a sample size that helps to ensure enough errors are found in the sample. For more information, see the help section for Attribute Sampling Error Assurance.
Population Error Occurrence Rate. The user’s best guess as to the rate at which errors occur in the population. While the Tolerable Error represents a materiality threshold or goal to hopefully achieve with the sample, this rate represents the anticipated frequency of error occurrence when sampling from the population. The degree, or magnitude, of error is not a consideration here. The best practice is to enter the lowest conceivable error rate (i.e. conservative). However, lower rates increase the sample size.
Desired Minimum Number of Sample Errors. Defines the desired minimum number of errors for the whole sample. The number of errors, or differences, required to get a good estimate of the achieved precision, which defines confidence interval width, depends on both the proportion of error occurrence and the magnitude of the errors. The most favorable situation occurs when there are several (20+) errors observed within each stratum and larger errors occur in the strata with larger values. Given this ideal situation, the user may want to aim for 5 to 10 errors per stratum as a suggested starting point.
Confidence Level for Achieving Minimum Errors. The likelihood of getting the minimum number of errors or more in the sample. If 80% is used, for example, there is an 80% chance of achieving the minimum number of errors or more errors and a 20% chance of falling short of the minimum assuming the population error occurrence rate is accurate.
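The three Error Assurance inputs can be tied together with a short Python sketch. It assumes the number of sample errors follows a binomial distribution and simply searches for the smallest sample size that meets the target; this is an illustrative model, not necessarily the calculation ADA performs, and the function names are invented.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) where X ~ Binomial(n, p)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below

def error_assurance_size(error_rate, min_errors, confidence):
    """Smallest n giving at least `confidence` chance of `min_errors` or more errors."""
    if not 0 < error_rate <= 1 or not 0 < confidence < 1:
        raise ValueError("error_rate must be in (0, 1] and confidence in (0, 1)")
    n = min_errors                      # cannot see k errors in fewer than k items
    while prob_at_least(min_errors, n, error_rate) < confidence:
        n += 1
    return n
```

Under this model, wanting at least 1 error with 80% confidence requires 16 items when the error rate is 10% and 32 items when it is 5%, which illustrates why lower rates increase the sample size. The overall planned size would then be the larger of this figure and the Precision Planning figure.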
Stratification and Sample Size Allocation. This section presents the statistics for each stratum after stratifying by clicking Plan. The Boundary column value represents the exclusive upper bound for that stratum, meaning the values in the stratum are less than (but not equal to) the listed value. For example, the Boundary value in the screenshot for Stratum 1 is 39,240.00. Stratum 1 has values less than 39,240.00. Stratum 2 has values from 39,240.00 up to but not including 298,350.00. If applicable, HV and LV stand for high value and low value, respectively. The population size (i.e. the count), the population average, the population standard deviation and the population squared coefficient of variation are calculated for each stratum. In addition to reporting these statistics, the stratification table also reports the Error Standard Deviation and Sample Size per stratum. As mentioned in the Design Issues and Information section, one can change these values.
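The exclusive upper bound rule can be expressed in a few lines of Python. This sketch only illustrates the rule using the two boundary values quoted above; the function name is hypothetical.

```python
from bisect import bisect_right

def stratum_for(value, boundaries):
    """Return the 1-based stratum number for a value.

    boundaries: ascending exclusive upper bounds, one per stratum boundary.
    A value equal to a boundary belongs to the next stratum up.
    """
    return bisect_right(boundaries, value) + 1


boundaries = [39240.00, 298350.00]
stratum_for(39239.99, boundaries)   # Stratum 1: strictly less than 39,240.00
stratum_for(39240.00, boundaries)   # Stratum 2: equal to the Stratum 1 boundary
stratum_for(298349.99, boundaries)  # Stratum 2: still below 298,350.00
```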
Changing Error Std Dev. The user can change the Error Standard Deviation entries and click Plan to get sample sizes that reflect these new values as long as no inputs in General Planning and Precision and Error Assurance Planning are changed. If changes are made in these two sections, then clicking Plan will reset the Stratification and Sample Size Allocation section based on the new inputs and set the Error Std Dev equal to the Pop Std Dev by default. The Error Std Dev values are direct inputs in the formula for Precision Planning sample size. The lower the Error Std Dev values, the lower the sample size.
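To see why lower Error Std Dev values lead to smaller samples, consider the following Python sketch. It uses a common textbook sample size formula for a stratified estimate of a total under Neyman allocation, with a normal critical value. This is an assumption made for illustration and is not confirmed to be the formula ADA or Roberts uses.

```python
import math
from statistics import NormalDist

def precision_sample_size(pop_sizes, error_std_devs, tolerable_error,
                          confidence, two_sided=True):
    """Textbook sample size for a stratified total under Neyman allocation.

    n = (sum N_h*S_h)^2 / ((A / z)^2 + sum N_h*S_h^2)
    where A is the desired precision (tolerable error amount).
    """
    beta = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - (beta / 2 if two_sided else beta))
    sum_ns = sum(N * s for N, s in zip(pop_sizes, error_std_devs))
    sum_ns2 = sum(N * s * s for N, s in zip(pop_sizes, error_std_devs))
    target_variance = (tolerable_error / z) ** 2
    return math.ceil(sum_ns ** 2 / (target_variance + sum_ns2))


pop_sizes = [600, 300, 100]            # hypothetical stratum counts
default_sd = [10.0, 40.0, 200.0]       # Error Std Dev left equal to Pop Std Dev
reduced_sd = [5.0, 20.0, 100.0]        # user halves every Error Std Dev entry
n_default = precision_sample_size(pop_sizes, default_sd, 5000, 0.95)
n_reduced = precision_sample_size(pop_sizes, reduced_sd, 5000, 0.95)
# n_reduced is smaller than n_default
```

In this formula the standard deviations enter the numerator squared, so halving them cuts the required sample size by considerably more than half in this example.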
One may wish to set these values according to methods outlined in Statistical Auditing by Donald M. Roberts. It is also possible to use the tolerable error rate or the population error occurrence rate as a multiplier to create differences that are a percentage of the recorded amounts. Using SD to symbolize standard deviation, since SD(A*X + C) = A*SD(X) where A and C are constants and A is positive, multiplying the Error Std Dev in the table by this constant error rate A (right hand side of the equal sign) is the same as multiplying all the population records for that stratum by this constant error rate A and then calculating the standard deviation (left hand side of the equal sign).
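The scaling property is easy to confirm numerically. The Python sketch below uses made-up recorded amounts for one stratum and a 10% rate.

```python
import math
from statistics import pstdev

recorded = [120.0, 450.0, 980.0, 1500.0, 2300.0]  # hypothetical stratum amounts
rate = 0.10                                       # constant error rate A

# Left hand side: scale every record first, then take the standard deviation.
sd_of_scaled = pstdev([rate * x for x in recorded])

# Right hand side: take the standard deviation first, then scale it.
scaled_sd = rate * pstdev(recorded)

same = math.isclose(sd_of_scaled, scaled_sd)      # True, up to floating point rounding
```

So a user who wants Error Std Dev entries that reflect a 10% error rate can multiply each Pop Std Dev in the table by 0.10 rather than recomputing anything from the population records.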
Changing Sample Size. The user can change the sample sizes in each stratum and click Extract to obtain a sample with those desired sample sizes. Just as with changing the Error Std Dev, the user should not change any inputs in General Planning or Precision and Error Assurance Planning before doing this; otherwise, the program will replan the stratification with the new inputs and extract the sample. Certain Precision and Error Assurance Planning inputs will not be reported in the Spec file because the sample size numbers no longer reflect those inputs, and reporting them would be misleading.
Best Practice Order of Operations

1. Open the population data file in ADA and have the population data in front of you
2. Run Classical Variable > Plan and Extract
3. Choose the appropriate column and option (All, Positive, Negative)
4. Enter the options for any high/low value handling
5. Enter a starting value for the number of strata
6. Change the minimum sample size entries if needed and supply the output filename
7. Enter the Precision Planning and Error Assurance inputs
8. Click Plan
9. Change inputs under General Planning and Precision and Error Assurance Planning, including experimenting with different numbers of strata, and click Plan until satisfied with the stratification and sample size
10. Once the General Planning and Precision and Error Assurance Planning section inputs are set, if the user wishes to change the Error Std Dev entries in the stratification table, then do so and click Plan to get new sample size calculations
11. Once the General Planning and Precision and Error Assurance Planning section inputs are set, if the user wishes to change the Sample Size entries in the stratification table, then do so
12. Click Extract to create the sample, Spec file, a high values file (_HV file) if applicable and a low values file (_LV file) if applicable
Please note that steps 10 and 11 are optional. One does not have to do step 10 before step 11. One does not have to do step 11 after doing step 10. Steps 1 through 9 and step 12 are required to extract a sample.
Spec File
After extracting a sample, ADA will create the sample file as well as a Spec file that provides automated documentation of the sampling process, including the random seed number and all relevant inputs. The data in the Spec file can be used to recreate the sample. A Spec file is required to use CVS Sample Evaluation. One should have the Spec file open in front of them when invoking CVS Sample Evaluation. This shows the Spec file after export to Excel.
After Extraction
After extracting a sample, the user is expected to export the sample and the high/low values file (if applicable) to Excel. In Excel, the user will audit each item for correctness and provide the true value in the provided AUDIT_AMT column or some other column. When creating the sample, the program will create columns called RECORDED_AMT and AUDIT_AMT. It is optional to use these columns or make columns named something else in Excel. When performing CVS Evaluation, the user has the flexibility to choose the two columns that represent the recorded amount and audit amount regardless of the names of the columns.
After auditing the sample and high/low value file(s) for correctness, the user can then import the sample and high/low values file(s) (if applicable) back into ADA and perform CVS Evaluation.
Questions
If you have questions about ADA software or would like to know about purchasing custom ADA analytics, please call us at (864) 625-2524, and we’ll be happy to help.