Introduction

ADA (Audit Data Assistant) allows you to plan and extract attribute samples. Attribute sampling uses a sample to estimate the proportion of a population that has a particular attribute, or characteristic. In audit sampling, the attribute is a characteristic such as a processing error or a control error, and a judgment is made for each sample item as to whether it has the attribute. Thus, each item in the sample receives a pass/fail, true/false or yes/no judgment when audited, and evaluation projects the population error rate, with a confidence interval, from the observed sample error rate. The Attribute sampling module uses the hypergeometric cumulative distribution function (CDF) for all its planning and evaluation calculations because it is the most appropriate statistical sampling distribution when sampling without replacement. Sample extraction uses unstratified, uniform random sampling, in which each item in the population has an equal chance of selection. Using an attribute table or calculation with a stratified sample is not strictly valid because the tables and calculations assume unrestricted random sampling.
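To make the hypergeometric CDF concrete, the following minimal Python sketch computes it directly from binomial coefficients. It is an illustration only, not ADA's internal code; the function and parameter names are hypothetical, and the numbers in the example call are illustrative.

```python
from math import comb

def hypergeom_pmf(k, population_size, population_errors, sample_size):
    """P(exactly k errors in the sample) when sampling without replacement."""
    if k < 0 or k > sample_size:
        return 0.0
    good_items = population_size - population_errors
    # math.comb returns 0 when asked to choose more items than are available.
    ways = comb(population_errors, k) * comb(good_items, sample_size - k)
    return ways / comb(population_size, sample_size)

def hypergeom_cdf(k, population_size, population_errors, sample_size):
    """P(at most k errors in the sample)."""
    return sum(
        hypergeom_pmf(i, population_size, population_errors, sample_size)
        for i in range(k + 1)
    )

# Chance of seeing no errors in a sample of 29 drawn from 10,000 items
# that contain 1,000 errors (illustrative numbers only).
print(hypergeom_cdf(0, 10_000, 1_000, 29))
```

A small CDF value here means that a sample with that few errors would be unlikely if the population really contained that much error.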

Attribute Sampling Plan and Extract can be used solely to calculate the appropriate sample size, or it can go on to extract a sample of that size from a population file if one is provided.

If no data file is open in ADA when the Plan and Extract module is executed, the Population Size value starts at zero and must be changed. If the Extract button is then selected after planning a sample size, the user will be prompted to choose the population file, which must have the same number of records as the Population Size entry to proceed.

If a data file is already open in the ADA Project Viewer when the Plan and Extract module is executed, the number of records in that file is used as the Population Size entry. You can change the Population Size entry, but keep in mind that Extract will not proceed unless the number of records in the file matches the Population Size entry. The program will offer to reset the Population Size entry based on the chosen file and replan the sample size, if the user chooses to do so.

If you plan to extract a sample, the best practice is to open the population file in ADA first and then run Attribute > Plan and Extract. After entering the desired Sample Size Planning inputs, click Plan to get the calculated sample size. Once satisfied with the resulting sample size (if not, change the inputs and click Plan again), enter the output filename and click Extract to obtain the sample.

Note: If, after clicking Plan, the user changes the Sample Size Planning inputs and then clicks Extract without clicking Plan again, the program will replan the sample size based on the new inputs and immediately extract the sample. Thus, it is a best practice to click Plan after changing the Sample Size Planning inputs. That way there are no surprises.

Formats Supported by Attribute Sampling Plan and Extract

ADA’s Attribute Sampling Plan and Extract utility allows you to sample from Parquet data. Parquet is the format ADA uses for storing data files.

Using the Dialog

The dialog box consists of three sections: Sample Size Planning, Output Filename and Sample Descriptive Statistics. The Sample Size Planning inputs determine the sample size required to test the auditor’s tolerable (material) population error at the stated level of confidence, while allowing for the number of errors in the sample specified by the Sample Expected Error. In the Output Filename section, the user provides the name for the sample file if a sample is generated using Extract. The Sample Descriptive Statistics section gives the results of clicking Plan, such as the required sample size to achieve the sampling objective.

**Sample Size Planning**. This section requires Population Size, the Population Tolerable Error, the Sample Expected Error, the Confidence Level and the Random Seed Number. Once provided, the user can click Plan and get the calculated sample size in the Sample Descriptive Statistics area of the dialog.

**Population Size**. The number of items or records in the population data file. This is automatically filled in if an ADA data file is already open when the Plan and Extract module is executed. If using Extract, this value must match the number of records of the data file used as the population file. If not, the number must be adjusted or a new file must be selected, which will require closing and restarting the module.

**Population Tolerable Error (Materiality)**. The user’s entry of how much error they are willing to tolerate in the population before making recommendations for substantive changes to controls. This is usually called the auditor’s materiality or performance materiality and represents an upper bound. The hypergeometric calculator will calculate probabilities assuming there is this much error in the population.

**Population Error Rate**. The user can enter the error rate as a percentage of the total population size.

**Population Number of Errors**. The user can enter the total number of errors in the population.

For a population of size 10,000, a population error rate of 10% corresponds to 1,000 errors in the population. Either entry will yield the same answer.
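The conversion between the two entries is simple arithmetic, sketched below in Python. The function names are hypothetical, and rounding to the nearest whole item is an assumption of this sketch; how ADA rounds a rate that does not correspond to a whole number of errors is not stated here.

```python
def errors_from_rate(population_size, error_rate_percent):
    """Convert a population error rate (in percent) to a number of errors."""
    # Rounding to the nearest whole item is an assumption of this sketch.
    return round(population_size * error_rate_percent / 100)

def rate_from_errors(population_size, number_of_errors):
    """Convert a number of errors to a population error rate (in percent)."""
    return 100 * number_of_errors / population_size

print(errors_from_rate(10_000, 10))     # 1000
print(rate_from_errors(10_000, 1_000))  # 10.0
```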

**Sample Expected Error (Alpha Risk Control)**. Traditionally called Expected Error, this entry represents the critical number of errors the user is willing to tolerate in the sample while still concluding that there is immaterial error in the population (i.e. population error that does not exceed the Population Tolerable Error). This entry has nothing to do with expecting error in the population. While the user may have no expectations regarding the errors a sample could potentially produce, the user can more concretely judge the number of errors they may be willing to tolerate in the sample or allow the sample to have while still deeming the population error immaterial. This represents an implicit form of alpha risk control (i.e. the risk of overauditing). This allows the user to mitigate the possibility of getting “unlucky” with the sample. For example, if a population of 10,000 items has only 1 error in it and this item is randomly selected for the sample, then a sample planned with a 0% or 0 number entry for Sample Expected Error will lead to the conclusion of material error in the population even if the user was willing to tolerate population errors well in excess of 0.01%.
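The "unlucky" scenario above can be quantified. With exactly one error in the population, the chance that it lands in the sample equals the sample size divided by the population size. The Python sketch below checks this with binomial coefficients; the sample size of 100 is a hypothetical value chosen for illustration, not a size ADA would necessarily plan.

```python
from math import comb

def prob_single_error_sampled(population_size, sample_size):
    """P(the one erroneous item is in the sample), sampling without replacement."""
    # P(sample has zero errors) = C(N - 1, n) / C(N, n) when the population has 1 error.
    p_zero = comb(population_size - 1, sample_size) / comb(population_size, sample_size)
    return 1 - p_zero

# Hypothetical sample size of 100 from the 10,000-item population in the example.
p = prob_single_error_sampled(10_000, 100)
print(f"{p:.4f}")  # 0.0100, i.e. 100 / 10,000
```

Under a plan that tolerates zero sample errors, this is the probability of wrongly concluding material error in that scenario. Tolerating at least one sample error removes it.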

**Tolerable Sample Error Rate**. The user can enter the error rate as a percentage of the unknown sample size.

**Tolerable Number of Errors**. The user can enter the total number of errors in the sample one is willing to tolerate regardless of sample size.

**Confidence Level**. Defines the Beta risk for the sampling plan, which is the risk of not finding material error with the sample when it exists in the population. The Beta risk is the complement of the Confidence Level (i.e. 100% – C.L.%). Using the hypergeometric CDF and the other entries, the module finds the sample size for which the likelihood of not finding material error with the sample when it exists in the population is less than or equal to the Beta risk.
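One way to picture the planning calculation is as a search for the smallest sample size whose hypergeometric CDF, evaluated at the tolerable number of sample errors, falls to the Beta risk or below. The Python sketch below uses a simple linear search under that assumption. It is not ADA's actual algorithm, the function names are hypothetical, and it takes the tolerable number of sample errors as a count rather than a rate.

```python
from math import comb

def hypergeom_cdf(k, population_size, population_errors, sample_size):
    """P(at most k errors in a sample drawn without replacement)."""
    good_items = population_size - population_errors
    total = comb(population_size, sample_size)
    return sum(
        comb(population_errors, i) * comb(good_items, sample_size - i)
        for i in range(min(k, sample_size) + 1)
    ) / total

def plan_sample_size(population_size, tolerable_errors, sample_tolerable_errors, confidence):
    """Smallest n with P(errors in sample <= critical number) <= Beta risk.

    Assumes the population holds exactly `tolerable_errors` errors,
    i.e. error at the materiality threshold.
    """
    beta_risk = 1 - confidence
    for n in range(1, population_size + 1):
        if hypergeom_cdf(sample_tolerable_errors, population_size, tolerable_errors, n) <= beta_risk:
            return n
    return None  # no sample size meets the objective

# Illustrative inputs: 100 items, 10 tolerable errors, zero sample errors allowed, 95% confidence.
print(plan_sample_size(100, 10, 0, 0.95))
```

Allowing more errors in the sample (a larger critical number) raises the CDF for a given sample size, so the search returns a larger sample.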

**Random Seed Number**. The seed number for starting the random number generator. Using the same Random Seed Number along with the same population data file and the same Sample Size Planning inputs (Population Size, Population Tolerable Error, Sample Expected Error, Confidence Level) will allow the user to replicate a prior sample.
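The role of the seed can be illustrated with a short Python sketch. Python's built-in generator is used here only as a stand-in, since the generator ADA uses is not described in this document, and `draw_sample` is a hypothetical helper. The point is that the same seed, population size and sample size always select the same records.

```python
import random

def draw_sample(population_size, sample_size, seed):
    """Pick record positions uniformly at random, without replacement."""
    rng = random.Random(seed)  # a private generator, fully determined by the seed
    return sorted(rng.sample(range(population_size), sample_size))

first = draw_sample(10_000, 29, seed=12345)
again = draw_sample(10_000, 29, seed=12345)
other = draw_sample(10_000, 29, seed=54321)

print(first == again)  # True: same seed and inputs reproduce the sample
print(first == other)  # a different seed almost always selects different records
```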

**Sample Descriptive Statistics**. The result of planning using the Sample Size Planning inputs. After the user clicks Plan, these results are updated.

**Calculated Sample Size**. The number of records required for a sample to achieve the objectives set using the Sample Size Planning inputs.

**Tolerable Sample Number of Errors**. Appears when the user enters the Tolerable Sample Error Rate in Sample Size Planning. Represents the critical number of errors the sample can absorb before leading to conclusions of material error in the population.

**Tolerable Sample Error Rate**. Appears when the user enters the Tolerable Number of Errors in Sample Size Planning. An informative statistic, it represents the Tolerable Number of Errors (aka critical number) divided by the Calculated Sample Size.

**Tolerable Population Number of Errors**. Appears when the user enters the Population Error Rate in Sample Size Planning. An informative statistic that results from multiplying the Population Error Rate by the Population Size.

**Tolerable Population Error Rate**. Appears when the user enters the Population Number of Errors in Sample Size Planning. An informative statistic that results from dividing the Population Number of Errors by the Population Size.

**Output Filename**. Name your sample data file. It will be saved in the project folder with a .parquet extension. In this example, the output file will be named **Attribute Sample Name.parquet**.

Spec File

After extracting a sample, ADA will create the sample file as well as a Spec file that provides automated documentation of the sampling process, including the random seed number and all relevant inputs. The data in the Spec file can be used to re-create the sample.

Questions

If you have questions about ADA software or you would like to know about purchasing custom ADA analytics, wonderful! Please call us at (864) 625-2524, and we’ll be happy to help.