Introduction
ADA (Audit Data Assistant) allows you to plan and extract monetary unit (also called dollar unit) samples. Monetary Unit Sampling (MUS), like Classical Variable Sampling (CVS), is used to project monetary misstatement for a population based on a sample. Unlike CVS, which is suited to high error rate populations, MUS works well in both low and high error rate populations. And unlike Attribute Sampling, where an all-or-nothing yes/no answer is recorded when auditing the sample, monetary misstatement sampling such as MUS takes the degree of error into account when projecting results: an item can be 100% in error, 50% in error, 10% in error, et cetera.
MUS is the application of attribute sampling to money. With ADA’s MUS Sample Planning module, the user can plan a statistically valid sample using many of the same inputs used in attribute sampling, primarily the tolerable population error rate, the expected sample error rate and the desired confidence level. The module can then be used to extract a sample based on the extraction options selected by the user, primarily the extraction method and high value handling. The sample is extracted from the population file, which is, or should be, the file open in ADA when the module is executed. Sampling extraction is performed using Probability Proportional to Size (PPS) sampling methods. Therefore, the larger valued audit units are more likely to be selected for the sample.
If you plan to extract a sample, the best practice is to open the population file first in ADA and then run Monetary Unit > Plan and Extract. Then, after entering the desired Sample Size Planning inputs, Extraction Planning inputs and the Output Filename, click Plan to get the calculated sample size. Once satisfied with the resulting sample size (if not, change the inputs and click Plan again), click Extract to obtain the sample.
Note: If, after clicking Plan, the user changes the Sample Size Planning inputs or chooses a new column and then clicks Extract without first clicking Plan again, the program will replan the sample size based on the new inputs and immediately extract the sample. It is therefore a best practice to click Plan after changing any inputs so there are no surprises.
After extracting a sample, the user is expected to export the sample and the high values file (if applicable) to Excel. Then, after auditing the sample for correctness, the user will import the audited sample file back into ADA and perform MUS Evaluation.
Formats Supported by MUS Sample Planning
ADA’s MUS Sample Planning utility allows you to sample from Parquet data. Parquet is the format ADA uses for storing data files.
Using the Dialog
The dialog box consists of five sections: Recorded Amount Values to Use, Sample Size Planning, Extraction Planning, Output Filename and Sample Descriptive Statistics. The Sample Size Planning inputs determine the sample size required to test the auditor's tolerable population error at the stated confidence level, with an allowance for errors in the sample as specified by the sample expected error. The Extraction Planning inputs control how a sample is extracted from the population data file. In the Output Filename section, the user provides the name for the sample file generated by clicking Extract. The Sample Descriptive Statistics section shows the results of clicking Plan, such as the sample size required to achieve the sampling objective.
Recorded Amount Values to Use. In this section, select the numeric column to use from the dropdown list. Choose the Positive, Negative or Non-zero Absolute option. The Positive option subsets the provided data down to only the values greater than zero to define the population items. The Negative option defines the population to be the subset of items less than zero. The Non-zero Absolute option also excludes items with zero value, which have no chance for selection when performing PPS sampling, and applies the absolute value function to the column of data, making the negative values positive.
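Conceptually, the three options simply filter the recorded amount column before any sampling takes place. The following is a minimal sketch of that behavior in Python using pandas; the data and the column name amount are purely illustrative assumptions, not ADA identifiers.

```python
import pandas as pd

# Hypothetical recorded amounts; "amount" is an illustrative column name, not an ADA identifier.
df = pd.DataFrame({"amount": [250.0, -75.0, 0.0, 1200.0, -300.0]})

positive = df[df["amount"] > 0]               # Positive: only values greater than zero
negative = df[df["amount"] < 0]               # Negative: only values less than zero
nonzero_abs = df[df["amount"] != 0].assign(   # Non-zero Absolute: drop zeros, then
    amount=lambda d: d["amount"].abs()        # take absolute values so negatives
)                                             # become positive for PPS weighting
```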
Sample Size Planning. This section requires the Population Tolerable Error, the Sample Expected Error, the Confidence Level and the Error Size Limit.
Population Tolerable Error (Materiality). The user’s entry of how much error they are willing to tolerate in the population before making recommendations for substantive changes to controls. This is usually called the auditor’s materiality or performance materiality and represents an upper bound. The hypergeometric calculator will calculate probabilities assuming there is this much error in the population.
Percent. The user can enter the error rate as a percentage of the total population value.
Amount. The user can enter the total amount of dollar error in the population.
For a population totaling $1,000,000, a population error rate of 10% corresponds to $100,000 of error in the population. Either entry will yield the same answer.
Sample Expected Error (Alpha Risk Control). Traditionally called Expected Error, this entry represents the critical number of errors the user is willing to tolerate in the sample while still concluding that there is immaterial error in the population (i.e. population error that does not exceed the Population Tolerable Error). This entry has nothing to do with expecting error in the population. While the user may have no expectations regarding the errors a sample could produce, they can more concretely judge the number of errors they are willing to tolerate in the sample while still deeming the population error immaterial. This represents an implicit form of alpha risk control (i.e. the risk of overauditing) and allows the user to mitigate the possibility of getting “unlucky” with the sample. For example, if a population of 10,000 items has only 1 error in it and this item is randomly selected for the sample, then a sample planned with a 0% or 0 entry for Sample Expected Error will lead to the conclusion of material error in the population even if the user was willing to tolerate population error well in excess of 0.01%.
Tolerable Sample Error Rate. The user can enter the error rate as a percentage of the unknown sample size. This is simply an error occurrence rate for the sample and the degree of error (i.e. tainting) should not be taken into account when deciding on the percentage number. For planning purposes, the tainting for each error is effectively applied at approximately 100% tainting.
Tolerable Number of Errors. The user can enter the total number of errors in the sample one is willing to tolerate regardless of sample size. For planning purposes, each error is effectively treated as if it will create approximately 100% tainting.
Confidence Level. Defines the Beta risk for the sampling plan, which is the risk of not finding material error with the sample when it exists in the population. The Beta risk is the complement of the Confidence Level (i.e. 100% – C.L.%). The hypergeometric CDF is used to find the appropriate sample size, based on the other entries, so that the likelihood of not finding material error with the sample when it exists in the population is less than or equal to this Beta risk.
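To illustrate the kind of calculation involved, the sketch below searches for the smallest sample size whose hypergeometric probability of seeing no more than the tolerated number of sample errors, given a population misstated by the full tolerable amount, does not exceed the Beta risk. It treats individual dollars as the population units and uses scipy; the function name and the exact search and rounding rules ADA applies are assumptions for illustration only.

```python
from scipy.stats import hypergeom

def plan_sample_size(pop_total, tolerable_error, tolerable_errors_in_sample, confidence):
    """Smallest n such that, if the population truly held `tolerable_error` dollars of
    misstatement, the chance of seeing <= `tolerable_errors_in_sample` errors in an
    n-dollar-unit sample is no more than the Beta risk (1 - confidence)."""
    beta = 1.0 - confidence
    N = int(round(pop_total))        # population of individual dollar units
    K = int(round(tolerable_error))  # dollar units assumed to be in error
    for n in range(1, N + 1):
        # hypergeom.cdf(k, M, n, N): P(X <= k) with M = population size, n = successes, N = draws
        if hypergeom.cdf(tolerable_errors_in_sample, N, K, n) <= beta:
            return n
    return N

# Example: $1,000,000 population, $100,000 (10%) tolerable error,
# tolerate 1 sample error, 95% confidence.
n = plan_sample_size(1_000_000, 100_000, 1, 0.95)
print(n, "items; applied sampling interval =", 1_000_000 / n)
```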
Error Size Limit. Sets the upper bound for an individual item’s tainting, which is the percentage difference between the recorded and audited amounts. Most applications of MUS assume the most an amount can be in error is 100%. For example, if an invoice cannot be found to substantiate an item in the sample, its audited amount would be entered as zero, which is a 100% overstatement tainting assuming a positive recorded amount. If the audited amount were negative (a sign change), the tainting percentage would exceed 100%. Understatements are also at risk of breaching the Error Size Limit: if the audited amount is more than twice the recorded amount, the tainting percentage would again exceed 100%. If these situations are possible, it may be a good idea to increase this factor. Reducing this limit could be justified if there is a known cap, below 100%, on the amount of error an item can exhibit. This Error Size Limit factor is also the factor used to price basic precision.
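As a rough illustration of how a tainting percentage relates to the Error Size Limit, assuming tainting is measured as the difference between recorded and audited amounts relative to the recorded amount:

```python
def tainting(recorded, audited):
    """Tainting = percentage difference between recorded and audited amounts."""
    return (recorded - audited) / recorded

print(tainting(500.0, 0.0))     # missing invoice:  1.00 -> 100% overstatement
print(tainting(500.0, 450.0))   # partial error:    0.10 -> 10% overstatement
print(tainting(500.0, -100.0))  # sign change:      1.20 -> exceeds a 100% Error Size Limit
print(tainting(500.0, 1100.0))  # understatement:  -1.20 -> also exceeds 100% in magnitude
```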
Extraction Planning. This section requires the user to address the Extraction Method and the High Values handling options as inputs. The Random Seed Number is automatically supplied by the program but can be changed to replicate a sample. The Applied Sampling Interval is automatically filled in after the user clicks Plan, but it can also be entered directly if the user already has an interval and wishes to skip the planning stage altogether.
Extraction Method. The desired method for performing PPS extraction. Imagine breaking the audit units in the population into individual $1 units (aka dollar units) and laying them out on a massive table. The individual $1 units are grouped into segments defined by the amount of the Applied Sampling Interval. There are as many segments as the size of the sample. The segments are defined starting from the top and working towards the bottom. For example, a $3 million population with a $100,000 Applied Sampling Interval will have 30 segments. Starting from the top of the population data file, the first $100,000 is assigned to the first segment. The second $100,000 is assigned to the second segment, et cetera. A $1 unit is selected from each of these segments and the audit unit associated with the $1 unit is selected for sample inclusion.
Cell. The default method. Cell selection randomly selects a dollar unit within each applied sampling interval segment as it moves from top to bottom through the data. The algorithm will not select a duplicate item for the sample. Cell selection is more randomized than fixed interval selection because the spot it picks within each segment is chosen at random each time.
Fixed Interval. Selects a random dollar spot within the first applied sampling interval segment and then selects that same spot within each subsequent segment as it moves from top to bottom through the data. With this method, systematic recurrences in the data have a very small chance of creating bias in the sample, a risk that cell selection obviates.
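To make the segment picture concrete, here is a small numerical sketch of both selection methods over a cumulative dollar layout. The amounts, the $10,000 interval and the helper function are illustrative assumptions, not ADA’s internal implementation.

```python
import numpy as np

rng = np.random.default_rng(12345)  # a fixed seed makes the draw repeatable

# Hypothetical recorded amounts (already made positive, high values removed if applicable).
amounts = np.array([4200.0, 150.0, 9800.0, 760.0, 3300.0, 12500.0, 90.0, 5400.0])
interval = 10_000.0                 # Applied Sampling Interval

# Lay the dollar units end to end: item i occupies the range (upper[i] - amounts[i], upper[i]].
upper = np.cumsum(amounts)

def select(points):
    """Map selected dollar positions back to the audit units that contain them."""
    idx = np.searchsorted(upper, points)  # first item whose cumulative total covers each point
    return sorted(set(idx))               # report unique audit units (see High Values for duplicates)

# Fixed interval: one random spot in the first segment, then the same offset in every segment.
start = rng.uniform(0, interval)
fixed_points = np.arange(start, upper[-1], interval)

# Cell: an independently random spot inside each segment.
n_segments = int(np.ceil(upper[-1] / interval))
cell_points = np.array([rng.uniform(i * interval, min((i + 1) * interval, upper[-1]))
                        for i in range(n_segments)])

print("fixed interval picks items:", select(fixed_points))
print("cell selection picks items:", select(cell_points))
```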
High Values. High value items can be effectively removed from the population, leaving a sampled population (i.e. the population from which one samples) devoid of these larger items. It is commonplace to extract values greater than or equal to the Applied Sampling Interval because of the high likelihood that they will otherwise be selected more than once for the sample. While this is technically not a problem, many users find the sample duplicates undesirable.
Extract and Sample 100%. Subjects items greater than or equal to the High Value Cutoff to 100% sampling. Any time items have a 100% chance of selection they are not subject to sample projection. In other words, high value items extracted to a separate file prior to drawing the sample have a 100 percent chance of inclusion, and any errors found when auditing them are simply added directly to the bottom line when performing evaluation.
High Value Cutoff. Items whose absolute value is greater than or equal to this cutoff will be excluded from the sample and placed in a separate _HV file in the project folder. For convenience, the program displays the total value and number of items that qualify as being greater than or equal to this cutoff.
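As a simple illustration of this split (the column name amount and the cutoff value are assumptions for the example, not ADA identifiers):

```python
import pandas as pd

# Illustrative population; "amount" is a hypothetical column name.
pop = pd.DataFrame({"item": range(1, 7),
                    "amount": [120_000.0, 4_500.0, 98_000.0, 250_000.0, 7_200.0, 61_000.0]})

cutoff = 100_000.0   # High Value Cutoff (often set equal to the Applied Sampling Interval)

is_high = pop["amount"].abs() >= cutoff
high_values = pop[is_high]     # would go to the separate _HV file and be audited 100%
sampled_pop = pop[~is_high]    # the remaining population to which PPS sampling is applied

print(len(high_values), "high value items totaling", high_values["amount"].sum())
```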
Duplicate High Value Items in Sample. This option treats all items in the population the same, and audit units with values greater than or equal to the Applied Sampling Interval can, and probably will, be selected for sample inclusion more than once. For example, with an Applied Sampling Interval of $100,000, a $300,000 item spans at least 3 segments and will typically be selected 3 times. Errors found in these high value duplicate items should be entered for each duplicated entry in the sample. The projections will be the same as if the high value items had been selected to a separate file with the Extract and Sample 100% option.
Random Seed Number. The seed number for starting the random number generator. Using the same Random Seed Number along with the same population data file inputs will allow the user to replicate a prior sample.
Applied Sampling Interval. The sampling interval applied to the population to extract the sample file. Equal to the total value of the population (including high value items) divided by the sample size calculated using the Sample Size Planning inputs. In MUS theory, this is the appropriate sampling interval to apply even if the population is later reduced by removal of high value items.
Output Filename. Name your sample data file. It will be saved in the project folder with a .parquet extension. Here, the output file will be named MUS Sample.parquet.
Sample Descriptive Statistics. The result of planning using the Sample Size Planning and Extraction Planning inputs. After the user clicks Plan, these results are updated.
Achieved Sample Size. The number of records in the sample the user would get after clicking Extract. After removing high value items (if applicable), application of the Applied Sampling Interval to the remaining population items will produce the sample size generated here.
Achieved Sampling Interval. Potentially larger than the Applied Sampling Interval, this figure represents the population amount after high value item removal divided by the Achieved Sample Size. This is the sampling interval that will be used when employing MUS Evaluation.
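A small worked example of how these figures relate, under an illustrative rounding rule when the interval is applied to the reduced population (the specific amounts and the rounding are assumptions, not ADA’s exact calculation):

```python
pop_total        = 3_000_000.0   # total population value, including high value items
planned_n        = 30            # sample size from the Sample Size Planning inputs
applied_interval = pop_total / planned_n          # 100,000 Applied Sampling Interval

hv_total   = 650_000.0                            # value removed to the _HV file (assumed)
remaining  = pop_total - hv_total                 # 2,350,000 left in the sampled population
achieved_n = int(remaining // applied_interval)   # 23 records (illustrative rounding)
achieved_interval = remaining / achieved_n        # ~102,174, larger than the applied interval

print(applied_interval, achieved_n, achieved_interval)
```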
Total Allowable Taintings. Provides a theoretical upper bound on the total taintings (a tainting being the percentage difference between the recorded amount and the audited amount) allowed/tolerated in the sample when performing Stringer Bound evaluation while still concluding the population error is immaterial.
Best Practice Order of Operations
Typically, a user will want to plan and extract a sample in ADA.
- Open the population data file in ADA and have the data in front of you
- Run Monetary Unit > Plan and Extract
- Choose the appropriate column and option (Non-zero Absolute, Positive, Negative)
- Enter all Sample Size Planning inputs as appropriate for your sampling plan
- Enter the Output Filename
- Under Extraction Planning, select an option for Extraction Method and High Values – you do not need to enter/change the starting values for the High Value Cutoff or Applied Sampling Interval
- Click Plan
- Revise inputs as needed and click Plan again until satisfied with the Sample Descriptive Statistics
- Click Extract to create the sample, Spec file and high values file (_HV file) if applicable
PPS Extraction of Sample without Using Plan
If the user already has a sampling interval they want to apply and wants the program to extract a PPS sample, then these are the steps to follow.
- Open the population data file in ADA and have the data in front of you
- Run Monetary Unit > Plan and Extract
- Choose the appropriate column and option (Non-zero Absolute, Positive, Negative)
- Enter the Output Filename
- Under Extraction Planning, enter the sampling interval under Applied Sampling Interval
- Under Extraction Planning, select an option for Extraction Method and High Values – if you choose to Extract and Sample 100% then you must enter the High Value Cutoff
- Click Extract to create sample, Spec file and high values file (_HV file) if applicable
Spec File
After extracting a sample, ADA will create the sample file as well as a Spec file that provides automated documentation of the sampling process, including the random seed number and all relevant inputs. The data in the Spec file can be used to re-create the sample. The Spec file, not the sample file, is the ideal file to have open when invoking MUS Sample Evaluation: doing so allows the program to automatically fill in certain needed inputs, namely the Sampled Population Total Amount, the Error Size Limit and the Confidence Level.
After Extraction
After extracting a sample, the user is expected to export the sample and the high values file (if applicable) to Excel. When creating the sample, the program adds columns called RECORDED_AMT and AUDIT_AMT. In Excel, the user audits each item for correctness and enters the true value in the AUDIT_AMT column or another column of their choosing; using the provided columns is optional. When performing MUS Evaluation, the user can choose the two columns that represent the recorded amount and the audit amount regardless of the column names.
After auditing the sample for correctness, the user can then import the sample and high values file (if applicable) back into ADA and perform MUS Evaluation.
Questions
If you have questions about ADA software or you would like to know about purchasing custom ADA analytics, wonderful! Please call us at (864) 625 – 2524, and we’ll be happy to help.