How to Draw a Random Sample with ADA

Introduction

ADA (Audit Data Assistant) allows you to extract random samples. The ADA Random Sampling module uses unstratified, uniform random sampling, in which each item in the population has an equal chance of selection. Random Sampling works on the data file that is open in ADA when the module is invoked. If no file is open, the user is prompted to choose one. The resulting sample file is saved in the project folder.

Formats Supported by Random Sampling

ADA’s Random Sampling utility allows you to sample from Parquet data. Parquet is the format ADA uses for storing data files.

[Figure: Random Sampling dialog]

Using the Dialog

The dialog box consists of two sections: Random Sampling Inputs and Output Filename. In the Random Sampling Inputs section, the user enters the desired sample size and the random seed number and selects the sampling method. In the Output Filename section, the user provides the name for the sample file that the module generates.

Random Sampling Inputs. This section displays the Population Size and requires three inputs: the Desired Sample Size, the Random Seed Number, and the Sampling Method.

Population Size. The number of items or records in the population data file. This is automatically filled in and cannot be changed.

Desired Sample Size. The number of records the user wants randomly selected for the sample. Must be less than or equal to the Population Size.

Random Seed Number. The seed number used to start the random number generator. Using the same Random Seed Number with the same population data file, the same Desired Sample Size, and the same Sampling Method reproduces a prior sample.

Sampling Method. Check the box to perform Sampling With Replacement; leave it unchecked to perform Sampling Without Replacement.

Sampling Without Replacement. Once an item is selected for the sample it cannot be selected again. This method produces a sample with no duplicates from the population.

Sampling With Replacement. After an item is randomly selected for the sample, it is still eligible to be randomly selected again. This method can potentially create duplicates in the sample.
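The inputs described above can be illustrated outside ADA with a minimal Python sketch. It is not ADA's implementation: the helper name draw_sample, its parameters, and the use of Python's standard random module are assumptions made for illustration, and a seed used here will not reproduce a sample drawn in ADA. Records are represented by their positions, 0 through Population Size minus 1.

```python
import random


def draw_sample(population_size, sample_size, seed, with_replacement=False):
    """Return record positions chosen uniformly at random (hypothetical helper)."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    rng = random.Random(seed)  # private generator, seeded so the draw can be repeated
    positions = range(population_size)
    if with_replacement:
        # Each draw is independent, so a position may appear more than once.
        return rng.choices(positions, k=sample_size)
    # Each position can be chosen at most once, so there are no duplicates.
    return rng.sample(positions, sample_size)


without = draw_sample(1000, 25, seed=12345)
with_repl = draw_sample(1000, 25, seed=12345, with_replacement=True)

# The same inputs and the same seed give the same sample.
print(without == draw_sample(1000, 25, seed=12345))
# Sampling without replacement never repeats a record.
print(len(set(without)) == len(without))
```

Both lines print True. The sample drawn with replacement has the same length, but it may contain repeated positions.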

Output Filename. Name your sample data file. It will be saved in the project folder with a .parquet extension. In the example shown, the output file is named Random Sample Name.parquet.

[Figure: Random Sampling output filename]

Spec File

After a sample is extracted, ADA creates the sample file and a Spec file that automatically documents the sampling process, including the random seed number and all relevant inputs. The information in the Spec file can be used to re-create the sample.
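The principle that recorded inputs are enough to re-create a sample can be sketched in Python. The layout of ADA's Spec file is not described here, so the JSON record and its field names below are purely hypothetical, as is the draw_sample helper; the sketch shows only the idea, not ADA's file format.

```python
import json
import random


def draw_sample(population_size, sample_size, seed, with_replacement):
    """Hypothetical stand-in for a seeded uniform sampler."""
    rng = random.Random(seed)
    positions = range(population_size)
    if with_replacement:
        return rng.choices(positions, k=sample_size)
    return rng.sample(positions, sample_size)


# Hypothetical record of the inputs; these field names are not ADA's.
spec = {
    "population_size": 1000,
    "sample_size": 25,
    "seed": 12345,
    "with_replacement": False,
}
original = draw_sample(**spec)

# Round-trip the record through text, as if it had been saved and reopened.
restored_spec = json.loads(json.dumps(spec))
recreated = draw_sample(**restored_spec)

print(recreated == original)
```

The script prints True. In this sketch, re-creation depends on every recorded input being unchanged, and it also assumes the population records are in the same order as when the original sample was drawn.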

[Figure: Random Sampling Spec file]

Questions

If you have questions about ADA software or would like to know about purchasing custom ADA analytics, wonderful! Please call us at (864) 625-2524, and we’ll be happy to help.