How to Summarize Data with ADA

Introduction

ADA (Audit Data Assistant) lets you summarize your data with its Summarize utility.

Using the Summarize Dialog

Summarize gives you many options for summarizing your data, including options to calculate statistics for numeric columns:

Key Columns. Select the columns you would like to summarize by. For example, to see invoice totals by region and state, select your REGION and STATE columns. At least one Key Column is required. As you check the boxes to select columns, those columns move to the top of the list. The order of the checked columns matters: the key columns appear left to right in the output dataset in the same order they appear top to bottom in the Key Columns list. To reorder the columns, click and drag them within the list.

Calculated Columns. This is where you select the columns you would like to run calculations on. Using the above example, you would select your INVOICE_AMOUNT column. Note that only numeric fields appear here.

Included Columns. Select the individual columns you would like to include in the summarized data. By default, only the Key Columns and Calculated Columns, if chosen, will be included in the output. Use this list to identify additional columns for inclusion. Clicking a column name will select it. Clicking a selected column name will deselect it. Also, you can use the All checkbox to conveniently select or deselect all columns.

Statistics. Once you have selected columns in the Calculated Columns box, the Statistics options become available. Continuing the example above, after selecting your INVOICE_AMOUNT column, check the Sum checkbox to calculate totals for that column by region and state. Note that Size and Count are slightly different: Size includes NULL values, whereas Count does not.

Output Name. Name your summarized data file. It will be saved in the project folder with a .parquet extension. For example, entering Summarized Invoice Data produces an output file named Summarized Invoice Data.parquet.

First or Last Occurrence. When you summarize your data by your key columns, each summarized row will often correspond to several different values in your other included fields. In the above example, if you include INVOICE_DATE, you will likely have many dates for each region and state, but only one can appear in the output. This is where First Occurrence and Last Occurrence come in. To have Summarize use the last INVOICE_DATE it finds for each region and state, choose Last Occurrence. Alternatively, choosing First Occurrence will use the first INVOICE_DATE value. First Occurrence is the default setting.
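
For readers who find it easier to follow the options in code, the short Python sketch below imitates what the settings described above do to a handful of invoice rows. It is an illustration only and is not ADA's implementation. The sample data, the summarize function, and the output column names (such as INVOICE_AMOUNT_SUM) are hypothetical. The sketch also assumes that Sum skips NULL values and that "first" and "last" refer to the order of rows in the input data.

```python
# Hypothetical sample rows; column names follow the invoice example above.
rows = [
    {"REGION": "East", "STATE": "NY", "INVOICE_AMOUNT": 100.0, "INVOICE_DATE": "2024-01-05"},
    {"REGION": "East", "STATE": "NY", "INVOICE_AMOUNT": None, "INVOICE_DATE": "2024-02-10"},
    {"REGION": "East", "STATE": "NY", "INVOICE_AMOUNT": 50.0, "INVOICE_DATE": "2024-03-15"},
    {"REGION": "West", "STATE": "CA", "INVOICE_AMOUNT": 75.0, "INVOICE_DATE": "2024-01-20"},
]


def summarize(rows, key_columns, calc_column, included_column, occurrence="first"):
    """Group rows by key_columns, then compute Sum, Size, and Count for calc_column.

    Hypothetical helper for illustration; not part of ADA.
    """
    # Collect the rows that share the same key values.
    groups = {}
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        groups.setdefault(key, []).append(row)

    output = []
    for key, members in groups.items():
        values = [member[calc_column] for member in members]
        non_null = [value for value in values if value is not None]

        # Key columns come first, in the order they were listed.
        record = dict(zip(key_columns, key))
        record[calc_column + "_SUM"] = sum(non_null)      # assumption: NULLs are skipped
        record[calc_column + "_SIZE"] = len(values)       # Size includes NULL values
        record[calc_column + "_COUNT"] = len(non_null)    # Count excludes NULL values

        # First or Last Occurrence decides which row supplies the included column.
        picked = members[0] if occurrence == "first" else members[-1]
        record[included_column] = picked[included_column]
        output.append(record)
    return output


summary_first = summarize(rows, ["REGION", "STATE"], "INVOICE_AMOUNT", "INVOICE_DATE")
summary_last = summarize(
    rows, ["REGION", "STATE"], "INVOICE_AMOUNT", "INVOICE_DATE", occurrence="last"
)

for record in summary_first:
    print(record)
```

In this sketch the East / NY group has a Size of 3 but a Count of 2, because one of its invoice amounts is NULL. Switching from First Occurrence to Last Occurrence changes only which INVOICE_DATE is carried into the output for each group; the calculated statistics stay the same.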

Formats Supported by Summarize

ADA’s Summarize utility allows you to summarize Parquet data. Parquet is the format ADA uses for storing data files.

Questions

If you have questions about ADA software or you would like to know about purchasing custom ADA analytics, wonderful! Please call us at (864) 625-2524, and we’ll be happy to help.