Introduction
Using ADA (Audit Data Assistant) software begins with importing data. ADA makes this crucial step easy and comprehensive with its Import Wizard.
Formats Supported by Import
ADA’s Import Wizard allows you to import both Excel and delimited text files, such as CSVs, as Parquet files. Parquet is the format ADA uses for storing data files.
Location of Imported Files
Files imported by ADA are saved by default to the current project folder.
Using the Import Wizard Dialog
The first step in the import process is selecting the file(s) you wish to import. Aside from the file type, you can select a single file, multiple files from a folder or even an entire folder.
If you only need to import a particular set of data once, there’s no need to save a template file. If you choose to import more than a single file, you’ll need to specify a template. If you are only importing a single file, you can still use a template. It is optional and not mandatory.
Where do you get a template? Select one of the files and import it as a single file. After the file is imported, the import process will prompt you to save a template file.
This feature is extremely convenient for importing months of the same data, for example. If you have three years of monthly expense data saved in individual files for each month, no problem. Import one of the files, save the template, and then run an import with Multiple Files or All Files from the folder.
Here are some directions for importing single Excel, text and report files with ADA’s Import Wizard.
When you select a single Excel file to import without a template, the Excel Import Wizard dialog appears.
Select Sheets. By default, the sheets in your Excel file are unchecked. Check the checkboxes next to the sheets you wish to import. You must check at least one checkbox to enable the OK button. Click a sheet’s name in the list to see the Data Preview and Column Definitions.
First Row Contains Column Names. If your first row contains names for your columns, click the First Row Contains Column Names checkbox to remedy this. If your data does not contain column names, the columns will be named COL1, COL2, etc.
Column Definitions. Review the list of columns to make sure ADA is formatting your columns correctly. If you scroll to the right, you will see that you can specify date and time formats for date and time columns.
To change a column name, click the name in the Column Name column and type the new name:
To change the type assigned to a column, click the Data Type of the column you wish to change and select the desired column type.
If you select Date/Time, be sure to specify the correct format for your data. You have many format options:
If you don’t select a format, you’ll receive a message reminding you to do so:
If you make any changes to the column definitions, click the Refresh Preview button.
Please note that ADA will not allow you to make certain column type changes. For example, if you try to format a column with text data as an integer, you will receive an error message after clicking Refresh Preview. Note that here that SEGMENT3 has text data, like “KJ39”:
The most common change you will likely need to make is formatting character columns as dates. If you do experience an error formatting a column and you feel the conversion is an appropriate one, please click the Show Details… button and forward the information to Audit Data Analytics LLC.
The column types Character String, Decimal Number, Integer Number, Category, Boolean, Date/Time, and Time Delta correspond to Python data types object, float64, int64, category, bool, datetime64[ns] and timedelta64[ns], respectively.
Data Preview. This grid allows you to see how your data will appear after import. It shows the first 1000 records.
Output Filename Prefix. The Excel Import Wizard will take the default Output Filename Prefix from the name of the file you specified. When the data file is generated, ADA will append the sheet name. For example, the first sheet in the Excel file in the above screenshot, Demo Data, would be imported as Excel Data-Demo Data.parquet. You can change the prefix if you wish by typing a different prefix in the box.
If you plan to make a template to import multiple files, these files must have the same sheets and columns as the workbook you use to create the template. Each worksheet will import as a single Parquet data file.
When you click OK, you will see the Importing dialog:
This dialog will appear for a short time while ADA finishes importing your data.
After your file is imported, ADA will prompt you to save this file’s settings as a template.
The settings you’ve just specified in the dialog will be saved to the template file. If you have batches of files you wish to import that are in the same format, or if you have to perform imports of the same data on a regular basis, templates will save you much time and effort. If you only wish to import this particular set of data once, you can simply click No.
When you import a single text file to import without a template, the Text Import Wizard will appear:
First Row Contains Column Names. If your first row contains names for your columns, click the First Row Contains Column Names checkbox to remedy this. If your data does not contain column names, the columns will be named COL1, COL2, etc.
Column Delimiter. You can specify what character separates your data in the Column Delimiter section. The default is comma, but you can also select characters such as pipes (|) or colons (:). If your delimiter isn’t listed in the dropdown, you can type it into the Other Value box.
Text Encapsulator. Your file may have encapsulators. These are characters that surround each value in each column. Encapsulators are typically used to prevent delimiter characters, like commas, from causing values to be interpreted incorrectly. Double-quotes are commonly used for this. Here is an example of data that is encapsulated with double-quotes:
If you have encapsulators in your data, you’ll want to make sure you select the correct encapsulator character so ADA is able to import your data correctly.
Header Lines to Ignore. Sometimes your data has labels at the top of it that you don’t really want imported, like “Annual Report”. If this is the case with the data you’re importing, you can ignore header lines. If the line is the first in the report, just enter a 1. If it’s the second line, enter 2, and so on.
Decimals Character. If you’re importing data that uses a character other than a period to denote decimals, like a comma, you can specify that decimals character here.
Thousands Character. The default for thousands character is none. However, your data may contain commas, or even periods or spaces, as thousands characters. You can select the right option here.
Column Definitions. Review the column list to make sure ADA is formatting your columns correctly. If you scroll to the right, you will see that you can specify date and time formats for date and time columns.
To change a column name, click the name in the Column Name column and type the new name:
To change the type assigned to a column, click the Data Type of the column you wish to change and select the desired column type.
If you select Date/Time, be sure to specify the correct format for your data. You have many format options:
If you don’t select a format, you’ll receive a message reminding you to do so:
If you make any changes to the column definitions, click the Refresh Preview button.
Please note that ADA will not allow you to make certain column type changes. For example, if you try to format a column with text data as an integer, you will receive an error message after clicking Refresh Preview. Note that here the column SEGMENT3 has text data, like “KJ39”:
The most common change you will likely need to make is formatting character columns as dates. If you do experience an error formatting a column and you feel the conversion is an appropriate one, please click the Show Details… button and forward the information to Audit Data Analytics LLC.
The column types Character String, Decimal Number, Integer Number, Category, Boolean, Date/Time and Time Delta correspond to Python data types object, float64, int64, category, bool, datetime64[ns] and timedelta64[ns], respectively.
Data Preview. This grid allows you to see how your data will appear after import.
Output Filename. The Text Import Wizard will take the default output filename from the name of the file you’re importing. For example, the text file in the above screenshot would be imported as Test Data.parquet. You can easily rename this file if you wish by typing a different name here.
When you click OK, you will see the Importing dialog:
This dialog will appear for a short time while ADA finishes importing your data.
After your file is imported, ADA will prompt you to save this file’s settings as a template:
Now that you have imported your first data file and created a template, you’re ready to import batches of similar files.
The Report Import wizard will help import data that has been formatted in report style like the one depicted below. It creates a flat file that can be imported in the usual straightforward style of Text import. Report style data will typically consist of Detail data, Header data, Footer data and Column Headers.
Detail data are the data lines that represent the transactions for the report and are the primary area of focus. For this example of SAP invoice data, the invoice records that start with the type, such as KS, and followed by the Reference number, which represents the invoice number for the report such as XN00009575, are the Detail lines in the report.
Column Header data are the data lines that form column headers for the detail data lines. The lines that start with S St Typ DD Reference are the Column Header lines in the report.
Header and Footer data are used to augment and enhance Detail lines with additional fields of data.
Header data are lines that structurally occur ABOVE the Detail lines. The Vendor information represents the Header data in this example.
Footer data are lines that structurally occur BELOW the Detail lines. The subtotal information identified by the asterisks represents the Footer data in this example.
Selectors are the specific and generic combination of characters that can be used to identify and thereby select the desired data lines. Selectors are entered only on the top line in the Selection Viewer. They can consist of specific character entries from your keyboard or generic entries entered using the Text, Number, Space and Non-Space buttons. As regular expressions, Text stands for [A-Z], Number for [0-9], Space for \s and Non-Space for \S. The placement of the Selectors matters. The combination of Selectors forms a pattern that is matched against the data to determine which lines are identified. The search is not case-sensitive.
Column Lines are used to delineate between and separate data you want to place in different columns. It is used for a non-delimited traditional report. You can double-click in the top line/row of the Selection Viewer to toggle Column Lines or you may place the cursor at a spot in the Selection Viewer’s top row and click the Toggle Column Lines button.
If a report is Delimited, the Has Separators feature can be used instead of Column Lines. A Delimited report has separators between its data columns as shown in this example, which is pipe (|) delimited:
Note on PDFs: If a .pdf file is chosen, the file will first be processed into a text file from PDF format. This additional step will occur automatically before the file is presented in the report editor. If the PDF file is password protected, the program will prompt for the password before continuing. The data in the PDF must be data, not a picture such as is rendered when scanning to PDF. If the data is a scanned picture, the file must first be processed using an OCR (optical character recognition) software before it can be imported.
Tolerance. The Tolerance allows you to have more wiggle room when defining Selectors. The default of +/- 0 columns will allow for no wiggle room and the Selector patterns must match exactly where they are specified. A Tolerance of +/- 1 Columns will allow the patterns to match either where they are specified or one column to the left or right of where they are specified. The Tolerance is particularly useful when planning to perform multiple imports or automate a single import using a template. Due to the nature of processing PDF files into text, there is a tendency for placement of data to shift by a column to the left or right. Defining a Selector Tolerance of +/- 1 Columns will ensure the data continues to import if these subtle data shifts occur.
The Detail Line
The Detail line is mandatory. To start, pick a representative detail line and select it. Do this by clicking in the grey margin to the left of the data so that the entire line is selected like shown:
Then, click the Add Detail Line button and define your Selectors:
The Selectors in this example are two Text characters side by side and two Number characters side by side.
Next, toggle Column Lines by either using the Toggle Column Line button or by double-clicking at the appropriate spot in the first row.
Once all the Column Lines are defined, click Next…
Column Header Line
The Column Header line is an optional input that can be used as a convenience function for capturing the names for the columns. If desired, highlight a line by clicking in the grey margin to the left of the data as shown:
Click the Add Col Hdr Line button:
No Selectors are required or needed for the Column Header line.
The number of Column Lines for Detail and the number for Column Headers should match exactly. If not, you will get this message:
The Column Line after Discount Amt is missing. After correcting for the discrepancy, we can continue:
Header and Footer Lines
The optional Header Lines can be selected and the information will be associated with the Detail data located BELOW the Header Lines.
You can highlight multiple Header lines. Each new line requires its own set of Selectors and Column Lines or Separators. To create a new line, use the Add Header Line button. To select a currently existing line, click on the letter (e.g. A, B, C, etc.) in the left grey margin to highlight it and then click the Select/Edit Line button.
Likewise, Footer lines can be selected and the information will be associated with the Detail data lines ABOVE the Footer lines.
Once completed, click Finish…
Configure Text Import
The file will be flattened and then one must configure the text import. The Configuration on the left should be left the way it is. One can see that certain unwanted fields, such as S_ST and PBK_CHECK_NUMBER etc. were jammed together.
These can be eliminated by changing the Data Type to Do Not Import. Instead of column names that start with HDR or FTR, we can provide more meaningful names as well as format the Date/Time column appropriately:
By clicking Refresh Preview, we can see the changes to the columns based on our Column Definitions:
When ready, supply a name for the Output Filename and select OK. The program will ask if you want to save a template.
The results will be presented in ADA as shown:
As you can see, the vendor and subtotal information has been replicated and associated with the detail records.
Importing Batches of Files with Templates
Importing multiple Excel, report or text files is simple once you’ve created a template file. Select either multiple files or all files in a single folder then click the Browse for File button to select your files or folder.
Click the Browse for Mandatory Template button to select your template file. You must have a template to import batches of files.
Once you click OK your files will be imported with no further input needed from you.
Questions
If you have questions about ADA software or you would like to know about purchasing custom ADA analytics, wonderful! Please call us at (864) 625 – 2524, and we’ll be happy to help.