AuditNet® Tools::Script-library-for-data-analytics
This page has been created to provide a place where users can find sections of data analytics information and scripts including IDEA and ACL script code and where users can share their own scripts. This is a resource library for AuditNet® subscribers. Those auditors who use audit software, that are willing to share audit software programs that they wrote with the understanding that anyone using a contributed audit software program, accepts all risks on its accuracy in performing its intended function or result. The downloadable scripts posted here are done so as a service to AuditNet® subscribers and therefore AuditNet cannot make any guarantees regarding running the code.
The following books are recommended by AuditNet®. Each of them has complete coverage or chapters on scripting!
Therefore please check and understand any script downloaded BEFORE running it. We cannot be held responsible for any issues you encounter.
If you have created any code that you think might be transferable and useful to others please contact us.
Please read the guidelines for the operation of this exchange library.
The following was adapted from an article by Rich Lanza to describe scripts.
Scripts, also referred to as macros or batch processing, automate a set of data analysis procedures so an individual can create a set of reports, save the steps, and simply push a button called “Run Reports” to get the same results every time. The commands to run routines against a certain data file or application are applied and facilitate reviews and testing for audit and analysis. Scripting is a valuable tool for continuous monitoring - reports can be set to run on a periodic basis with no human intervention, which provides a virtual fraud detection team always working to root out malfeasance.
Thanks to the support of Rich Lanza of AuditSoftwareVideos.com and the Texas ACL Users Group
AuditNet® Script Exclusive
Rich Lanza (CPA, CFE), president of Audit Software Professionals, has a decade and a half of experience in the audit and assurance technology field and has become one of its leading authorities. Rich helps companies save millions (in their respective currency) each year using his technology tools. Rich has shared with AuditNet portions of his test sets and scripts for the benefit of AuditNet subscribers
Scripting Overview
ACL
Why write scripts? (according to Porter)
1) There are certain tests that cannot be
performed via the GUI interface that have to be done in a script.
2) Scripts allow for repeatability. If you are relying upon the GUI,
then you might perform the same test differently every time you do it.
3) By writing a script, you preserve the exact logic of what you were
doing and why you did it. This is helpful in creating your work papers.
4) By writing a script, you can create continuous monitoring/auditing
programs.
5) By writing a script, you can check you logic at a later date/time.
This is particularly beneficial when your manager asks you what you did
or if you considered some random fact.
6) A well written script will include several key checks to ensure that
the code is working properly. If you do things via a gui, you may miss
these potential controls.
7) One of the nice things about ACL is that the commands are fairly
understandable to the non-acl user. “True Programming Languages” may
require programming background to understand, but ACL is written with
simple “English” terms. This allows managers and others who couldn’t
write the program themself to evaluate the reasoning and logic utilized.
This allows for management oversight.
IDEA
IDEAScript is a development tool for IDEA users to extend the power and functionality of IDEA. It is an object-oriented programming language consistent with Microsoft® Visual Basic® and LotusScript®.
IDEAScripts provide additional functions, routines, dialogs and processes for IDEA. As a full-featured programming language, it offers a wide range of options. The most common objectives in using IDEAScripts are:
- Automating repetitive tasks
- Creating a local automated audit system
- Developing special tests and/or functions
- Working with other Object Linking and Embedding (OLE)-enabled software
- Importing files
IDEAScript is extremely versatile and can be used for any task that can be programmed.
SQL Scripts
Scripts
Scripts are a very efficient and effective means to perform a variety of audit tests and procedures. A script is simply a text file containing a set of instructions as to how the system accomplishes a specific audit test or procedure.
Advantages
Scripts can be shared among auditors and easily re-used.
The contents of scripts are in plain English, making them easier to
understand
Scripts can be run very quickly, generally much faster than using menus
Scripts can be further "tailored" by those with some programming
knowledge
Disadvantages
Scripts are unforgiving of spelling mistakes
Some typing is required (can't be processed using mouse clicks alone)
Can produce erroneous results if improperly designed
SCRIPTS ETC
AuditWare IDEAScript
Exchange Area - Compiling additional on-line content, including 'How to' guides, technical articles, hints and tips plus downloadable information such as #Functions and useful IDEAScript code.
This content will be available in a secure area, only accessible to clients with current support and maintenance contracts, where registration/login will be required.
Audit Routine Library Templates of common used audit routines written in ScriptBasic
IDEAScripting - site to support IDEA and IDEAScripting, which is the scripting language used by IDEA.
Using ACL Scripts to Teach Continuous Auditing/Monitoring: The Tremeg Case
Scripting in ACL
Joins and Scripts from
the Virginia ACL Users Group
FORUMS AND SOLUTIONS
AuditSoftwareVideos.com - site maintained by Rich Lanza contains recordings and videos vignettes on scripting and more from AuditNet®
Fraud Investigation Using IDEA by C. Stephen Turnbull
Fraud and Fraud Detection, + Website: A Data Analytics Approach (Wiley Corporate F&A)
- includes a chapter on IDEA scripting and a companion website for purchasers of the book.
AuditDataAnalytics - Twitter
Payables Test Set for ACL by Richard B. Lanza
Payables Test Set for IDEA by Richard B. Lanza
SOFTWARE VENDORS
- Audimation Services provides solutions using CaseWare-IDEA software.
- ACL Services provides case studies using ACL software.
- ActiveData from Information Active
- CaseWare Analytics
- Arbutus
USER GROUPS
New England ACL User Groups with presentations and scripts
ACL Resources (downloadable Word Documents) from the Tampa Bay ACL Users Group Version 8 Reference Guide (1.26 MB)
- Fraud Awareness Tool (48 KB)
-
Fraud Exposure Analyses
- Expenditure FE (27 KB)
- Payroll FE (26 KB)
- Revenue FE (25 KB)
- Travel FE (25 KB)
- Property FE (22 KB)
- MORE Form (30 KB)
WebCAAT Scripts (the scripts require that the Web CAAT software modules are installed and that the Web CAAT system (or at least MySQL) is running Scripts
Scripts are a very efficient and effective means to perform a variety of audit tests and procedures. A script is simply a text file containing a set of instructions as to how the system accomplishes a specific audit test or procedure.
Advantages
Scripts can be shared among auditors and easily re-used.
The contents of scripts are in plain English, making them easier to understand
Scripts can be run very quickly, generally much faster than using menus
Scripts can be further "tailored" by those with some programming knowledge
Disadvantages
Scripts are unforgiving of spelling mistakes
Some typing is required (can't be processed using mouse clicks alone)
Can produce erroneous results if improperly designed
Script Library
All of the scripts mentioned in the articles in this section are available as a zipped file which can be downloaded. Note that the scripts require that the Web CAAT software modules are installed and that the Web CAAT system (or at least MySQL) is running. Download script library.
AuditNet® has obtained all of the files and programs from the creator of WebCaats. No support is offered by the creator or AuditNet® and will be offered as is with no warranties or support.
001 - PopStats.php
A frequent need during an audit is to obtain counts, totals and other basic statistics for either an entire population or else a subset of that population. Other uses include quantifying audit exceptions. This script prepares a report of such statistics after specifying the numeric column to be analyzed as well as any criteria which needs to be applied to limit the scope of the analysis. An example would be a script to quantity the number of fixed asset records having an invalid useful life. The script could be used to prepare such information.
002 - Round.php
This script enables preparation of a schedule of all transactions which have round number amounts, i.e. are evenly divisible by $10. The selection may be restricted to just those transactions which meet specified conditions such as transactions within a date range, transactions at a particular location, etc. Output is an HTML report which can easily be imported into Excel.
003 - Histogram.php
This script reads specified records from a table and prepares a schedule of information which can be used to construct a histogram showing counts of transactions falling within specified bounds. The system selects the number and values of the bounds automatically. The resulting histogram data can be used for audit and sample planning, visual inspection of the distribution of the data, potential errors, etc.
Optionally, only those records meeting specified criteria may be included in the analysis. The output is an HTML format report which can be included in the work-papers as is, or else imported into Excel for further work.
004 - Duplicates.php
This script can be used to prepare an exhaustive list of all potential duplicates based on a comparison of the values in up to five columns. In addition, the data to be tested can optionally be restricted to just those records which meet specified criteria.
Testing for duplicates is a common procedure for identification of potential errors. Note that there is a separate script especially devoted to the identification of potential duplicate payments.
005 - Benford.php
This script provides a quick and simple means to test if specified numeric transaction amounts have leading digit counts which conform with that expected using Benford's law. The script supports six tests - first digit, first two digits, first three digits, second digit, last digit and last two digits. Tests for conformity with Benford's law are often performed as part of fraud audits or investigations.
The results of the test are written out as an HTML report which can be included in the work papers or else read into an Excel file for further analysis. The results of the tests are also assessed using the Chi Square value.
006 - Holiday.php
This script extracts those transactions which fall on a federal holiday (United States only). Output is a report of all the transaction details which is in HTML format. The report can be included in the work papers or else the information can easily be imported into Excel for further analysis.
007 - DateRange.php
This script extracts transactions "near" a specified date by identifying the number of days which constitutes "near" as well as the date that the transaction needs to be near.
The script is useful for identifying transactions within a cutoff period or other date of interest. Output of the report is an HTML file which can be used as is or else easily imported into an Excel workbook for further analysis.
Date values must be specified in the format YYYY-MM-DD. An example for cutoff testing might be to specify a date of 2013-01-04 and a date range of 5. This would extract all transactions between 2012-12-31 and 2013-01-09.
008 - SQLReport.php
This script can be used to prepare a schedule of transactions which meet the criteria specified in an SQL statement. Output is to a report in HTML format which can be used as is or else imported into Excel for further analysis.
The SQL used to prepare the report is free-form - any text value is accepted.
009 - DateNear.php
This script extracts all transactions which fall within a specified date range. In addition, conditions may be applied to further limit the transactions which are extracted - e.g. transactions at a particular location, of a particular dollar value or for a particular account.
Output is to a report in HTML format which can be included as is in the work papers or else easily imported into Excel for further analysis and documentation.
The date range values must be specified in the format YYYY-MM-DD. To perform a cutoff test for the ten days around year end, the date range might be specified as starting at 2012-12-31 and ending at 2013-01-09.
010 - Frequency.php
This script will prepare a report which shows frequency distributions for values contained in a specified column. The results can be listed in either ascending or descending order and the number of lines in the report can be limited to a specified number. In addition, criteria may be applied to the information in the table in order to restrict the frequency distribution to just records at a particular location, within a date range, etc.
Output from the report is to an HTML format file which can be used as is or else imported into an Excel workbook for further analysis.
An example use would include an analysis of sales by zip code, listing the twenty zip codes having the most number of sales. In this example the column name might be "zip code" and the number of rows specified as twenty.
011 - Pareto.php
A Pareto analysis relates to the famous "80/20" rule developed by the Italian economist Vilfredo Pareto. Many accounting amounts often conform with this rule - e.g. 20% of the vendors account for 80% of the invoice dollars, 20% of the inventory items account for 80% of the cost of sales, etc. This script provides an easy way to isolate those details. Running the script requires entering the name of the numeric column, e.g. invoice amount and the "by" variable, e.g. vendor.
Output from the script is a report in HTML format with a cutoff at the 80% value. For example, a Pareto analysis of vendor invoices would show the vendors whose invoices make up 80% (or more) of the invoice population.
012 - LinearRegression.php
Many cost and revenue elements in a business have a linear relationship. For example, when sales increase, so too do cost of sales, advertising and even amounts such as salaries. It is possible to run tests to determine if there is a linear relationship between two variables and then to apply this relationship to identify an unusual "blips" which may indicate an accounting error. using statistics, it is possible to develop both a "confidence interval" and a "prediction interval" for each data point in order to identify any unusual conditions which may merit further audit. This script automates the entire process of linear regression analysis.
Output from the script is a report in HTML format which may be used as is, or else imported into Excel for further analysis.
013 - AccountsReceivable.php
Practically every audit of accounts receivable entails some basic tests and data gathering. For example, it is generally necessary to obtain control totals to be agreed with general ledger accounts, age accounts receivable for the purpose of computing reserves, check for accounts with a credit balance, test for accounts over their credit limit, etc. This script takes a transaction file containing details of accounts receivable, summarizes it, ages it and cross checks against credit limits. The script can easily be tailored for other program steps as well.
Output from the script is a series of reports in HTML format which can be used as is, or else imported into Excel for further analysis.
014 - Extremes.php
A common audit procedure is to obtain a list of the ten largest (or smallest) items from a population and then test the amounts. The number of items required will often vary from as few as three to as many as fifty or more. This script provides a simple method for obtaining the needed information in a format easily used. To run the script specify the name of the numeric column to be extracted, the number of items and whether the largest (or smallest) items are to be extracted. A report is written in HTML format which can be used as is, or else easily imported into Excel.
015 - Ageing.php
Often it is necessary to classify and summarize transaction amounts using an ageing date. Although this is most commonly used for accounts receivable, it is equally applicable for other systems such as accounts payable, payroll, inventory, etc. This script takes as input the name of the column containing the date value to used for ageing, along with the name of the numeric column to age, the number of days in the ageing bucket (e.g. 30), etc. When the script is run it produces an output report in HTML format which can be printed as is, or else imported into Excel.
This script provides an overall ageing of the entire transaction population. There is also another script which can be used to perform an ageing using a "by" value - e.g. ageing by customer, ageing by vendor, etc.
016 - Ageing By.php
Often it is necessary to classify and summarize transaction amounts by a variable name using an ageing date. Although this is most commonly used for accounts receivable, it is equally applicable for other systems such as accounts payable, payroll, inventory, etc. This script takes as input the name of the column containing the date value to used for ageing, along with the name of the numeric column to age, the name of the "by" value (e.g. customer number) the number of days in the ageing bucket (e.g. 30), etc. When the script is run it produces an output report in HTML format which can be printed as is, or else imported into Excel.
This script provides an ageing by a variable name, e.g. ageing by customer number. There is also another script which can be perform an overall ageing of the population.
017 - Summary.php
Summarization of numeric data by values is a common procedure used to develop control totals or report totals for comparison with other sources, review for reasonableness, etc. Between one and five "by" variables may be specified. For example, the script can produce a report of sales totals by region by customer, or inventory values by store by classification by stock keeping unit, etc. To run the script specify the numeric column to be summarized and from one to five "by" variables. Output of the report is in HTML format which can be used as is or else imported into Excel for further analysis.
018 - Univariate.php
It is sometimes useful to be able to compare subgroups using various statistical measures such as mode, average, skewness, kurtosis, etc. This script enables the specification of a "by" variable - e.g. customer number, warehouse, store number etc, as well as a numeric column whose values are to be analyzed. Output from the script is a report in HTML format which shows the various statistical measures mentioned. This report can then be imported into Excel for further analysis.
As an example, an audit of accounts receivable might specify a "by" variable of customer number and an amount column of invoice amount. The report would show, by customer, the mode, average, skewness and kurtosis for invoice amounts for each customer. This procedure may also be helpful in comparing values between customers in order to identify anomalies or other unusual conditions.
019 - BasicPopStats.php
This script allows the automatic selection of outlier transactions which are defined as those transactions which exceed the population average plus a specified number of standard deviations in excess of the average. For example, in a normally distributed population, by specifying 1.96 standard deviations, this script would be expected to extract about 2.5% of the largest transactions in the population. The script performs all the calculations needed to compute the average and standard deviation (i.e. these amounts do not need to be known to the auditor).
020 - Stratify.php
This script can be used to classify numeric amounts into strata or "bins" in order to obtain a better view of the underlying data. In addition, a criteria or condition may be applied which limits the scope of the query to just specified records. Output from the script is a report which breaks down counts and amounts into the specified strata.
The strata values must be specified in ascending order, separated by spaces. So, for example, to stratify the data into values between 100 and 200, 200 and 300 and 300 and 500 the bin values would be specified as "100 200 300 500" (without the quotes). Note that the system will automatically create a strata to pick up any values outside these ranges.
021 - DuplicateInvoices.php
This script can be used to apply 19 tests to invoice payment amounts in order to identify potential duplicate invoice amounts. The script is based upon searches and matches for key columns of data such as vendor number, invoice number, invoice amount and invoice date. A number of tests can be applied, including "fuzzy" tests based upon user specified criteria. Output from the script is a report of potential pairs of duplicate invoice payments, based upon the criteria provided.
022 - Diagnostic.php
This script can be used to identify incomplete or missing data. In order to run the script, specify the name of the column to be tested. Output is a report in HTML format.
023 - CTA.php
This script will calculate a linear regression estimate for each "by" variable and write the results of the analysis to a work table named "wk". A query can then be run against the work table to list the contents by any of the regression formula variables - R, slope, intercept, and spike percentage. Often a limit may be placed in order to list just a specified number of variables. The purpose of the script is to perform an exploratory analysis.
024 - Distinct.php
This script can be used to identify all distinct values contained in a specified column of data. Output is an HTML report containing a list of all such values. Uses include identification of unusual or error situations. For example, a column in a table containing employee information may include a column named "Gender". Expected values would be "M" or "F", for example. This procedure will list all values which might include "?", blank, etc.
025 - DateGap.php
This script can be used identify missing dates within a population of transactions. To run the script, specify the name of the column which contains the date value to be analyzed. Examples of uses might include reviews of sales journals to identify dates when no sales were recorded, etc. Output from the report is in HTML format.
026 - MT.php
The "Mersenne twister" is an algorithm for the generation of random numbers which was developed in Japan. The routine has passed the "Die Hard" tests developed by the US National Institute of Standards (NIST) and thus is quite robust. This script can be used to assign random numbers within a specified range for each transaction in a population. Example uses include the selection of a random sample. Output from the script is a report file which mirrors the input data and adds a column to the right containing the random number which was generated for that row.
027 - POBox.php
Addresses may contain post office boxes or private mail boxes using a variety of spellings and formats. Examples include "PO Box", "P.O. Box", "Post Office Box, "POB", etc. Private mail boxes are indicated by a # and the word "PMB". This script can be used to search through a population of addresses to extract those which contain either a post office box or a private mail box address. There is an option to limit the test to one or the other (or both). Output is a report of addresses which contain a post office box or a private mail box. Example uses of the test are generally to identify vendors (or employees) who are using a post box in their contact address.
028 - CI.php
This script can be used to assess an attribute sample by computing a confidence interval. The computation is based on the paper published in 1987 by John Buonaccorsi at the University of Massachusetts. (Which is a much more accurate calculation than often seen). Example uses include stop and go sampling or any other type of attribute sampling.
The script uses the algorithm and method described in the article by John P. Buonaccorsi (1987), “A Note on Confidence Intervals for Proportions in Finite
Populations,” The American Statistician, Vol. 41, No. 3, 215-218.
029 - VEM.php
This script can be used for a variety of tests which attempt to match up persons on an employee file with vendors on the vendor file. Techniques include matches on address, telephone, taxpayer identification number, bank account, etc. Output from the report is a list of potential matches in HTML format. The tests to be performed may be selected, i.e. not all tests need to be done or may be applicable.