Accounting Coding Camp

List of Skills Taught:

SAS

Interface

Introduction to the SAS interface

Basics

Comments in SAS
Permanent & temporary folders
Introduction to data steps
Introduction to procedures
Viewing and printing data
Basic functions & operators
If-then statements
Creating indicator variables
Keep & drop statements
Where statements
Missing data
Date basics
Time intervals
Sorting data
Duplicate observations
Creating lag and lead variables
Creating count variables
Creating sub-samples
Stacking data
Merging with a data step
Merging with Proc SQL
Merging and dates
Summarizing data using Proc Means
Summarizing data using Proc SQL
Calculating buy-and-hold raw returns
Ranking variables
Creating groups or portfolios
Regrouping across time
Summarizing data with groups or portfolios

Outliers

Winsorizing & trimming
Robust regression

MACRO Basics

Creating MACRO variables
Creating and running MACROS
MACRO repositories

WRDS

Remote connecting to WRDS
Running code at WRDS
Pulling Compustat data
Pulling CRSP data
Merging Compustat and CRSP
Uploading & downloading data

Common Statistics Procedures

Proc Means, Univariate, Corr, Ttest, and Npar1way
Proc reg (along with fixed effects, clustered errors, predicted and residual values)
Proc surveyreg, logistic, surveylogistic, genmod, qlim, and robustreg

Python

Basics

Setting up Python
Anaconda installation
Common libraries (pandas, numpy)
Basic commands
Basic markdown
Reading in, importing, exporting data
Data types, dataframes and data configuration
Merging data, stacking data
Subsetting, dropping data
Cleaning data, missing values, backfilling, winsorize/truncate
Exploring data, descriptive statistics
Variables: creating variables, dropping variables, leads and lags, dealing with dates, ranked variables
Grouping, frequency tables
Basic regression

WRDS

Installing the WRDS library
Connecting to WRDS via Python
Employing SQL within Python
Acquiring Compustat data
Creating common financial statement variables
Acquiring CRSP data
Calculating cumulative and buy-and-hold abnormal returns
Merging Compustat and CRSP
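
As an illustration of the cumulative and buy-and-hold abnormal returns topic listed above, here is a minimal sketch. It assumes abnormal returns are measured against a benchmark return series (for example, a market index) and uses plain Python lists to stay dependency-free; in practice the same arithmetic would likely be applied to pandas columns. The function names and sample numbers are hypothetical and are not taken from the course materials.

```python
from math import prod


def cumulative_abnormal_return(stock_returns, benchmark_returns):
    """Sum of period-by-period differences between stock and benchmark returns."""
    if len(stock_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    return sum(r - b for r, b in zip(stock_returns, benchmark_returns))


def buy_and_hold_abnormal_return(stock_returns, benchmark_returns):
    """Compounded stock return minus compounded benchmark return."""
    if len(stock_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    stock_growth = prod(1 + r for r in stock_returns)
    benchmark_growth = prod(1 + b for b in benchmark_returns)
    return stock_growth - benchmark_growth


# Hypothetical two-period example (returns as decimals, not percentages)
stock = [0.10, -0.05]
benchmark = [0.02, 0.01]
car = cumulative_abnormal_return(stock, benchmark)
bhar = buy_and_hold_abnormal_return(stock, benchmark)
print(round(car, 4), round(bhar, 4))
```

The two measures differ because the cumulative version adds abnormal returns period by period, while the buy-and-hold version compounds each series first and then takes the difference.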

Web scraping

Introduction to HTML structure
Identifying HTML XPaths
Scraping with Scrapy
Splitting HTML as text
Pandas read_html
Interacting with a web browser using Selenium (sending text to search boxes, clicking buttons, extracting page source)
Scraping EDGAR
EDGAR file management and master IDX files
Introduction to EDGAR XBRL data
Scraping Yahoo! Finance

Robotic process automation

Obtaining and validating user input
Automating the keyboard (basic keyboard functions, special keys)
Automating the mouse (position, movement, clicking, scrolling)
Basics of operating system file management
Deleting files
Copying files
Moving files
Listing files in a directory
Creating folders

Textual analysis

Introduction to natural language processing
Text functions (replace, strip, upper, lower, count, join, find, startswith, endswith)
Introduction to regular expressions (match, search, split, findall, sub, groups)
Tokenization (word_tokenize, sent_tokenize, regexp_tokenize)
Text pre-processing (stemming, stop words, punctuation)
Counting words
Calculating disclosure tone
Calculating the Fog Index
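
As an illustration of the Fog Index topic listed above, here is a minimal sketch using the standard Gunning formula: 0.4 × (average words per sentence + percentage of complex words), where a complex word has three or more syllables. The syllable counter is a crude vowel-group heuristic chosen for brevity, and the function names are hypothetical; the course may use a different tokenizer or syllable rule.

```python
import re


def count_syllables(word):
    """Approximate syllables by counting runs of vowels (a rough heuristic)."""
    vowel_groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_groups))


def fog_index(text):
    """Gunning Fog Index: 0.4 * (words per sentence + percent complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    complex_words = [w for w in words if count_syllables(w) >= 3]
    words_per_sentence = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (words_per_sentence + percent_complex)


simple_text = "The cat sat. The dog ran."
dense_text = "Management anticipates considerable profitability."
print(round(fog_index(simple_text), 2), round(fog_index(dense_text), 2))
```

Because the vowel-group heuristic miscounts some words (silent "e", for instance), results should be treated as approximate.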

Advanced Python

Machine learning
Intro to machine learning
What is machine learning?
Types of machine learning models: supervised vs. unsupervised
Applications of machine learning in practice
Applications of machine learning in academic research
The basic machine learning process
Logistic regression
Introduce machine learning concepts for predicting binary dependent variables
Predict default using financial statement and stock price data
True negatives, true positives, false negatives, false positives
Accuracy vs. sensitivity vs. specificity
statsmodels vs. scikit-learn in Python
Exercise: predict employee attrition
Linear regression
Introduce machine learning concepts for continuous dependent variables
Predict CDS spreads using financial statement and stock price data
Evaluating the prediction
Exercise: predict future revenue growth
Random forest classification
Introduction to decision trees
From decision trees to random forest (including a discussion of overfitting)
RandomForestClassifier from scikit-learn
Feature importance
Exercise: predict employee attrition
Random forest regression
Difference between random forest classification and random forest regression
RandomForestRegressor from scikit-learn
Exercise: predict future revenue growth
Neural network classification
What are neural networks? (e.g., input layer, hidden layers, output layer)
MLPClassifier from scikit-learn
Standardizing input variables
Permutation feature importance
Exercise: predict banking customer churn
Neural network regression
Difference between neural network classification and neural network regression
MLPRegressor from scikit-learn
Exercise: predict housing prices
Support vector classification
The basics of support vector machines (SVM) (e.g., hyperplanes, data separation, margin, kernels, decision boundaries, cost parameter, support vectors)
SVC from scikit-learn
Standardizing input variables
Exercise: predict banking customer churn
Support vector regression
Difference between support vector classification and support vector regression
SVR from scikit-learn
Cherkassky and Ma (2004) method for obtaining epsilon and C parameters
Exercise: predict housing prices
Cross validation and grid search
What is cross validation?
Holdout cross validation
K-fold cross validation
Data independence in K-fold cross validation
Rolling window cross validation for panel data
Hyperparameter tuning with grid search
Exercise: predict default using cross validation techniques
Machine learning and textual analysis introduction
Machine learning with a large number of independent variables (e.g., words or ngrams)
CountVectorizer from sklearn
Stop words
Stemming using the Porter Stemmer algorithm
Creating matrices for use in the machine learning models
Feature importance
Exercise: predict restaurant review ratings based on comments
Machine learning and textual analysis – TF-IDF
An introduction to term frequency-inverse document frequency (tf-idf)
TfidfVectorizer from scikit-learn
Exercise: predict restaurant review ratings based on comments using tf-idf
Machine learning and textual analysis – Creating customized vectorizers
Limitations with built-in vectorizers
How to create your own customizable vectorizer
Sparse data
Running textual analysis machine learning models on documents stored in your file system
Exercise: predict restaurant review ratings based on a customized vectorizer
Latent Dirichlet Allocation
Unsupervised machine learning model
What is Latent Dirichlet Allocation (LDA) and how does it work?
gensim
Exercise: implement LDA on a set of earnings conference call transcripts
Machine learning project
Use the machine learning models taught in the course to predict daily abnormal stock returns using conference call transcripts
ChatGPT
Introduction to ChatGPT
What is ChatGPT and why you should use it in coding
An overview with examples of how to use ChatGPT to enhance your coding (e.g., to write, optimize, and debug code)
Use ChatGPT to write code
Step by step guide to writing prompts and interacting with responses to write code
Example: We ask ChatGPT to write code to extract Apple’s net income from its 2022 10-K on the SEC’s EDGAR site.
Exercise: Use ChatGPT to write code to extract the tone of words spoken by analysts on earnings conference calls.
Use ChatGPT to optimize code
Multiple examples of inefficient code with prompts to ChatGPT to optimize and remove inefficiencies.
Exercise: Write code to answer basic coding questions. Then use ChatGPT to optimize any inefficiencies in your code
Use ChatGPT to debug code
Multiple examples of coding errors with prompts to ChatGPT to debug and remove these errors.
Integrated Development Environments
Introduction to Integrated Development Environments (IDEs)
Python IDLE
Examples of IDEs for Python
Reasons for using a more sophisticated IDE
Visual Studio Code
Overview of the Visual Studio Code environment
Explorer Window
Extensions
Color Themes
Command Palette
Commands and features (e.g., IntelliSense, multiple cursors, clicking functions, warnings, errors)
Debugging
Jupyter Notebook Extension
Markdown vs. code cells
Debugging within cells
Keyboard shortcuts and video tutorials
PyCharm
PyCharm Installation
Overview of the PyCharm Environment
Projects explorer
Debugging
Find Action
Color Themes
Commands and features (e.g., IntelliSense, multiple cursors, clicking functions, wrapping code, extract method)
Local history to restore deleted code

STATA

Basics

Understanding the STATA environment
Reading files from SAS and Python
Viewing a dataset
Generating variables
Replacing variables
Using local variables
Keeping and dropping variables
Sorting data
Using for loops

Model Output

Running basic regression models
Creating Excel output tables using outreg2
Modifying Excel output table options
Adding statistics to Excel output tables (Adj. R², Pseudo R²)
Appending additional columns
Adding column headers
Excluding variables from Excel output tables