Basics of Linking Identifiers
A core part of this package is to provide a simple and consistent interface for linking different identifiers in WRDS. One of the primary goals is to reduce the overhead of remembering how exactly to link one dataset to another.
Downloading and Saving Data
To do so, first download the necessary data from WRDS. This package provides download functions to do so (see Linking Download Functions), which are automatically called by respective generating functions (see Generating LinkPair Functions). The generating functions take in a DataFrame
(which expects certain column names) and creates the necessary functions between its identifiers. Finally, calling create_all_links()
will create the remaining links that the tables do not provide.
To provide an example:
julia> db = ODBC.Connection("wrds-pgdata-64");
julia> generate_crsp_links(db) # downloads the data, creates links between
# Permno <-> Permco, Permno <-> NCusip, etc.
# and returns the data that is downloaded
julia> generate_comp_crsp_links(db) # similar to generate_crsp_links
julia> create_all_links() # defines functions between NCusip <-> GVKey,
# Ticker <-> GVKey, etc.
The generate functions return the DataFrame that is downloaded so you can save it locally (with CSV.jl, Arrow.jl, etc.) and can use again as opposed to re-downloading the data.
This package also provides a simple function that runs all of these:
julia> download_all_links(db)
Which downloads all 6 default tables and returns those 6 DataFrames. Note that if your WRDS account lacks access to one of the tables, you need to change which items are downloaded.
For example, the code I use when starting a project is:
data_dir = joinpath(path_to_saved_files)
dfs = download_all_links(db)
files = [
"crsp_links",
"crsp_comp_links",
"gvkey_cik_links",
"ibes_links",
"option_links",
"ravenpack_links"
]
# I prefer Arrow.jl and feather files, replace with CSV.jl if desired
for (df, file) in zip(dfs, files)
Arrow.write(joinpath(data_dir, file * ".feather"), df)
end
Then, whenever I reload the project:
funs=[
generate_crsp_links,
generate_comp_crsp_links,
generate_comp_cik_links,
generate_ibes_links,
generate_option_crsp_links,
generate_ravenpack_links
]
for (file, f) in zip(files, funs)
@chain joinpath(data_dir, file * ".feather") begin
Arrow.Table
DataFrame
copy
f
end
end
create_all_links()
Linking Identifiers
Once the initial data is downloaded and necessary functions are created, the package provides a consistent set of methods to convert one identifier to any other. This follows the pattern:
(ID You Want)((ID You Have)(value), Date for conversion)
For example:
julia> GVKey(Permno(47896), Date(2020))
"002968"
julia> NCusip(CIK(19617), Date(2020)) # works for Int or String
"46625H10"
julia> CIK(Permno(47896), Date(2020))
"0000019617"
julia> CIK(NCusip("46625H21"), Date(2020))
"0000019617"
As you can see, this includes cases where there is not a table providing a direct link (CIK <-> Permno, CIK <-> NCusip). This makes it easy to link the varied datasets in WRDS.
These functions can be easily used with broadcasting:
julia> GVKey.(Permno.([47896, 44206, 46703]), Date(2020))
ERROR: UndefVarError: `GVKey` not defined
julia> GVKey.(Permno.([47896, 44206, 46703]), [Date(2018), Date(2019), Date(2020)])
ERROR: UndefVarError: `GVKey` not defined
Or with other packages such as DataFramesMeta.jl:
@chain df begin
@rtransform(:gvkey = GVKey(Permno(:permno), :date))
end
All of the identifiers that this package provides by default are seen in Identifier Types. This is expandable as discussed in Adding New Identifiers.
Generating LinkPair Functions
This section describes the default functions that exist to generate the necessary links.
WRDSMerger.generate_crsp_links
— Functiongenerate_crsp_links(
conn;
main_table=default_tables["crsp_stocknames"],
stockfile=default_tables["crsp_stock_data"]
)
generate_crsp_links(df::AbstractDataFrame)
Generates the methods linking Permno, Permco, Cusip, NCusip, Cusip6, NCusip6 and Ticker to each other. If a database connection is provided, then it will download the table, otherwise, it can use a provided DataFrame.
The file used (crsp.stocknames
), does not have a clear way to differentiate different priorities. The most common way is to calculate the market cap of any conflicting securities to determine the best option. The ideal is the market cap on the relevant day, but since this needs a static value, the default download is to average the market cap over the relevant period.
WRDSMerger.generate_comp_crsp_links
— Functiongenerate_comp_crsp_links(
conn;
main_table=default_tables["crsp_a_ccm_ccmxpf_lnkhist"]
)
generate_comp_crsp_links(df::AbstractDataFrame)
Generates the methods linking GVKey and Permno/Permco based on the CRSP/Compustat merged annual file link history (crsp_a_ccm.ccmxpf_lnkhist
). If a database connection is provided, then it will download the table, otherwise, it can use a provided DataFrame.
WRDSMerger.generate_comp_cik_links
— Functiongenerate_comp_cik_links(
conn;
main_table=default_tables["comp_company"]
)
generate_comp_cik_links(df::AbstractDataFrame)
Generates the methods linking GVKey and CIK based on the Compustat company name file (comp.company
). GVKey and CIK do not have any date conditions, so this download is relatively simple. If a database connection is provided, then it will download the table, otherwise, it can use a provided DataFrame.
WRDSMerger.generate_ibes_links
— Functiongenerate_ibes_links(
conn;
main_table=default_tables["wrdsapps_ibcrsphist"]
)
generate_ibes_links(df::AbstractDataFrame)
Generates the methods between IbesTicker and Permno/NCusip based on a standard WRDS file. If a database connection is provided, then it will download the table, otherwise, it can use a provided DataFrame.
WRDSMerger.generate_option_crsp_links
— Functiongenerate_option_crsp_links(
conn;
main_table=default_tables["optionm_all_secnmd"]
)
generate_option_crsp_links(df::AbstractDataFrame)
Generates the methods linking SecID and NCusip based on the option names file (optionm_all.secnmd
). This file only provides an "effective date", so it is assumed that once the next "effective date" occurs, the link is no longer valid. If a database connection is provided, then it will download the table, otherwise, it can use a provided DataFrame.
WRDSMerger.generate_ravenpack_links
— Functiongenerate_ravenpack_links(
conn;
main_table=default_tables["ravenpack_common_rp_entity_mapping"]
)
generate_ravenpack_links(df::AbstractDataFrame)
Generates the methods linking RPEntity and NCusip6 based on the RavenPack Entity Mapping file (ravenpack_common.rp_entity_mapping
). This file is very messy, so the automatic options make several assumptions and filters. First, when downloading the data, it filters any NCusip in the RavenPack file that is not in the crsp.stocknames
file. Second, for each RPEntity, if the end date is missing, it assumes the next start date is the appropriate end date for the link. If a database connection is provided, then it will download the table, otherwise, it can use a provided DataFrame.