Basics of Linking Identifiers

A core part of this package is to provide a simple and consistent interface for linking different identifiers in WRDS. One of the primary goals is to reduce the overhead of remembering how exactly to link one dataset to another.

Downloading and Saving Data

To do so, first download the necessary data from WRDS. This package provides download functions to do so (see Linking Download Functions), which are automatically called by respective generating functions (see Generating LinkPair Functions). The generating functions take in a DataFrame (which expects certain column names) and creates the necessary functions between its identifiers. Finally, calling create_all_links() will create the remaining links that the tables do not provide.

To provide an example:

julia> db = ODBC.Connection("wrds-pgdata-64");
julia> generate_crsp_links(db) # downloads the data, creates links between 
# Permno <-> Permco, Permno <-> NCusip, etc.
# and returns the data that is downloaded

julia> generate_comp_crsp_links(db) # similar to generate_crsp_links

julia> create_all_links() # defines functions between NCusip <-> GVKey, 
# Ticker <-> GVKey, etc.

The generate functions return the DataFrame that is downloaded so you can save it locally (with CSV.jl, Arrow.jl, etc.) and can use again as opposed to re-downloading the data.

This package also provides a simple function that runs all of these:

julia> download_all_links(db)

Which downloads all 6 default tables and returns those 6 DataFrames. Note that if your WRDS account lacks access to one of the tables, you need to change which items are downloaded.

For example, the code I use when starting a project is:

data_dir = joinpath(path_to_saved_files)
dfs = download_all_links(db)
files = [
    "crsp_links",
    "crsp_comp_links",
    "gvkey_cik_links",
    "ibes_links",
    "option_links",
    "ravenpack_links"
]
# I prefer Arrow.jl and feather files, replace with CSV.jl if desired
for (df, file) in zip(dfs, files)
    Arrow.write(joinpath(data_dir, file * ".feather"), df)
end

Then, whenever I reload the project:

funs=[
    generate_crsp_links,
    generate_comp_crsp_links,
    generate_comp_cik_links,
    generate_ibes_links,
    generate_option_crsp_links,
    generate_ravenpack_links
]
for (file, f) in zip(files, funs)
    @chain joinpath(data_dir, file * ".feather") begin
        Arrow.Table
        DataFrame
        copy
        f
    end
end
create_all_links()

Linking Identifiers

Once the initial data is downloaded and necessary functions are created, the package provides a consistent set of methods to convert one identifier to any other. This follows the pattern:

(ID You Want)((ID You Have)(value), Date for conversion)

For example:

julia> GVKey(Permno(47896), Date(2020))"002968"
julia> NCusip(CIK(19617), Date(2020)) # works for Int or String"46625H10"
julia> CIK(Permno(47896), Date(2020))"0000019617"
julia> CIK(NCusip("46625H21"), Date(2020))"0000019617"

As you can see, this includes cases where there is not a table providing a direct link (CIK <-> Permno, CIK <-> NCusip). This makes it easy to link the varied datasets in WRDS.

These functions can be easily used with broadcasting:

julia> GVKey.(Permno.([47896, 44206, 46703]), Date(2020))ERROR: UndefVarError: `GVKey` not defined
julia> GVKey.(Permno.([47896, 44206, 46703]), [Date(2018), Date(2019), Date(2020)])ERROR: UndefVarError: `GVKey` not defined

Or with other packages such as DataFramesMeta.jl:

@chain df begin
    @rtransform(:gvkey = GVKey(Permno(:permno), :date))
end

All of the identifiers that this package provides by default are seen in Identifier Types. This is expandable as discussed in Adding New Identifiers.

Generating LinkPair Functions

This section describes the default functions that exist to generate the necessary links.

WRDSMerger.generate_crsp_links — Function

generate_crsp_links(
    conn;
    main_table=default_tables["crsp_stocknames"],
    stockfile=default_tables["crsp_stock_data"]
)

generate_crsp_links(df::AbstractDataFrame)

Generates the methods linking Permno, Permco, Cusip, NCusip, Cusip6, NCusip6 and Ticker to each other. If a database connection is provided, then it will download the table, otherwise, it can use a provided DataFrame.

The file used (crsp.stocknames), does not have a clear way to differentiate different priorities. The most common way is to calculate the market cap of any conflicting securities to determine the best option. The ideal is the market cap on the relevant day, but since this needs a static value, the default download is to average the market cap over the relevant period.

source

WRDSMerger.generate_comp_crsp_links — Function

generate_comp_crsp_links(
    conn;
    main_table=default_tables["crsp_a_ccm_ccmxpf_lnkhist"]
)

generate_comp_crsp_links(df::AbstractDataFrame)

Generates the methods linking GVKey and Permno/Permco based on the CRSP/Compustat merged annual file link history (crsp_a_ccm.ccmxpf_lnkhist). If a database connection is provided, then it will download the table, otherwise, it can use a provided DataFrame.

source

WRDSMerger.generate_comp_cik_links — Function

generate_comp_cik_links(
    conn;
    main_table=default_tables["comp_company"]
)

generate_comp_cik_links(df::AbstractDataFrame)

Generates the methods linking GVKey and CIK based on the Compustat company name file (comp.company). GVKey and CIK do not have any date conditions, so this download is relatively simple. If a database connection is provided, then it will download the table, otherwise, it can use a provided DataFrame.

source

WRDSMerger.generate_ibes_links — Function

generate_ibes_links(
    conn;
    main_table=default_tables["wrdsapps_ibcrsphist"]
)

generate_ibes_links(df::AbstractDataFrame)

Generates the methods between IbesTicker and Permno/NCusip based on a standard WRDS file. If a database connection is provided, then it will download the table, otherwise, it can use a provided DataFrame.

source

WRDSMerger.generate_option_crsp_links — Function

generate_option_crsp_links(
    conn;
    main_table=default_tables["optionm_all_secnmd"]
)

generate_option_crsp_links(df::AbstractDataFrame)

Generates the methods linking SecID and NCusip based on the option names file (optionm_all.secnmd). This file only provides an "effective date", so it is assumed that once the next "effective date" occurs, the link is no longer valid. If a database connection is provided, then it will download the table, otherwise, it can use a provided DataFrame.

source

WRDSMerger.generate_ravenpack_links — Function

generate_ravenpack_links(
    conn;
    main_table=default_tables["ravenpack_common_rp_entity_mapping"]
)

generate_ravenpack_links(df::AbstractDataFrame)

Generates the methods linking RPEntity and NCusip6 based on the RavenPack Entity Mapping file (ravenpack_common.rp_entity_mapping). This file is very messy, so the automatic options make several assumptions and filters. First, when downloading the data, it filters any NCusip in the RavenPack file that is not in the crsp.stocknames file. Second, for each RPEntity, if the end date is missing, it assumes the next start date is the appropriate end date for the link. If a database connection is provided, then it will download the table, otherwise, it can use a provided DataFrame.

source