WRDSMerger.jl Docs

Installation

From the Julia REPL:

julia> ]add WRDSMerger
julia> using Pkg; Pkg.add("WRDSMerger")

From source:

julia> ]add https://github.com/junder873/WRDSMerger.jl
julia> using Pkg; Pkg.add(url="https://github.com/junder873/WRDSMerger.jl")

Establish DB Connection

This package requires a subscription to WRDS and can only access datasets that are included in your subscription. Any database connection that supports DBInterface.execute will work. There are several ways to connect:

LibPQ

LibPQ.jl connects directly to the WRDS Postgres server. It has no query length limit, which is important for functions like crsp_data that generate very long queries:

using LibPQ
# Replace 'username' and 'password' with your WRDS credentials.
conn = LibPQ.Connection(
    """
        host = wrds-pgdata.wharton.upenn.edu
        port = 9737
        user = 'username'
        password = 'password'
        sslmode = 'require'
        dbname = wrds
    """
)

Note: opening this connection too many times may cause WRDS to temporarily block you for having too many open connections. Create the connection once at the start of your script and rerun that step only when necessary.
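To keep credentials out of the script itself, the connection string could be built from environment variables. This is a minimal sketch, not part of WRDSMerger: the function name wrds_conn_string and the variable names WRDS_USER and WRDS_PASSWORD are assumptions chosen for illustration, and it does not escape quotes inside the password.

```julia
# Build a libpq-style connection string for the WRDS server.
# WRDS_USER and WRDS_PASSWORD are hypothetical environment variable names;
# the defaults are only read when the keyword arguments are not supplied.
function wrds_conn_string(; user = ENV["WRDS_USER"], password = ENV["WRDS_PASSWORD"])
    return """
        host = wrds-pgdata.wharton.upenn.edu
        port = 9737
        user = '$user'
        password = '$password'
        sslmode = 'require'
        dbname = wrds
    """
end

# In a script, a guard like the following avoids opening a second connection
# when the file is re-run in the same session (requires `using LibPQ`):
#
# if !@isdefined(conn)
#     conn = LibPQ.Connection(wrds_conn_string())
# end
```

The guard only helps within one Julia session; a fresh session will still open a new connection.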

ODBC

Alternatively, you can connect to WRDS through an ODBC driver using ODBC.jl. ODBC is considerably faster at converting large result sets to DataFrames but requires additional driver setup. I recommend following the setup steps that WRDS support lists for connecting with Stata, since Stata also uses ODBC.

DuckDB

The third option applies when you have the data locally, such as in a DuckDB database or as Parquet/CSV files. DuckDB is the recommended approach for local data (and is what this package uses for testing). DuckDB can read Parquet, CSV, and other file formats directly:

using DuckDB
conn = DBInterface.connect(DuckDB.DB, "my_wrds_data.duckdb")
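As a rough illustration of reading a file directly, the sketch below writes a tiny CSV, exposes it as a view, and queries it through DBInterface.execute. The file contents are toy rows, and the schema and view name (crsp.stocknames) are assumptions for this example. The in-memory database stands in for a real DuckDB file, and DataFrames is assumed to be installed.

```julia
using DuckDB
using DataFrames

# Write a tiny CSV so the sketch is self-contained (toy rows, not real data).
csv_path = joinpath(mktempdir(), "stocknames.csv")
write(csv_path, "permno,ticker\n1,AAA\n2,BBB\n")

# An in-memory database; pass a file path instead to persist it.
mem_conn = DBInterface.connect(DuckDB.DB, ":memory:")

# Expose the CSV as a view under a schema/table name of your choosing.
DBInterface.execute(mem_conn, "CREATE SCHEMA crsp")
DBInterface.execute(
    mem_conn,
    "CREATE VIEW crsp.stocknames AS SELECT * FROM read_csv_auto('$csv_path')",
)

# Query results are Tables.jl-compatible, so they convert to a DataFrame.
df = DataFrame(
    DBInterface.execute(mem_conn, "SELECT permno, ticker FROM crsp.stocknames ORDER BY permno"),
)
```

For Parquet files, DuckDB's read_parquet function can be used in the same position as read_csv_auto.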

If your DuckDB database uses different schema/table names than the WRDS defaults, update the table mappings:

WRDSMerger.default_tables["comp_funda"] = "comp.funda"
WRDSMerger.default_tables["crsp_stocknames"] = "crsp.stocknames"
# ... etc.
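When many mappings change at once, a small helper can apply them together and catch misspelled keys. This is a hedged sketch: apply_table_overrides! is a hypothetical helper, not a WRDSMerger function. The Dict below is a stand-in with placeholder values so the example runs without the package, and it assumes WRDSMerger.default_tables behaves like an ordinary Dict with string keys.

```julia
# Stand-in for WRDSMerger.default_tables; the values are placeholders,
# not the package's real defaults.
tables = Dict(
    "comp_funda" => "placeholder.funda",
    "crsp_stocknames" => "placeholder.stocknames",
)

# Apply every override, rejecting keys that do not already exist
# (an unknown key most likely means a typo).
function apply_table_overrides!(tables::AbstractDict, overrides::AbstractDict)
    for (key, value) in overrides
        haskey(tables, key) || throw(ArgumentError("unknown table key: $key"))
        tables[key] = value
    end
    return tables
end

apply_table_overrides!(tables, Dict(
    "comp_funda" => "comp.funda",
    "crsp_stocknames" => "crsp.stocknames",
))
```

With the package loaded, the first argument would be WRDSMerger.default_tables instead of the stand-in.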

See Using Local Files with DuckDB for more details on working with local files.

Connection Method Comparison

| Method | Setup | Speed | Query Length | Best For |
|--------|-------|-------|--------------|----------|
| LibPQ | Pkg.add("LibPQ") only | Slower for large results | No limit | General WRDS access |
| ODBC | Requires driver installation | Fast for large DataFrames | May have limits | Bulk data downloads |
| DuckDB | Pkg.add("DuckDB") only | Very fast (local I/O) | No limit | Local data / testing |

LibPQ requires no setup beyond installation. ODBC is considerably faster at converting large result sets to DataFrames (e.g., downloading the full CRSP daily stockfile takes ~4 minutes with ODBC vs ~24 minutes with LibPQ on a gigabit connection), but requires an ODBC driver to be installed separately. ODBC also stores your password in the driver settings, making it easier to share a project without exposing credentials. DuckDB is only for local data (Parquet, CSV, or DuckDB database files) and cannot connect to the WRDS Postgres server.