Utilities
Range Joins
range_join performs inequality-based joins between DataFrames, useful when matching on date ranges or other interval conditions. Conditions are specified with the Conditions struct.
WRDSMerger.Conditions — Type
Conditions(fun::Function, l::Union{Symbol,String}, r::Union{Symbol,String})
Conditions(l, fun, r)A condition for use with range_join. Specifies a comparison function fun (e.g., <=, >=) applied between column l from the left DataFrame and column r from the right DataFrame.
WRDSMerger.range_join — Function
function range_join(
df1::DataFrame,
df2::DataFrame,
on,
conditions::Array{Conditions};
minimize=nothing,
join_conditions::Union{Array{Symbol}, Symbol}=:and,
validate::Tuple{Bool, Bool}=(false, false),
jointype::Symbol=:inner
)Joins the dataframes based on a series of conditions, designed to work with ranges
Arguments
df1::DataFrame: left DataFramedf2::DataFrame: right DataFrameon: either array of column names or matched pairsconditions::Array{Conditions}: array ofConditions, which specifies the function (<=, >, etc.), left and right columnsjointype::Symbol=:inner: type of join, options are :inner, :outer, :left, and :rightminimize: eithernothingor an array of column names or matched pairs, minimization will take place in orderjoin_conditions::Union{Array{Symbol}, Symbol}: defaults to:and, otherwise an array of symbols that is 1 less than the length of conditions that the joins will happen in (:or or :and)validate::Tuple{Bool, Bool}: Whether to validate, this works differently than the equivalent in DataFrames joins, here, validate insures that a single row from the dataframe is not duplicated. So validate=(true, false) means that there are no duplicated rows from the left dataframe in the result.
Example
df1 = DataFrame(
firm=1:10,
date=Date.(2013, 1:10, 1:10)
)
df2 = df1[:, :]
df2[!, :date_low] = df2.date .- Day(2)
df2[!, :date_high] = df2.date .+ Day(2)
select!(df2, Not(:date))
range_join(
df1,
df2,
[:firm],
[
Conditions(<, :date, :date_high),
Conditions(>, :date, :date_low)
],
join_conditions=[:and]
)Internal Helpers
WRDSMerger.check_schema_perms — Function
check_schema_perms(conn, library::String)Verify that the user can access a schema
WRDSMerger.approx_row_count — Function
approx_row_count(conn, library::String, table::String)Get an approximate count of the number of rows in a table
WRDSMerger.modify_col! — Function
modify_col! tries to identify the real type of a column, especially for strings that are actually dates or floats that are actually integers. Almost all data downloaded from WRDS correctly specifies dates, but all numbers are stored as float8, even items like year which should be an integer. This uses multiple dispatch to check if all elements in a given column are compatible with changing type and then changes the type of the column. Note that for strings this function does not by default try to convert integer like strings to integer (such as GVKey), it only converts strings that look like a date, datetime, or time.