Utilities
WRDSMerger.range_join
— Functionfunction range_join(
df1::DataFrame,
df2::DataFrame,
on,
conditions::Array{Conditions};
minimize=nothing,
join_conditions::Union{Array{Symbol}, Symbol}=:and,
validate::Tuple{Bool, Bool}=(false, false),
jointype::Symbol=:inner
)
Joins the dataframes based on a series of conditions, designed to work with ranges
Arguments
df1::DataFrame
: left DataFramedf2::DataFrame
: right DataFrameon
: either array of column names or matched pairsconditions::Array{Conditions}
: array ofConditions
, which specifies the function (<=, >, etc.), left and right columnsjointype::Symbol=:inner
: type of join, options are :inner, :outer, :left, and :rightminimize
: eithernothing
or an array of column names or matched pairs, minimization will take place in orderjoin_conditions::Union{Array{Symbol}, Symbol}
: defaults to:and
, otherwise an array of symbols that is 1 less than the length of conditions that the joins will happen in (:or or :and)validate::Tuple{Bool, Bool}
: Whether to validate, this works differently than the equivalent in DataFrames joins, here, validate insures that a single row from the dataframe is not duplicated. So validate=(true, false) means that there are no duplicated rows from the left dataframe in the result.
Example
df1 = DataFrame(
firm=1:10,
date=Date.(2013, 1:10, 1:10)
)
df2 = df1[:, :]
df2[!, :date_low] = df2.date .- Day(2)
df2[!, :date_high] = df2.date .+ Day(2)
select!(df2, Not(:date))
range_join(
df1,
df2,
[:firm],
[
Conditions(<, :date, :date_high),
Conditions(>, :date, :date_low)
],
join_conditions=[:and]
)
WRDSMerger.check_schema_perms
— Functioncheck_schema_perms(conn, library::String)
Verify that the user can access a schema
WRDSMerger.approx_row_count
— Functionapprox_row_count(conn, library::String, table::String)
Get an approximate count of the number of rows in a table
WRDSMerger.modify_col!
— Functionmodify_col!
tries to identify the real type of a column, especially for strings that are actually dates or floats that are actually integers. Almost all data downloaded from WRDS correctly specifies dates, but all numbers are stored as float8, even items like year
which should be an integer. This uses multiple dispatch to check if all elements in a given column are compatible with changing type and then changes the type of the column. Note that for strings this function does not by default try to convert integer like strings to integer (such as GVKey), it only converts strings that look like a date, datetime, or time.