Utilities

WRDSMerger.range_joinFunction
function range_join(
    df1::DataFrame,
    df2::DataFrame,
    on,
    conditions::Array{Conditions};
    minimize=nothing,
    join_conditions::Union{Array{Symbol}, Symbol}=:and,
    validate::Tuple{Bool, Bool}=(false, false),
    jointype::Symbol=:inner
)

Joins the dataframes based on a series of conditions, designed to work with ranges

Arguments

  • df1::DataFrame: left DataFrame
  • df2::DataFrame: right DataFrame
  • on: either array of column names or matched pairs
  • conditions::Array{Conditions}: array of Conditions, which specifies the function (<=, >, etc.), left and right columns
  • jointype::Symbol=:inner: type of join, options are :inner, :outer, :left, and :right
  • minimize: either nothing or an array of column names or matched pairs, minimization will take place in order
  • join_conditions::Union{Array{Symbol}, Symbol}: defaults to :and, otherwise an array of symbols that is 1 less than the length of conditions that the joins will happen in (:or or :and)
  • validate::Tuple{Bool, Bool}: Whether to validate, this works differently than the equivalent in DataFrames joins, here, validate insures that a single row from the dataframe is not duplicated. So validate=(true, false) means that there are no duplicated rows from the left dataframe in the result.

Example

df1 = DataFrame(
    firm=1:10,
    date=Date.(2013, 1:10, 1:10)
)

df2 = df1[:, :]
df2[!, :date_low] = df2.date .- Day(2)
df2[!, :date_high] = df2.date .+ Day(2)
select!(df2, Not(:date))

range_join(
    df1,
    df2,
    [:firm],
    [
        Conditions(<, :date, :date_high),
        Conditions(>, :date, :date_low)
    ],
    join_conditions=[:and]
)
source
WRDSMerger.modify_col!Function

modify_col! tries to identify the real type of a column, especially for strings that are actually dates or floats that are actually integers. Almost all data downloaded from WRDS correctly specifies dates, but all numbers are stored as float8, even items like year which should be an integer. This uses multiple dispatch to check if all elements in a given column are compatible with changing type and then changes the type of the column. Note that for strings this function does not by default try to convert integer like strings to integer (such as GVKey), it only converts strings that look like a date, datetime, or time.

source