Input Data
Given all of the hard work put into specifying the model, one should be able to maintain the input data painlessly. To that end, DSGE.jl provides facilities to download appropriate vintages of data series from FRED (Federal Reserve Economic Data).
Note that a sample input dataset for use with model m990
is provided; see New York Fed Model 990 Data for more details. To update this sample dataset for use with model m990
, see Update sample input data.
Setup
To take advantage of the ability to automatically download data series from FRED via the FredData.jl package, set up your FRED API access by following the directions here.
Loading data
At the most basic, loading data looks like this:
m = Model990()
df = load_data(m)
By default, load_data
will look on the disk first to see if an appropriate vintage of data is already present. If data on disk are not present, or if the data are invalid for any reason, a fresh vintage will be downloaded from FRED and merged with the other data sources specified. See load_data
for more details.
The resulting DataFrame df
contains all the required data series for this model, fully transformed. The first row is given by the Setting date_presample_start
and the last row is given by date_mainsample_end
. The first n_presample_periods
rows of df
are the presample.
Driver functions including estimate
accept this df
as an argument and convert it into a Matrix
suitable for computations using df_to_matrix
, which sorts the data, ensures the full sample is present, discards the date column, and sorts the observable columns according to the observables
field of the model object.
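For example, a typical workflow might look like the sketch below (assuming FRED access has been set up as described in Setup):
using DSGE

m  = Model990()
df = load_data(m)            # DataFrame: a :date column plus one column per observable
data = df_to_matrix(m, df)   # matrix of floats, rows sorted by date, columns ordered by m.observables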
Non-FRED data sources
Some data series may not be available from FRED, or one may simply wish to use a different data source. The data sources and series are specified in the input_series
field of an Observable
object (see ModelConstructors.jl). For each data source that is not :fred
, a well-formed CSV of the form <source>_<yymmdd>.csv
is expected in the directory indicated by inpath(m, "raw")
. For example, the following might be the contents of a data source for two series :series1
and :series2
:
date,series1,series2
1959-06-30,1.0,NaN
1959-09-30,1.1,0.5
# etc.
Note that quarters are represented by the date of the last day of the quarter and missing values are specified by NaN
.
Example
Let's consider an example dataset comprising 10 macro series sourced from FRED and one survey-based series sourced from, say, the Philadelphia Fed's Survey of Professional Forecasters via Haver Analytics. The Observable
for that data series might look like this:
Observable(:obs_longcpi, [:ASACX10__SPF], annualtoquarter, quartertoannual,
"Median 10Y CPI Expectations", "Median 10Y CPI Expectations")
If the data vintage specified for the model is 151127
(Nov. 27, 2015), then the following files are expected in inpath(m, "raw")
:
spf_151127.csv
fred_151127.csv
The FRED series will be downloaded and the fred_151127.csv
file will be automatically generated, but the spf_151127.csv
file must be compiled manually, following the CSV format described above:
date,ASACX10
1991-12-31,4.0
# etc.
Now, suppose that we set the data vintage to 151222
, to incorporate the BEA's third estimate of GDP. The fred_151222.csv
file will be downloaded, but there are no updates to the SPF dataset during this period. Regardless, the file spf_151222.csv
must be present to match the data vintage. The solution in this case is to manually copy and rename the older SPF dataset. Although this is not an elegant approach, it is consistent with the concept of a vintage as the data available at a certain point in time: in this example, it just so happens that the SPF data available on Nov. 27 and Dec. 22 are the same.
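One way to do the copy from the Julia REPL is sketched below, using the vintages from this example:
src = joinpath(inpath(m, "raw"), "spf_151127.csv")
dst = joinpath(inpath(m, "raw"), "spf_151222.csv")
cp(src, dst; force = true)   # overwrite dst if it already exists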
Incorporate population forecasts
Many variables enter the model in per-capita terms. To that end, we use data on population levels to adjust aggregate variables into per-capita variables. Furthermore, we apply the Hodrick-Prescott filter ("H-P filter") to the population levels to smooth out cyclical fluctuations.
Ultimately, the user will want to produce forecasts of key variables such as GDP and report them in standard, aggregate terms rather than in per-capita terms. To do this, we either extrapolate from the last periods of population growth in the data, or use external population forecasts.
Note that if external population forecasts are provided, non-forecast procedures, such as model estimation, are also affected because the H-P filter smoothes back from the latest observation.
To incorporate population forecasts,
- Set the model setting use_population_forecast to true.
- Provide a file population_forecast_<yymmdd>.csv in inpath(m, "raw"). Population forecasts should be in levels, and represent the same series as given by the population_mnemonic setting (defaults to :CNP16OV, or "Civilian Noninstitutional Population, Thousands"). If your population forecast is in growth rates, convert it to levels yourself. The first row of data should correspond to the last period of the main sample, such that growth rates can be computed. As many additional rows of forecasts as desired can be provided.
The file should look like this:
date,POPULATION
2015-12-31,250000
2016-03-31,251000
# etc.
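Once the file is in place, enable the setting; a minimal sketch:
m <= Setting(:use_population_forecast, true)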
Dataset creation implementation details
Let's quickly walk through the steps DSGE.jl takes to create a suitable dataset.
First, a user provides a detailed specification of the data series and transformations used for their model.
- The user specifies m.observables; the keys of this dictionary name the series to be used in estimating the model.
- The user specifies m.observable_mappings; the keys of this dictionary name observed variables, and the values are the corresponding Observable objects, which contain information about the forward and reverse transforms as well as the input data series from which the observable is constructed.
- For a given observable, an input series, e.g. m.observable_mappings[:obs_gdp].input_series, is an array of mnemonics to be accessed from the data source listed after the mnemonic (separated by the double underscore). Note that these mnemonics do not correspond to observables one-to-one, but rather are usually series in levels that will be further transformed.
- There are also both forward and reverse transforms for a given observable, e.g. m.observable_mappings[:obs_gdp].fwd_transform and m.observable_mappings[:obs_gdp].rev_transform. The forward transform operates on a single argument, levels, which is a DataFrame of the data in levels returned by the function load_data_levels. The reverse transform operates on a forward-transformed series (which is in model units), transforming it into human-readable units, such as one-quarter percent changes or per-capita adjustments. Both transforms return a DataArray for a single series. These functions could do nothing, or they could perform a more complex transformation. See Data Transforms and Utilities for more information about series-specific transformations.
- The user adjusts data-related settings, such as data_vintage, data_id, dataroot, date_presample_start, date_zlb_start, date_forecast_start, and use_population_forecast. See Working with Settings for details.
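As a concrete illustration, this specification can be inspected from the REPL; a minimal sketch for Model 990:
m = Model990()

keys(m.observables)                            # names of the observables used in estimation
m.observable_mappings[:obs_gdp].input_series   # mnemonics of the form :SERIES__SOURCE
m.observable_mappings[:obs_gdp].fwd_transform  # maps the levels DataFrame into model units
m.observable_mappings[:obs_gdp].rev_transform  # maps model units into human-readable units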
Second, DSGE.jl attempts to construct the dataset given this setup through a call to load_data
. See load_data
for more details.
- Intermediate data in levels are loaded. See load_data_levels for more details.
- Transformations are applied to the data in levels. See transform_data for more details.
- The data are saved to disk. See save_data for more details.
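Normally one simply calls load_data, which also handles caching and validation, but these steps correspond roughly to the sketch below, using the functions documented under Data Transforms and Utilities:
levels = load_data_levels(m; verbose = :low)        # raw data merged from FRED and other sources, in levels
df     = transform_data(m, levels; verbose = :low)  # apply the forward transforms observable by observable
save_data(m, df)                                    # write the dataset CSV to inpath(m, "data")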
Conditional data
The user can easily add conditional data for any observables. By "conditional data", we mean that, in reality, some data has not become available yet, but we believe that a certain number is a decent guess, so we want to forecast conditional on our guessed data. For example, suppose we are in 2019:Q4, in which case we have not observed 2019:Q4 GDP growth yet. However, we might have some idea of the number, so we want our forecasts to be conditional on that guess.
To load such data, the user needs to include a "cond" folder within the input data folder, i.e. this folder joinpath(get_setting(m, :input_data), "cond")
should exist. Within this folder, the user can create a csv file taking the form cond_cdid=<xx>_cdvt=<yymmdd>.csv
. The user should then make sure that the model object being used has the following settings
- cond_id::Int64: the conditional data's equivalent of data_id; its value fills in the <xx> after cdid= in the file name. Note that the ID must be less than 100.
- cond_vintage::String: the conditional data's equivalent of data_vintage; its value fills in the <yymmdd> after cdvt= in the file name.
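For example, a sketch with hypothetical values, corresponding to a conditional file named cond_cdid=02_cdvt=191115.csv (assuming the ID is zero-padded to two digits):
m <= Setting(:cond_id, 2)
m <= Setting(:cond_vintage, "191115")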
The contents of cond_cdid=<xx>_cdvt=<yymmdd>.csv
should have columns for each raw data series that is then used to construct a given conditional observable. The first column should be date
for the quarters of the conditional horizon, and the following columns should be for the raw data series. For example, to obtain real GDP growth, we need a population forecast file with both CNP16OV and CE16OV, the forecasted value of nominal GDP (under mnemonic GDP), and the forecasted value of the GDP deflator (under mnemonic GDPDEF), since these series are all required to compute obs_gdp, which is per-capita real GDP growth. For core inflation, we just need the index level for core PCE (under mnemonic PCEPILFE).
Note that the CSV should contain only conditional-horizon data. If you include data for any historical quarters, then the DataFrame combining historical and conditional data will not be constructed correctly. For example, if you are forecasting 2019:Q4 with a conditional forecast of 2019:Q4 values, then the conditional data CSV should contain only values for 2019:Q4 (and onward). No values for 2019:Q3 or earlier should be in the conditional data CSV.
Finally, to specify which variables should have conditional observations, make sure to set
- cond_full_names::Vector{Symbol}: the observables given conditional observations when running a "full" conditional forecast. For Model 1002, this means averages of the current quarter's daily financial data as well as nowcasts of real GDP growth and core PCE inflation.
- cond_semi_names::Vector{Symbol}: the observables given conditional observations when running a "semi" conditional forecast. For Model 1002, this means averages of the current quarter's daily financial data.
See the default settings for an example of how these cond_full_names
and cond_semi_names
are initialized.
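For illustration only, a sketch of overriding these settings with hypothetical observable names (use the keys of m.observables that correspond to your conditional series):
m <= Setting(:cond_full_names, [:obs_gdp, :obs_corepce, :obs_spread, :obs_nominalrate, :obs_longrate])
m <= Setting(:cond_semi_names, [:obs_spread, :obs_nominalrate, :obs_longrate])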
Common pitfalls
Given the complexity of the data download, you may find that the dataset generated by load_data
is not exactly as you expect. It is a good idea to compare the observables.jl
file for your model with the one used by Model1002
, which uses all the features provided by the package for handling data. Be certain that any significant differences are intentional. Here are also some common pitfalls to look out for:
- Ensure that the data_vintage and cond_vintage model settings are as you expect. (Try checking data_vintage(m) and cond_vintage(m).)
- Ensure that the data_id and cond_id model settings are correct for the given model.
- Ensure that the date_forecast_start model setting is as you expect, and that it is not logically incompatible with data_vintage.
- Ensure that the date_conditional_end model setting is as you expect, and that it is not logically incompatible with cond_vintage.
- Double check the transformations specified in the data_transforms field of the model object.
- Ensure that the keys of the observables and data_transforms fields of the model object match.
- Check the input files for non-FRED data sources. They should be in the directory indicated by inpath(m, "raw"), be named appropriately given the vintage of data expected, and be formatted appropriately. One may have to copy and rename files of non-FRED data sources to match the specified vintage, even if the contents of the files would be identical.
- Look for any immediate issues in the final dataset saved (data_dsid=<xx>_vint=<yymmdd>.csv). If a data series in this file is all NaN values, then likely a non-FRED data source was not provided correctly.
- Ensure that the column names of the data CSV match the keys of the observables field of the model object.
- You may receive a warning that an input data file "does not contain the entire date range specified". This means that observations are not provided for some periods in which the model requires data. This is perfectly okay if your data series starts after date_presample_start.
- If you successfully created a dataset but it is missing observations that you want to add, you may need to recreate it. By default, load_data checks if a dataset with the correct vintage already exists. If it does, then load_data loads the saved data rather than recreating the dataset from scratch. However, if the saved dataset is missing observations, you will want to recreate it by calling load_data(m; try_disk = false).
- If you have a column that is completely empty (all missing/NaN data) but you still want to load the data, use the keyword check_empty_columns = false.
If you experience any problems using FredData.jl, ensure your API key is provided correctly and that there are no issues with your firewall, etc. Any issues with FredData.jl proper should be reported on that project's page.
Update sample input data
A sample dataset is provided for the 2015 Nov 27 vintage. To update this dataset:
Step 1. See Setup to set up automatic data pulls using FredData.jl.
Step 2. Specify the exact data vintage desired:
julia> m <= Setting(:data_vintage, "yymmdd")
Step 3. Create data files for the non-FRED data sources. For model m990
, the required data files include spf_<yymmdd>.csv
(with column ASACX10
), longrate_<yymmdd>.csv
(with column FYCCZA
), and fernald_<yymmdd>.csv
(with columns TFPJQ
and TFPKQ
). To include data on expected interest rates, the file ois_<yymmdd>.csv
is also required. To include data on population forecasts, the file population_forecast_<yymmdd>.csv
is also required (see Incorporate population forecasts). See New York Fed Model Input Data for details on the series used and links to data sources.
Step 4. Run load_data(m)
; series from FRED will be downloaded and merged with the series from non-FRED data sources that you have already created. See Common pitfalls for some potential issues.
Data Transforms and Utilities
DSGE.df_to_matrix
— Method
df_to_matrix(m, df; cond_type = :none, in_sample = true)
Return df
, converted to a matrix of floats with the date column discarded. Also ensure that rows are sorted by date and columns by m.observables
, with the option to specify whether or not the out of sample rows are discarded. The output of this function is suitable for direct use in estimate
, posterior
, etc.
Keyword Arguments:
- include_presample::Bool: indicates whether or not there are presample periods.
- in_sample::Bool: indicates whether or not to discard rows that are out of sample. Set this flag to false in the case that you are calling filter_shocks! in the scenarios codebase.
DSGE.load_cond_data_levels
— Method
load_cond_data_levels(m::AbstractDSGEModel; verbose::Symbol=:low)
Check on disk in inpath(m, "cond")
for a conditional dataset (in levels) of the correct vintage and load it.
The following series are also loaded from inpath(m, "raw")
and either appended or merged into the conditional data:
- The last period of (unconditional) data in levels (data_levels_<yymmdd>.csv), used to calculate growth rates
- The first period of forecasted population (population_forecast_<yymmdd>.csv), used for per-capita calculations
DSGE.load_data
— Method
load_data(m::AbstractDSGEModel; try_disk::Bool = true, verbose::Symbol = :low,
check_empty_columns::Bool = true, summary_statistics::Symbol = :low)
Create a DataFrame with all data series for this model, fully transformed.
First, check the disk to see if a valid dataset is already stored in inpath(m, "data")
. A dataset is valid if every series in m.observable_mappings
is present and the entire sample is contained (from date_presample_start
to date_mainsample_end). If no valid dataset is already stored, the dataset will be recreated. This check can be eliminated by passing try_disk = false.
If the dataset is to be recreated, in a preliminary stage, intermediate data series as specified in m.observable_mappings
are loaded in levels using load_data_levels
. See ?load_data_levels
for more details.
Then, the series in levels are transformed as specified in m.observable_mappings
. See ?transform_data
for more details.
If m.testing
is false, then the resulting DataFrame is saved to disk as data_<yymmdd>.csv
. The data are then returned to the caller.
If the keyword check_empty_columns is set to true, an error is thrown whenever a column in the loaded dataset is completely empty.
The keyword summary_statistics
prints out a variety of summary statistics on the loaded data. When set to :low, we print only the number of missing/NaN values for each data series. When set to :high, we also print additional statistics, such as means and standard deviations.
DSGE.load_data_levels
— Method
load_data_levels(m::AbstractDSGEModel; verbose::Symbol=:low)
Load data in levels by appealing to the data sources specified for the model. Data from FRED are loaded first by default; then other custom data sources are merged in.
Check on disk in inpath(m, "data")
for datasets of the correct vintage, corresponding to the ones required by the entries in m.observable_mappings
. Load the appropriate data series (specified in m.observable_mappings[key].input_series
) for each data source.
To accommodate growth rates and other similar transformations, more rows of data may be downloaded than otherwise specified by the date model settings. (By the end of the process, these rows will have been dropped.)
Data from FRED (i.e. the :fred
data source) are treated separately. These are downloaded using load_fred_data
. See ?load_fred_data
for more details.
Data from non-FRED data sources are read from disk, verified, and merged.
DSGE.parse_data_series
— Method
parse_data_series(m::AbstractDSGEModel)
Parse m.observable_mappings
for the data sources and mnemonics to read in.
Returns a Dict{Symbol, Vector{Symbol}}
mapping sources => mnemonics found in that data file.
DSGE.save_data
— Method
save_data(m::AbstractDSGEModel, df::DataFrame; cond_type::Symbol = :none)
Save df
to disk as CSV. File is located in inpath(m, "data")
.
DSGE.load_fred_data
— Method
load_fred_data(m::AbstractDSGEModel; start_date="1959-03-31", end_date=prev_quarter())
Checks in inpath(m, "raw")
for a FRED dataset corresponding to data_vintage(m)
. If a FRED vintage exists on disk, any required FRED series that is contained therein will be imported. All missing series will be downloaded directly from FRED using the FredData package. The full dataset is written to the appropriate data vintage file and returned.
Arguments
- m::AbstractDSGEModel: the model object
- start_date: starting date
- end_date: ending date
Notes
The FRED API reports observations according to the quarter-start date. load_fred_data
returns data indexed by quarter-end date for compatibility with other datasets.
DSGE.transform_data
— Method
transform_data(m::AbstractDSGEModel, levels::DataFrame; cond_type::Symbol = :none,
verbose::Symbol = :low)
Transform data loaded in levels and order columns appropriately for the DSGE model. Returns DataFrame of transformed data.
The DataFrame levels
is output from load_data_levels
. The series in levels are transformed as specified in m.observable_mappings
.
- To prepare for per-capita transformations, population data are filtered using hpfilter. The series in levels to use as the population series is given by the population_mnemonic setting. If use_population_forecast(m), a population forecast is appended to the recorded population levels before the filtering. Both filtered and unfiltered population levels and growth rates are added to the levels data frame.
- The transformations are applied for each series using the levels DataFrame as input.
Conditional data (identified by cond_type in [:semi, :full]
) are handled slightly differently: If use_population_forecast(m)
, we drop the first period of the population forecast because we treat the first forecast period date_forecast_start(m)
as if it were data. We also only apply transformations for the observables given in cond_full_names(m)
or cond_semi_names(m)
.
DSGE.annualtoquarter
— Method
annualtoquarter(v)
Convert from annual to quarter frequency... by dividing by 4.
DSGE.difflog
— Method
difflog(x::AbstractVector)
DSGE.difflog
— Method
difflog(x::AbstractArray{AbstractFloat})
DSGE.hpfilter
— Method
yt, yf = hpfilter(y, λ)
Applies the Hodrick-Prescott filter ("H-P filter"). The smoothing parameter λ
is applied to the columns of y
, returning the trend component yt
and the cyclical component yf
. For quarterly data, one can use λ=1600.
Consecutive missing values at the beginning or end of the time series are excluded from the filtering. If there are missing values within the series, the filtered values are all missing.
See also:
Hodrick, Robert; Prescott, Edward C. (1997). "Postwar U.S. Business Cycles: An Empirical
Investigation". Journal of Money, Credit, and Banking 29 (1): 1–16.
DSGE.loggrowthtopct
— Method
loggrowthtopct(y)
Transform from annualized quarter-over-quarter log growth rates to annualized quarter-over-quarter percent change.
Note
This should only be used in Model 510, which has the core PCE inflation observable in annualized log growth rates.
DSGE.loggrowthtopct_4q_approx
— Function
loggrowthtopct_4q_approx(y, data = fill(NaN, 3))
Transform from log growth rates to approximate 4-quarter percent change.
This method should only be used to transform scenarios forecasts, which are in deviations from baseline.
Inputs
- y: the data we wish to transform to aggregate 4-quarter percent change from log per-capita growth rates. y is either a vector of length nperiods or an ndraws x nperiods matrix.
- data: if y = [y_t, y_{t+1}, ..., y_{t+nperiods-1}], then data = [y_{t-3}, y_{t-2}, y_{t-1}]. This is necessary to compute 4-quarter percent changes for the first three periods.
DSGE.loggrowthtopct_annualized
— Method
loggrowthtopct_annualized(y)
Transform from log growth rates to annualized quarter-over-quarter percent change.
DSGE.loggrowthtopct_annualized_percapita
— Method
loggrowthtopct_annualized_percapita(y, pop_growth)
Transform from log per-capita growth rates to annualized aggregate (not per-capita) quarter-over-quarter percent change.
Note
This should only be used for output, consumption, investment and GDP deflator (inflation).
Inputs
- y: the data we wish to transform to annualized percent change from quarter-over-quarter log growth rates. y is either a vector of length nperiods or an ndraws x nperiods matrix.
- pop_growth::Vector: the length nperiods vector of log population growth rates.
DSGE.loggrowthtopct_percapita
— Method
loggrowthtopct_percapita(y, pop_growth)
Transform from annualized quarter-over-quarter log per-capita growth rates to annualized quarter-over-quarter aggregate percent change.
Note
This should only be used in Model 510, which has the output growth observable in annualized log per-capita growth rates.
Inputs
- y: the data we wish to transform to annualized percent change from annualized log growth rates. y is either a vector of length nperiods or an ndraws x nperiods matrix.
- pop_growth::Vector: the length nperiods vector of log population growth rates.
DSGE.logleveltopct_4q_approx
— Function
logleveltopct_4q_approx(y, data = fill(NaN, 4))
Transform from log levels to approximate 4-quarter percent change.
This method should only be used to transform scenarios forecasts, which are in deviations from baseline.
Inputs
- y: the data we wish to transform to 4-quarter percent change from log levels. y is either a vector of length nperiods or an ndraws x nperiods matrix.
- data: if y = [y_t, y_{t+1}, ..., y_{t+nperiods-1}], then data = [y_{t-4}, y_{t-3}, y_{t-2}, y_{t-1}]. This is necessary to compute 4-quarter percent changes for the first three periods.
DSGE.logleveltopct_annualized
— Function
logleveltopct_annualized(y, y0 = NaN)
Transform from log levels to annualized quarter-over-quarter percent change.
Inputs
- y: the data we wish to transform to annualized quarter-over-quarter percent change from log levels. y is either a vector of length nperiods or an ndraws x nperiods matrix.
- y0: the last data point in the history (of state or observable) corresponding to the y variable. This is required to compute a percent change for the first period.
DSGE.logleveltopct_annualized_approx
— Function
logleveltopct_annualized_approx(y, y0 = NaN)
Transform from log levels to approximate annualized quarter-over-quarter percent change.
This method should only be used to transform scenarios forecasts, which are in deviations from baseline.
Inputs
- y: the data we wish to transform to annualized quarter-over-quarter percent change from log levels. y is either a vector of length nperiods or an ndraws x nperiods matrix.
- y0: the last data point in the history (of state or observable) corresponding to the y variable. This is required to compute a percent change for the first period.
DSGE.logleveltopct_annualized_percapita
— Function
logleveltopct_annualized_percapita(y, pop_growth, y0 = NaN)
Transform from per-capita log levels to annualized aggregate (not per-capita) quarter-over-quarter percent change.
Note
This is usually applied to labor supply (hours worked per hour), and probably shouldn't be used for any other observables.
Inputs
- y: the data we wish to transform to annualized aggregate quarter-over-quarter percent change from per-capita log levels. y is either a vector of length nperiods or an ndraws x nperiods matrix.
- pop_growth::Vector: the length nperiods vector of log population growth rates.
- y0: the last data point in the history (of state or observable) corresponding to the y variable. This is required to compute a percent change for the first period.
DSGE.nominal_to_real
— Method
nominal_to_real(col, df; deflator_mnemonic = :GDPDEF)
Converts nominal to real values using the specified deflator.
Arguments
- col: Symbol indicating which column of df to transform
- df: DataFrame containing series for the proper population measure and col

Keyword arguments
- deflator_mnemonic: indicates which deflator to use to calculate real values. Default value is the FRED GDP Deflator mnemonic.
DSGE.oneqtrpctchange
— Method
oneqtrpctchange(y)
Calculates the quarter-to-quarter percentage change of a series.
DSGE.percapita
— Method
percapita(m, col, df)
percapita(col, df, population_mnemonic)
Converts data column col
of DataFrame df
to a per-capita value.
The first method checks hpfilter_population(m)
. If true, then it divides by the filtered population series. Otherwise it divides by the result of parse_population_mnemonic(m)[1]
.
Arguments
- col: Symbol indicating which column of data to transform
- df: DataFrame containing series for the proper population measure and col
- population_mnemonic: a mnemonic found in df for some population measure
DSGE.quartertoannual
— Method
quartertoannual(v)
Convert from quarter to annual frequency... by multiplying by 4.
DSGE.quartertoannualpercent
— Method
quartertoannualpercent(v)
Convert from quarter to annual frequency in percent... by multiplying by 400.
DSGE.get_data_filename
— Method
get_data_filename(m, cond_type)
Returns the data file for m
, which depends on data_vintage(m)
, and if cond_type in [:semi, :full]
, also on cond_vintage(m)
and cond_id(m)
.
DSGE.iterate_quarters
— Method
iterate_quarters(start::Date, quarters::Int)
Returns the date corresponding to start
+ quarters
quarters.
Inputs
- start: starting date
- quarters: number of quarters to iterate forward or backward
DSGE.quartertodate
— Method
quartertodate(string::String)
Convert string
in the form "YYqX", "YYYYqX", or "YYYY-qX" to a Date of the end of the indicated quarter. "X" is in {1,2,3,4}
and the case of "q" is ignored.
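For example (a sketch; the returned dates are quarter-end days):
quartertodate("2019q4")    # Date(2019, 12, 31)
quartertodate("2019-Q1")   # Date(2019, 3, 31)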
DSGE.subtract_quarters
— Method
subtract_quarters(t1::Date, t0::Date)
Compute the number of quarters between t1 and t0, including t0 and excluding t1.
DSGE.data_to_df
— Method
data_to_df(m, data, start_date)
Create a DataFrame
out of the matrix data
, including a :date
column beginning in start_date
. Variable names and indices are obtained from m.observables
.
DSGE.has_saved_data
— Method
has_saved_data(m::AbstractDSGEModel; cond_type::Symbol = :none)
Determine if there is a saved dataset on disk for the required vintage and conditional type.
DSGE.isvalid_data
— Method
isvalid_data(m::AbstractDSGEModel, df::DataFrame; cond_type::Symbol = :none,
check_empty_columns::Bool = true)
Return whether the dataset is valid for this model, ensuring that all observables are contained and that all quarters between the beginning of the presample and the end of the main sample are contained. Also checks to make sure that expected interest rate data are available if n_mon_anticipated_shocks(m) > 0
.
DSGE.read_data
— Method
read_data(m::AbstractDSGEModel; cond_type::Symbol = :none)
Read CSV from disk as DataFrame. File is located in inpath(m, "data")
.
DSGE.read_population_data
— Method
read_population_data(m; verbose = :low)
read_population_data(filename; verbose = :low)
Read in population data stored in levels, either from inpath(m, "raw", "population_data_levels_[vint].csv"
) or filename
.
DSGE.read_population_forecast
— Method
read_population_forecast(m; verbose = :low)
read_population_forecast(filename, population_mnemonic, last_recorded_date; verbose = :low)
Read in population forecast in levels, either from inpath(m, "raw", "population_forecast_[vint].csv")
or filename
. If that file does not exist, return an empty DataFrame
.
DSGE.transform_population_data
— Method
transform_population_data(population_data, population_forecast,
population_mnemonic; verbose = :low)
Load, HP-filter, and compute growth rates from population data in levels. Optionally do the same for forecasts.
Inputs
- population_data: pre-loaded DataFrame of historical population data containing the columns :date and population_mnemonic. Assumes this is sorted by date.
- population_forecast: pre-loaded DataFrame of the population forecast containing the columns :date and population_mnemonic
- population_mnemonic: column name for the population series in population_data and population_forecast
Keyword Arguments
- verbose: one of :none, :low, or :high
- use_hpfilter: whether to HP-filter the population data and forecast. See Output below.
- pad_forecast_start::Bool: whether to re-size the population forecast such that the first index is one quarter ahead of the last index of the population data. Only set to false if you have manually constructed population_forecast to artificially start a quarter earlier, so as to avoid having an unnecessary missing first entry.
Output
Two dictionaries containing the following keys:
population_data_out:
- :filtered_population_recorded: HP-filtered historical population series (levels)
- :dlfiltered_population_recorded: HP-filtered historical population series (growth rates)
- :dlpopulation_recorded: non-filtered historical population series (growth rates)

population_forecast_out:
- :filtered_population_forecast: HP-filtered population forecast series (levels)
- :dlfiltered_population_forecast: HP-filtered population forecast series (growth rates)
- :dlpopulation_forecast: non-filtered population forecast series (growth rates)
If population_forecast_file
is not provided, the r"forecast" fields will be empty. If use_hpfilter = false
, then the r"filtered*" fields will be empty.
DSGE.get_irf_transform
— Method
get_irf_transform(transform::Function)
Returns the IRF-specific transformation, which doesn't add back population growth (since IRFs are given in deviations).
DSGE.get_nopop_transform
— Method
get_nopop_transform(transform::Function)
Returns the corresponding transformation which doesn't add back population growth. Used for shock decompositions, deterministic trends, and IRFs, which are given in deviations.
DSGE.get_scenario_transform
— Method
get_scenario_transform(transform::Function)
Given a transformation used for usual forecasting, return the transformation used for scenarios, which are forecasted in deviations from baseline.
The 1Q deviation from baseline should really be calculated by 1Q transforming the forecasts (in levels) under the baseline (call this y_b
) and alternative scenario (y_s
), then subtracting baseline from alternative scenario (since most of our 1Q transformations are nonlinear). Let y_d = y_s - y_b
. Then, for example, the most correct loggrowthtopct_annualized
transformation is:
y_b_1q = 100*(exp(y_b/100)^4 - 1)
y_s_1q = 100*(exp(y_s/100)^4 - 1)
y_d_1q = y_s_1q - y_b_1q
Instead, we approximate this by transforming the deviation directly:
y_d_1q ≈ 4*(y_s - y_b)
DSGE.get_transform4q
— Method
get_transform4q(transform::Function)
Returns the 4-quarter transformation associated with the annualizing transformation.
DSGE.lag
— Method
series_lag_n = lag(series, n)
Returns a particular data series lagged by n periods
DSGE.loggrowthtopct_4q
— Function
loggrowthtopct_4q(y, data = fill(NaN, 3))
Transform from log growth rates to 4-quarter percent change.
Inputs
- y: the data we wish to transform to aggregate 4-quarter percent change from log per-capita growth rates. y is either a vector of length nperiods or an ndraws x nperiods matrix.
- data: if y = [y_t, y_{t+1}, ..., y_{t+nperiods-1}], then data = [y_{t-3}, y_{t-2}, y_{t-1}]. This is necessary to compute 4-quarter percent changes for the first three periods.
DSGE.loggrowthtopct_4q_percapita
— Function
loggrowthtopct_4q_percapita(y, pop_growth, data = fill(NaN, 3))
Transform from log per-capita growth rates to aggregate 4-quarter percent change.
Note
This should only be used for output, consumption, investment, and GDP deflator (inflation).
Inputs
- y: the data we wish to transform to aggregate 4-quarter percent change from log per-capita growth rates. y is either a vector of length nperiods or an ndraws x nperiods matrix.
- pop_growth::Vector: the length nperiods vector of log population growth rates.
- data: if y = [y_t, y_{t+1}, ..., y_{t+nperiods-1}], then data = [y_{t-3}, y_{t-2}, y_{t-1}]. This is necessary to compute 4-quarter percent changes for the first three periods.
DSGE.logleveltopct_4q
— Function
logleveltopct_4q(y, data = fill(NaN, 4))
Transform from log levels to 4-quarter percent change.
Inputs
- y: the data we wish to transform to 4-quarter percent change from log levels. y is either a vector of length nperiods or an ndraws x nperiods matrix.
- data: if y = [y_t, y_{t+1}, ..., y_{t+nperiods-1}], then data = [y_{t-4}, y_{t-3}, y_{t-2}, y_{t-1}]. This is necessary to compute 4-quarter percent changes for the first three periods.
DSGE.logleveltopct_4q_percapita
— Function
logleveltopct_4q_percapita(y, pop_growth, data = fill(NaN, 4))
Transform from per-capita log levels to 4-quarter aggregate percent change.
Note
This is usually applied to labor supply (hours worked), and probably shouldn't be used for any other observables.
Inputs
- y: the data we wish to transform to 4-quarter aggregate percent change from per-capita log levels. y is either a vector of length nperiods or an ndraws x nperiods matrix.
- pop_growth::Vector: the length nperiods vector of log population growth rates.
- data: if y = [y_t, y_{t+1}, ..., y_{t+nperiods-1}], then data = [y_{t-4}, y_{t-3}, y_{t-2}, y_{t-1}]. This is necessary to compute 4-quarter percent changes for the first three periods.
DSGE.prepend_data
— Method
prepend_data(y, data)
Prepends data necessary for running 4q transformations.
Inputs:
- y: ndraws x t array representing a timeseries for variable y
- data: vector representing a timeseries to prepend to y
DSGE.datetoquarter
— Method
datetoquarter(date::Date)
Return an integer from the set {1,2,3,4}, corresponding to one of the quarters in a year, given a Date object.
DSGE.datetoymdvec
— Method
datetoymdvec(dt)
Convert a Date to a vector/matrix holding the year, month, and day.
DSGE.format_dates!
— Method
format_dates!(col, df)
Change column col
of dates in df
from String to Date, and map any dates given in the interior of a quarter to the last day of the quarter.
DSGE.get_quarter_ends
— Method
get_quarter_ends(start_date::Date, end_date::Date)
Returns an Array of quarter end dates between start_date
and end_date
.
DSGE.missing2nan
— Method
missing2nan(a::Array)
Convert all elements of Union{X, Missing.Missing} or Missing.Missing to type Float64.
DSGE.missing_cond_vars!
— Method
missing_cond_vars!(m, df; cond_type = :none, check_empty_columns = true)
Make conditional period variables not in cond_semi_names(m)
or cond_full_names(m)
missing if necessary.
DSGE.na2nan!
— Method
na2nan!(df::Array)
Convert all NAs in an Array to NaNs.
DSGE.na2nan!
— Method
na2nan!(df::DataFrame)
Convert all NAs in a DataFrame to NaNs.
DSGE.next_quarter
— Function
next_quarter(q::TimeType = now())
Returns Date identifying last day of the next quarter
DSGE.prev_quarter
— Function
prev_quarter(q::TimeType = now())
Returns Date identifying last day of the previous quarter
DSGE.quartertofloats
— Method
quartertofloats(dt)
Convert a Date to a floating-point number based on the quarter.
DSGE.reconcile_column_names
— Method
reconcile_column_names(a::DataFrame, b::DataFrame)
Add columns of missings to a and b so that both have the same set of column names.
DSGE.vinttodate
— Method
vinttodate(vint)
Convert the string given by data_vintage(m), which is in the format yymmdd, to a Date object.