# Input Data

Given all of the hard work put into specifying the model, one should be able to maintain the input data painlessly. To that end, DSGE.jl provides facilities to download appropriate vintages of data series from FRED (Federal Reserve Economic Data).

Note that a sample input dataset for use with model `m990` is provided; see New York Fed Model 990 Data for more details. To update this sample dataset for use with model `m990`, see Update sample input data.

## Setup

To take advantage of the ability to automatically download data series from FRED via the FredData.jl package, set up your FRED API access by following the directions here.

## Loading data

At the most basic, loading data looks like this:

```
using DSGE

m = Model990()     # construct the model object
df = load_data(m)  # load the fully transformed dataset
```

By default, `load_data` will look on the disk first to see if an appropriate vintage of data is already present. If data on disk are not present, or if the data are invalid for any reason, a fresh vintage will be downloaded from FRED and merged with the other data sources specified. See `load_data` for more details.

The resulting DataFrame `df` contains all the required data series for this model, fully transformed. The first row is given by the Setting `date_presample_start` and the last row is given by `date_mainsample_end`. The first `n_presample_periods` rows of `df` are the presample.

Driver functions including `estimate` accept this `df` as an argument and convert it into a `Matrix` suitable for computations using `df_to_matrix`, which sorts the data, ensures the full sample is present, discards the date column, and sorts the observable columns according to the `observables` field of the model object.
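The conversion described above can be sketched with plain Julia containers. This is only an illustration, not the package's implementation: the helper `table_to_matrix`, the toy series names, and the periods-by-observables orientation of the result are all assumptions made for this example.

```julia
using Dates

# Hypothetical stand-in for df_to_matrix: sort rows by date, drop the date
# column, and order columns by the index stored for each observable.
function table_to_matrix(dates::Vector{Date}, cols::Dict{Symbol,Vector{Float64}},
                         observables::Dict{Symbol,Int})
    order = sortperm(dates)                                  # row order by date
    ordered = sort(collect(keys(observables)); by = k -> observables[k])
    # One column per observable, in the model's observable order
    return hcat((cols[name][order] for name in ordered)...)
end

dates = [Date(1959, 9, 30), Date(1959, 6, 30)]               # deliberately unsorted
cols = Dict(:obs_gdp => [1.1, 1.0], :obs_hours => [0.5, NaN])
observables = Dict(:obs_gdp => 1, :obs_hours => 2)

data = table_to_matrix(dates, cols, observables)
```

The date column never enters the matrix; it is used only to establish the row order.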

## Non-FRED data sources

Some data series may not be available from FRED or one may simply wish to use a different data source, for whatever reason. The data sources and series are specified in the `input_series` field of an `Observable` object (see ModelConstructors.jl). For each data source that is *not* `:fred`, a well-formed CSV of the form `<source>_<yymmdd>.csv` is expected in the directory indicated by `inpath(m, "raw")`. For example, the following might be the contents of a data source for two series `:series1` and `:series2`:

```
date,series1,series2
1959-06-30,1.0,NaN
1959-09-30,1.1,0.5
# etc.
```

Note that quarters are represented by the date of the *last* day of the quarter and missing values are specified by `NaN`.
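Before handing a non-FRED file to the package, it can help to check these two conventions yourself. The following is a minimal sketch using only the standard library; it parses the sample contents from a string rather than from disk, and the variable names are illustrative rather than part of DSGE.jl.

```julia
using Dates

# Sample file contents held in a string (a real file would be read from disk).
# Lines starting with '#' are skipped.
raw = """
date,series1,series2
1959-06-30,1.0,NaN
1959-09-30,1.1,0.5
"""

rows = [split(line, ',') for line in split(strip(raw), '\n') if !startswith(line, '#')]
header = Symbol.(rows[1])
dates = [Date(r[1]) for r in rows[2:end]]
vals = [parse(Float64, r[j]) for r in rows[2:end], j in 2:length(header)]

# Every date should be the last day of its quarter
all_quarter_end = all(d == lastdayofquarter(d) for d in dates)

# Missing observations are encoded as NaN
n_missing = count(isnan, vals)
```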

### Example

Let's consider an example dataset composed of 10 macro series sourced from FRED and one survey-based series sourced from, say, the Philadelphia Fed's Survey of Professional Forecasters via Haver Analytics. The `Observable` for that data series might look like this:

```
Observable(:obs_longcpi, [:ASACX10__SPF], annualtoquarter, quartertoannual,
           "Median 10Y CPI Expectations", "Median 10Y CPI Expectations")
```

If the data vintage specified for the model is `151127` (Nov. 27, 2015), then the following files are expected in `inpath(m, "raw")`:

```
spf_151127.csv
fred_151127.csv
```

The FRED series will be downloaded and the `fred_151127.csv` file will be automatically generated, but the `spf_151127.csv` file must be manually compiled as shown above:

```
date,ASACX10
1991-12-31,4.0
# etc.
```

Now, suppose that we set the data vintage to `151222`, to incorporate the BEA's third estimate of GDP. The `fred_151222.csv` file will be downloaded, but there are no updates to the SPF dataset during this period. Regardless, the file `spf_151222.csv` must be present to match the data vintage. The solution in this case is to manually copy and rename the older SPF dataset. Although this is not an elegant approach, it is consistent with the concept of a vintage as the data available at a certain point in time; in this example, it just so happens that the SPF data available on Nov. 27 and Dec. 22 are the same.
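The copy-and-rename step can be scripted. The sketch below uses a temporary directory as a stand-in for `inpath(m, "raw")` and writes a tiny placeholder file so that it runs on its own; in practice you would point `raw_dir` at your actual raw data directory.

```julia
# Stand-in for inpath(m, "raw"); a temporary directory keeps the sketch self-contained
raw_dir = mktempdir()

old_file = joinpath(raw_dir, "spf_151127.csv")
new_file = joinpath(raw_dir, "spf_151222.csv")

# Create a tiny placeholder for the older SPF vintage
write(old_file, "date,ASACX10\n1991-12-31,4.0\n")

# No new SPF release between the two vintages, so reuse the older file as-is.
# force = false refuses to overwrite an existing file for the new vintage.
cp(old_file, new_file; force = false)

same_contents = read(old_file, String) == read(new_file, String)
```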

## Incorporate population forecasts

Many variables enter the model in per-capita terms. To that end, we use data on population levels to adjust aggregate variables into per-capita variables. Furthermore, we apply the Hodrick-Prescott filter ("H-P filter") to the population levels to smooth cyclical components.

The user will ultimately want to produce forecasts of key variables such as GDP and then represent these forecasts in standard terms. That is, one wants to report GDP forecasts in aggregate terms, which is standard, rather than per-capita terms. To do this, we either extrapolate from the last periods of population growth in the data, or use external population forecasts.

Note that if external population forecasts are provided, non-forecast procedures, such as model estimation, are also affected because the H-P filter smooths back from the latest observation.

To incorporate population forecasts,

- Set the model setting `use_population_forecast` to `true`.
- Provide a file `population_forecast_<yymmdd>.csv` to `inpath(m, "raw")`. Population forecasts should be in levels, and represent the same series as given by the `population_mnemonic` setting (defaults to `:CNP16OV`, or "Civilian Noninstitutional Population, Thousands"). If your population forecast is in growth rates, convert it to levels yourself. The first row of data should correspond to the last period of the main sample, such that growth rates can be computed. As many additional rows of forecasts as desired can be provided.

The file should look like this:

```
date,POPULATION
2015-12-31,250000
2016-03-31,251000
# etc.
```
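If your forecast arrives as growth rates, the conversion to levels is a simple compounding exercise. The sketch below assumes quarterly growth rates expressed in percent (not annualized) and uses made-up numbers; adjust the formula if your source reports annualized or log growth rates.

```julia
# Last observed population level (thousands) at the end of the main sample
last_level = 250_000.0

# Hypothetical forecast of quarterly population growth, in percent (not annualized)
growth_pct = [0.4, 0.4, 0.3]

# Compound the growth rates forward from the last observed level
forecast_levels = last_level .* cumprod(1 .+ growth_pct ./ 100)

# The file's first row repeats the last main-sample period so that
# growth rates can be computed for the first forecast quarter
levels = vcat(last_level, forecast_levels)
```

Each entry of `levels` would then be written next to its quarter-end date in `population_forecast_<yymmdd>.csv`.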

## Dataset creation implementation details

Let's quickly walk through the steps DSGE.jl takes to create a suitable dataset.

First, a user provides a detailed specification of the data series and transformations used for their model.

- The user specifies `m.observables`; the keys of this dictionary name the series to be used in estimating the model.
- The user specifies `m.observable_mappings`; the keys of this dictionary name observed variables, and the values correspond to the observable object, which contains information about the forward and reverse transforms as well as the input data series from which the observable is constructed.
  - For a given observable, an input series, e.g. `m.observable_mappings[:obs_gdp].input_series`, is an array of mnemonics to be accessed from the data source listed after the mnemonic (separated by the double underscore). Note that these mnemonics do not correspond to observables one-to-one, but rather are usually series in *levels* that will be further transformed.
  - There are also both forward and reverse transforms for a given observable, e.g. `m.observable_mappings[:obs_gdp].fwd_transform` and `m.observable_mappings[:obs_gdp].rev_transform`. The forward transform operates on a single argument, `levels`, which is a DataFrame of the data in levels returned by the function `load_data_levels`. The reverse transform operates on a forward transformed series (which is in model units) transforming it into human-readable units, such as one quarter percent changes or per-capita adjustments. Both transforms return a DataArray for a single series. These functions could do nothing, or they could perform a more complex transformation. See Data Transforms and Utilities for more information about series-specific transformations.
- The user adjusts data-related settings, such as `data_vintage`, `data_id`, `dataroot`, `date_presample_start`, `date_zlb_start`, `date_forecast_start`, and `use_population_forecast`.

Second, DSGE.jl attempts to construct the dataset given this setup through a call to `load_data`. See `load_data` for more details.

- Intermediate data in levels are loaded. See `load_data_levels` for more details.
- Transformations are applied to the data in levels. See `transform_data` for more details.
- The data are saved to disk. See `save_data` for more details.

## Common pitfalls

Given the complexity of the data download, you may find that the dataset generated by `load_data` is not exactly as you expect. It is a good idea to compare the `observables.jl` file for your model with the one used by `Model1002`, which uses all the features provided by the package for handling data. Be certain that any significant differences are intentional. Here are also some common pitfalls to look out for:

- Ensure that the `data_vintage` model setting is as you expect. (Try checking `data_vintage(m)`.)
- Ensure that the `data_id` model setting is correct for the given model.
- Ensure that the `date_forecast_start` model setting is as you expect, and that it is not logically incompatible with `data_vintage`.
- Double check the transformations specified in the `data_transforms` field of the model object.
- Ensure that the keys of the `observables` and `data_transforms` fields of the model object match.
- Check the input files for non-FRED data sources. They should be in the directory indicated by `inpath(m, "raw")`, be named appropriately given the vintage of data expected, and be formatted appropriately. One may have to copy and rename files of non-FRED data sources to match the specified vintage, even if the contents of the files would be identical.
- Look for any immediate issues in the final dataset saved (`data_dsid=<xx>_vint=<yymmdd>.csv`). If a data series in this file is all `NaN` values, then likely a non-FRED data source was not provided correctly.
- Ensure that the column names of the data CSV match the keys of the `observables` field of the model object.
- You may receive a warning that an input data file "does not contain the entire date range specified". This means that observations are not provided for some periods in which the model requires data. This is perfectly okay if your data series starts after `date_presample_start`.
- If you successfully created a data set but it is missing observations that you want to add, you may need to recreate the data set. By default, `load_data` checks if a data set with the correct vintage already exists. If it does, then `load_data` loads the saved data rather than recreating a data set from scratch. However, if the saved data set is missing observations, then you want to recreate it by calling `load_data(m; try_disk = false)`.
- If you have a column that is completely empty (all missing/NaN data), but you still want to load the data, then use the keyword `check_empty_columns = false`.

If you experience any problems using FredData.jl, ensure your API key is provided correctly and that there are no issues with your firewall, etc. Any issues with FredData.jl proper should be reported on that project's page.

## Update sample input data

A sample dataset is provided for the 2015 Nov 27 vintage. To update this dataset:

**Step 1**. See Setup to set up automatic data pulls using FredData.jl.

**Step 2**. Specify the exact data vintage desired:

`julia> m <= Setting(:data_vintage, "yymmdd")`

**Step 3**. Create data files for the non-FRED data sources. For model `m990`, the required data files include `spf_<yymmdd>.csv` (with column `ASACX10`), `longrate_<yymmdd>.csv` (with column `FYCCZA`), and `fernald_<yymmdd>.csv` (with columns `TFPJQ` and `TFPKQ`). To include data on expected interest rates, the file `ois_<yymmdd>.csv` is also required. To include data on population forecasts, the file `population_forecast_<yymmdd>.csv` is also required (see Incorporate population forecasts). See New York Fed Model Input Data for details on the series used and links to data sources.

**Step 4**. Run `load_data(m)`; series from FRED will be downloaded and merged with the series from non-FRED data sources that you have already created. See Common pitfalls for some potential issues.

## Data Transforms and Utilities

`DSGE.df_to_matrix` — Method

`df_to_matrix(m, df; cond_type = :none, in_sample = true)`

Return `df`, converted to a matrix of floats, and discard the date column. Also ensure that rows are sorted by date and columns by `m.observables`, with the option to specify whether or not the out of sample rows are discarded. The output of this function is suitable for direct use in `estimate`, `posterior`, etc.

**Keyword Arguments:**

- `include_presample::Bool`: indicates whether or not there are presample periods.
- `in_sample::Bool`: indicates whether or not to discard rows that are out of sample. Set this flag to false in the case that you are calling `filter_shocks!` in the scenarios codebase.

`DSGE.load_cond_data_levels` — Method

`load_cond_data_levels(m::AbstractDSGEModel; verbose::Symbol=:low)`

Check on disk in `inpath(m, "cond")` for a conditional dataset (in levels) of the correct vintage and load it.

The following series are also loaded from `inpath(m, "raw")` and either appended or merged into the conditional data:

- The last period of (unconditional) data in levels (`data_levels_<yymmdd>.csv`), used to calculate growth rates
- The first period of forecasted population (`population_forecast_<yymmdd>.csv`), used for per-capita calculations

`DSGE.load_data` — Method

```
load_data(m::AbstractDSGEModel; try_disk::Bool = true, verbose::Symbol = :low,
          check_empty_columns::Bool = true, summary_statistics::Symbol = :low)
```

Create a DataFrame with all data series for this model, fully transformed.

First, check the disk to see if a valid dataset is already stored in `inpath(m, "data")`. A dataset is valid if every series in `m.observable_mappings` is present and the entire sample is contained (from `date_presample_start` to `date_mainsample_end`). If no valid dataset is already stored, the dataset will be recreated. This check can be eliminated by passing `try_disk=false`.

If the dataset is to be recreated, in a preliminary stage, intermediate data series as specified in `m.observable_mappings` are loaded in levels using `load_data_levels`. See `?load_data_levels` for more details.

Then, the series in levels are transformed as specified in `m.observable_mappings`. See `?transform_data` for more details.

If `m.testing` is false, then the resulting DataFrame is saved to disk as `data_<yymmdd>.csv`. The data are then returned to the caller.

If the keyword `check_empty_columns` is set to true, an error is thrown whenever a column is completely empty in the loaded data set.

The keyword `summary_statistics` prints out a variety of summary statistics on the loaded data. When set to `:low`, we print only the number of missing/NaNs for each data series. When set to `:high`, we also print means, standard deviations,

`DSGE.load_data_levels` — Method

`load_data_levels(m::AbstractDSGEModel; verbose::Symbol=:low)`

Load data in levels by appealing to the data sources specified for the model. Data from FRED is loaded first, by default; then, merge other custom data sources.

Check on disk in `inpath(m, "data")` for datasets, of the correct vintage, corresponding to the ones required by the entries in `m.observable_mappings`. Load the appropriate data series (specified in `m.observable_mappings[key].input_series`) for each data source.

To accommodate growth rates and other similar transformations, more rows of data may be downloaded than otherwise specified by the date model settings. (By the end of the process, these rows will have been dropped.)

Data from FRED (i.e. the `:fred` data source) are treated separately. These are downloaded using `load_fred_data`. See `?load_fred_data` for more details.

Data from non-FRED data sources are read from disk, verified, and merged.

`DSGE.parse_data_series` — Method

`parse_data_series(m::AbstractDSGEModel)`

Parse `m.observable_mappings` for the data sources and mnemonics to read in.

Returns a `Dict{Symbol, Vector{Symbol}}` mapping sources => mnemonics found in that data file.
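To make the double-underscore convention concrete, here is a minimal sketch of grouping `MNEMONIC__SOURCE` symbols by source. The helper `group_by_source` is hypothetical, the `:GDP__FRED` entry is an illustrative input, and lowercasing the source (to match file names such as `spf_<yymmdd>.csv`) is an assumption; the package's own parsing may differ in detail.

```julia
# Hypothetical helper: group `MNEMONIC__SOURCE` symbols by their source.
# Lowercasing the source to match file names is an assumption of this sketch.
function group_by_source(input_series::Vector{Symbol})
    sources = Dict{Symbol, Vector{Symbol}}()
    for s in input_series
        mnemonic, source = split(string(s), "__")
        push!(get!(sources, Symbol(lowercase(source)), Symbol[]), Symbol(mnemonic))
    end
    return sources
end

series = [:GDP__FRED, :CNP16OV__FRED, :ASACX10__SPF]
by_source = group_by_source(series)
```

Each key of `by_source` then identifies one raw data file, and its value lists the columns expected in that file.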

`DSGE.save_data` — Method

`save_data(m::AbstractDSGEModel, df::DataFrame; cond_type::Symbol = :none)`

Save `df` to disk as CSV. File is located in `inpath(m, "data")`.

`DSGE.load_fred_data` — Method

`load_fred_data(m::AbstractDSGEModel; start_date="1959-03-31", end_date=prev_quarter())`

Checks in `inpath(m, "raw")` for a FRED dataset corresponding to `data_vintage(m)`. If a FRED vintage exists on disk, any required FRED series that is contained therein will be imported. All missing series will be downloaded directly from FRED using the *FredData* package. The full dataset is written to the appropriate data vintage file and returned.

**Arguments**

- `m::AbstractDSGEModel`: the model object
- `start_date`: starting date.
- `end_date`: ending date.

**Notes**

The FRED API reports observations according to the quarter-start date. `load_fred_data` returns data indexed by quarter-end date for compatibility with other datasets.
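The re-indexing from quarter-start to quarter-end dates can be illustrated with the standard `Dates` library. This is a sketch of the idea only, not the code `load_fred_data` runs.

```julia
using Dates

# FRED-style quarter-start dates
quarter_starts = [Date(2015, 7, 1), Date(2015, 10, 1)]

# Re-index each observation by the last day of the same quarter
quarter_ends = lastdayofquarter.(quarter_starts)
```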

`DSGE.transform_data` — Method

```
transform_data(m::AbstractDSGEModel, levels::DataFrame; cond_type::Symbol = :none,
               verbose::Symbol = :low)
```

Transform data loaded in levels and order columns appropriately for the DSGE model. Returns DataFrame of transformed data.

The DataFrame `levels` is output from `load_data_levels`. The series in levels are transformed as specified in `m.observable_mappings`.

- To prepare for per-capita transformations, population data are filtered using `hpfilter`. The series in `levels` to use as the population series is given by the `population_mnemonic` setting. If `use_population_forecast(m)`, a population forecast is appended to the recorded population levels before the filtering. Both filtered and unfiltered population levels and growth rates are added to the `levels` data frame.
- The transformations are applied for each series using the `levels` DataFrame as input.

Conditional data (identified by `cond_type in [:semi, :full]`) are handled slightly differently: If `use_population_forecast(m)`, we drop the first period of the population forecast because we treat the first forecast period `date_forecast_start(m)` as if it were data. We also only apply transformations for the observables given in `cond_full_names(m)` or `cond_semi_names(m)`.

`DSGE.annualtoquarter` — Method

`annualtoquarter(v)`

Convert from annual to quarter frequency... by dividing by 4.

`DSGE.difflog` — Method

`difflog(x::AbstractVector)`

`DSGE.difflog` — Method

`difflog(x::AbstractArray{AbstractFloat})`

`DSGE.hpfilter` — Method

`yt, yf = hpfilter(y, λ)`

Applies the Hodrick-Prescott filter ("H-P filter"). The smoothing parameter `λ` is applied to the columns of `y`, returning the trend component `yt` and the cyclical component `yf`. For quarterly data, one can use λ=1600.

Consecutive missing values at the beginning or end of the time series are excluded from the filtering. If there are missing values within the series, the filtered values are all missing.

See also:

```
Hodrick, Robert; Prescott, Edward C. (1997). "Postwar U.S. Business Cycles: An Empirical
Investigation". Journal of Money, Credit, and Banking 29 (1): 1–16.
```
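For intuition, the H-P trend for a single series without missing values solves the linear system (I + λD'D) yt = y, where D is the second-difference operator. The sketch below is a small dense implementation written for this page; `hp_sketch` is a hypothetical name, and it does not reproduce the package's handling of multiple columns or missing values.

```julia
using LinearAlgebra

# Hypothetical dense implementation of the H-P filter for a single series.
# The trend solves (I + λ D'D) yt = y, where D takes second differences.
function hp_sketch(y::Vector{Float64}, λ::Real)
    n = length(y)
    D = zeros(n - 2, n)
    for i in 1:(n - 2)
        D[i, i], D[i, i + 1], D[i, i + 2] = 1.0, -2.0, 1.0
    end
    yt = (I + λ * (D' * D)) \ y   # trend component
    yf = y - yt                   # cyclical component
    return yt, yf
end

# A linear trend plus an alternating disturbance
y = collect(1.0:12.0) .+ repeat([0.5, -0.5], 6)
yt, yf = hp_sketch(y, 1600)

# A purely linear series has no cyclical component
lin_trend, lin_cycle = hp_sketch(collect(1.0:10.0), 1600)
```

Because second differences of a straight line are zero, the filter returns a linear series unchanged as its own trend, which makes a convenient sanity check.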

`DSGE.loggrowthtopct` — Method

`loggrowthtopct(y)`

Transform from annualized quarter-over-quarter log growth rates to annualized quarter-over-quarter percent change.

**Note**

This should only be used in Model 510, which has the core PCE inflation observable in annualized log growth rates.

`DSGE.loggrowthtopct_4q_approx` — Function

`loggrowthtopct_4q_approx(y, data = fill(NaN, 3))`

Transform from log growth rates to *approximate* 4-quarter percent change.

**This method should only be used to transform scenarios forecasts, which are in deviations from baseline.**

**Inputs**

- `y`: the data we wish to transform to aggregate 4-quarter percent change from log per-capita growth rates. `y` is either a vector of length `nperiods` or an `ndraws x nperiods` matrix.
- `data`: if `y = [y_t, y_{t+1}, ..., y_{t+nperiods-1}]`, then `data = [y_{t-3}, y_{t-2}, y_{t-1}]`. This is necessary to compute 4-quarter percent changes for the first three periods.

`DSGE.loggrowthtopct_annualized` — Method

`loggrowthtopct_annualized(y)`

Transform from log growth rates to annualized quarter-over-quarter percent change.

`DSGE.loggrowthtopct_annualized_percapita` — Method

`loggrowthtopct_annualized_percapita(y, pop_growth)`

Transform from log per-capita growth rates to annualized aggregate (not per-capita) quarter-over-quarter percent change.

**Note**

This should only be used for output, consumption, investment and GDP deflator (inflation).

**Inputs**

- `y`: the data we wish to transform to annualized percent change from quarter-over-quarter log growth rates. `y` is either a vector of length `nperiods` or an `ndraws x nperiods` matrix.
- `pop_growth::Vector`: the length `nperiods` vector of log population growth rates.

`DSGE.loggrowthtopct_percapita` — Method

`loggrowthtopct_percapita(y, pop_growth)`

Transform from annualized quarter-over-quarter log per-capita growth rates to annualized quarter-over-quarter aggregate percent change.

**Note**

This should only be used in Model 510, which has the output growth observable in annualized log per-capita growth rates.

**Inputs**

- `y`: the data we wish to transform to annualized percent change from annualized log growth rates. `y` is either a vector of length `nperiods` or an `ndraws x nperiods` matrix.
- `pop_growth::Vector`: the length `nperiods` vector of log population growth rates.

`DSGE.logleveltopct_4q_approx` — Function

`logleveltopct_4q_approx(y, data = fill(NaN, 4))`

Transform from log levels to *approximate* 4-quarter percent change.

**This method should only be used to transform scenarios forecasts, which are in deviations from baseline.**

**Inputs**

- `y`: the data we wish to transform to 4-quarter percent change from log levels. `y` is either a vector of length `nperiods` or an `ndraws x nperiods` matrix.
- `data`: if `y = [y_t, y_{t+1}, ..., y_{t+nperiods-1}]`, then `data = [y_{t-4}, y_{t-3}, y_{t-2}, y_{t-1}]`. This is necessary to compute 4-quarter percent changes for the first four periods.

`DSGE.logleveltopct_annualized` — Function

`logleveltopct_annualized(y, y0 = NaN)`

Transform from log levels to annualized quarter-over-quarter percent change.

**Inputs**

- `y`: the data we wish to transform to annualized quarter-over-quarter percent change from log levels. `y` is either a vector of length `nperiods` or an `ndraws x nperiods` matrix.
- `y0`: the last data point in the history (of state or observable) corresponding to the `y` variable. This is required to compute a percent change for the first period.

`DSGE.logleveltopct_annualized_approx` — Function

`logleveltopct_annualized_approx(y, y0 = NaN)`

Transform from log levels to *approximate* annualized quarter-over-quarter percent change.

**This method should only be used to transform scenarios forecasts, which are in deviations from baseline.**

**Inputs**

- `y`: the data we wish to transform to annualized quarter-over-quarter percent change from log levels. `y` is either a vector of length `nperiods` or an `ndraws x nperiods` matrix.
- `y0`: the last data point in the history (of state or observable) corresponding to the `y` variable. This is required to compute a percent change for the first period.

`DSGE.logleveltopct_annualized_percapita` — Function

`logleveltopct_annualized_percapita(y, pop_growth, y0 = NaN)`

Transform from per-capita log levels to annualized aggregate (not per-capita) quarter-over-quarter percent change.

**Note**

This is usually applied to labor supply (hours worked), and probably shouldn't be used for any other observables.

**Inputs**

- `y`: the data we wish to transform to annualized aggregate quarter-over-quarter percent change from per-capita log levels. `y` is either a vector of length `nperiods` or an `ndraws x nperiods` matrix.
- `pop_growth::Vector`: the length `nperiods` vector of log population growth rates.
- `y0`: the last data point in the history (of state or observable) corresponding to the `y` variable. This is required to compute a percent change for the first period.

`DSGE.nominal_to_real` — Method

`nominal_to_real(col, df; deflator_mnemonic = :GDPDEF)`

Converts nominal to real values using the specified deflator.

**Arguments**

- `col`: Symbol indicating which column of `df` to transform
- `df`: DataFrame containing series for proper population measure and `col`

**Keyword arguments**

- `deflator_mnemonic`: indicates which deflator to use to calculate real values. Default value is the FRED GDP Deflator mnemonic.

`DSGE.oneqtrpctchange` — Method

`oneqtrpctchange(y)`

Calculates the quarter-to-quarter percentage change of a series.

`DSGE.percapita` — Method

```
percapita(m, col, df)
percapita(col, df, population_mnemonic)
```

Converts data column `col` of DataFrame `df` to a per-capita value.

The first method checks `hpfilter_population(m)`. If true, then it divides by the filtered population series. Otherwise it divides by the result of `parse_population_mnemonic(m)[1]`.

**Arguments**

- `col`: `Symbol` indicating which column of data to transform
- `df`: `DataFrame` containing series for proper population measure and `col`
- `population_mnemonic`: a mnemonic found in `df` for some population measure

`DSGE.quartertoannual` — Method

`quartertoannual(v)`

Convert from quarter to annual frequency... by multiplying by 4.

`DSGE.quartertoannualpercent` — Method

`quartertoannualpercent(v)`

Convert from quarter to annual frequency in percent... by multiplying by 400.
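The three frequency conversions are simple rescalings, which the following sketch spells out. The function names here carry underscores to mark them as illustrative re-implementations based on the descriptions above, not the package's own definitions.

```julia
# Hypothetical re-implementations following the descriptions above
annual_to_quarter(v) = v / 4
quarter_to_annual(v) = 4 * v
quarter_to_annual_percent(v) = 400 * v

annual_rate = 2.0                                  # percent per year
quarterly_rate = annual_to_quarter(annual_rate)    # percent per quarter
round_trip = quarter_to_annual(quarterly_rate)     # back to percent per year

# A quarterly rate expressed as a fraction, converted to annual percent
annual_pct = quarter_to_annual_percent(0.005)
```

This is why `annualtoquarter` and `quartertoannual` appear as the forward and reverse transforms of the same `Observable` in the SPF example earlier on this page.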

`DSGE.get_data_filename` — Method

`get_data_filename(m, cond_type)`

Returns the data file for `m`, which depends on `data_vintage(m)`, and if `cond_type in [:semi, :full]`, also on `cond_vintage(m)` and `cond_id(m)`.

`DSGE.iterate_quarters` — Method

`iterate_quarters(start::Date, quarters::Int)`

Returns the date corresponding to `start` + `quarters` quarters.

**Inputs**

- `start`: starting date
- `quarters`: number of quarters to iterate forward or backward

`DSGE.quartertodate` — Method

`quartertodate(string::String)`

Convert `string` in the form "YYqX", "YYYYqX", or "YYYY-qX" to a Date of the end of the indicated quarter. "X" is in `{1,2,3,4}` and the case of "q" is ignored.
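A parser for these string forms might look like the sketch below. The function `quarter_string_to_date` is hypothetical, and mapping two-digit years to 20YY is an assumption made here for illustration; check the package source for how it resolves two-digit years.

```julia
using Dates

# Hypothetical parser for "YYqX", "YYYYqX", and "YYYY-qX" strings.
# Mapping two-digit years to 20YY is an assumption of this sketch.
function quarter_string_to_date(s::AbstractString)
    m = match(r"^(\d{2}|\d{4})-?[qQ]([1-4])$", s)
    m === nothing && throw(ArgumentError("unrecognized quarter string: $s"))
    year = parse(Int, m.captures[1])
    year < 100 && (year += 2000)
    quarter = parse(Int, m.captures[2])
    # The last month of quarter X is month 3X; take its last day
    return lastdayofmonth(Date(year, 3 * quarter, 1))
end

d1 = quarter_string_to_date("2015-Q4")
d2 = quarter_string_to_date("16q1")
```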

`DSGE.subtract_quarters` — Method

`subtract_quarters(t1::Date, t0::Date)`

Compute the number of quarters between `t1` and `t0`, including `t0` and excluding `t1`.
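Quarter arithmetic of this kind can be sketched with the `Dates` standard library. The helpers `quarters_between` and `shift_quarters` below are hypothetical analogues of `subtract_quarters` and `iterate_quarters`; returning quarter-end dates from the shift is an assumption consistent with the date convention used elsewhere on this page.

```julia
using Dates

# Hypothetical helpers mirroring the descriptions above
quarters_between(t1::Date, t0::Date) =
    4 * (year(t1) - year(t0)) + (quarterofyear(t1) - quarterofyear(t0))

shift_quarters(start::Date, quarters::Int) =
    lastdayofquarter(start + Month(3 * quarters))

t0 = Date(2015, 12, 31)
t1 = Date(2016, 9, 30)

n = quarters_between(t1, t0)      # counts 2015-Q4, 2016-Q1, 2016-Q2
back = shift_quarters(t1, -n)     # steps back from t1 to t0
```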

`DSGE.data_to_df` — Method

`data_to_df(m, data, start_date)`

Create a `DataFrame` out of the matrix `data`, including a `:date` column beginning in `start_date`. Variable names and indices are obtained from `m.observables`.

`DSGE.has_saved_data` — Method

`has_saved_data(m::AbstractDSGEModel; cond_type::Symbol = :none)`

Determine if there is a saved dataset on disk for the required vintage and conditional type.

`DSGE.isvalid_data` — Method

```
isvalid_data(m::AbstractDSGEModel, df::DataFrame; cond_type::Symbol = :none,
             check_empty_columns::Bool = true)
```

Return whether the dataset is valid for this model, ensuring that all observables are contained and that all quarters between the beginning of the presample and the end of the mainsample are contained. Also checks to make sure that expected interest rate data is available if `n_mon_anticipated_shocks(m) > 0`.
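The date-coverage part of this check can be sketched as follows. The helper `covers_sample` is hypothetical and only covers the "all quarters are contained" condition; it says nothing about observables or expected interest rate data.

```julia
using Dates

# Hypothetical check: every quarter-end date from `first_q` to `last_q` appears in `dates`
function covers_sample(dates::Vector{Date}, first_q::Date, last_q::Date)
    expected = Date[]
    d = lastdayofquarter(first_q)
    while d <= last_q
        push!(expected, d)
        d = lastdayofquarter(d + Month(3))   # advance to the next quarter end
    end
    return issubset(expected, dates)
end

dates = [Date(1959, 6, 30), Date(1959, 9, 30), Date(1959, 12, 31)]
complete = covers_sample(dates, Date(1959, 6, 30), Date(1959, 12, 31))

# Dropping the middle quarter leaves a gap in the sample
incomplete = covers_sample(dates[[1, 3]], Date(1959, 6, 30), Date(1959, 12, 31))
```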

`DSGE.read_data` — Method

`read_data(m::AbstractDSGEModel; cond_type::Symbol = :none)`

Read CSV from disk as DataFrame. File is located in `inpath(m, "data")`.

`DSGE.read_population_data` — Method

```
read_population_data(m; verbose = :low)
read_population_data(filename; verbose = :low)
```

Read in population data stored in levels, either from `inpath(m, "raw", "population_data_levels_[vint].csv")` or `filename`.

`DSGE.read_population_forecast` — Method

```
read_population_forecast(m; verbose = :low)
read_population_forecast(filename, population_mnemonic, last_recorded_date; verbose = :low)
```

Read in population forecast in levels, either from `inpath(m, "raw", "population_forecast_[vint].csv")` or `filename`. If that file does not exist, return an empty `DataFrame`.

`DSGE.transform_population_data` — Method

```
transform_population_data(population_data, population_forecast,
                          population_mnemonic; verbose = :low)
```

Load, HP-filter, and compute growth rates from population data in levels. Optionally do the same for forecasts.

**Inputs**

- `population_data`: pre-loaded DataFrame of historical population data containing the columns `:date` and `population_mnemonic`. Assumes this is sorted by date.
- `population_forecast`: pre-loaded `DataFrame` of population forecast containing the columns `:date` and `population_mnemonic`
- `population_mnemonic`: column name for population series in `population_data` and `population_forecast`

**Keyword Arguments**

- `verbose`: one of `:none`, `:low`, or `:high`
- `use_hpfilter`: whether to HP filter population data and forecast. See `Output` below.
- `pad_forecast_start::Bool`: whether you want to re-size the `population_forecast` such that the first index is one quarter ahead of the last index of `population_data`. Only set to false if you have manually constructed `population_forecast` to artificially start a quarter earlier, so as to avoid having an unnecessary missing first entry.

**Output**

Two dictionaries containing the following keys:

- `population_data_out`:
  - `:filtered_population_recorded`: HP-filtered historical population series (levels)
  - `:dlfiltered_population_recorded`: HP-filtered historical population series (growth rates)
  - `:dlpopulation_recorded`: Non-filtered historical population series (growth rates)
- `population_forecast_out`:
  - `:filtered_population_forecast`: HP-filtered population forecast series (levels)
  - `:dlfiltered_population_forecast`: HP-filtered population forecast series (growth rates)
  - `:dlpopulation_forecast`: Non-filtered population forecast series (growth rates)

If `population_forecast_file` is not provided, the `r"*forecast"` fields will be empty. If `use_hpfilter = false`, then the `r"*filtered*"` fields will be empty.

`DSGE.get_irf_transform` — Method

`get_irf_transform(transform::Function)`

Returns the IRF-specific transformation, which doesn't add back population growth (since IRFs are given in deviations).

`DSGE.get_nopop_transform` — Method

`get_nopop_transform(transform::Function)`

Returns the corresponding transformation which doesn't add back population growth. Used for shock decompositions, deterministic trends, and IRFs, which are given in deviations.

`DSGE.get_scenario_transform` — Method

`get_scenario_transform(transform::Function)`

Given a transformation used for usual forecasting, return the transformation used for *scenarios*, which are forecasted in deviations from baseline.

The 1Q deviation from baseline should really be calculated by 1Q transforming the forecasts (in levels) under the baseline (call this `y_b`) and alternative scenario (`y_s`), then subtracting baseline from alternative scenario (since most of our 1Q transformations are nonlinear). Let `y_d = y_s - y_b`. Then, for example, the most correct `loggrowthtopct_annualized` transformation is:

```
y_b_1q = 100*(exp(y_b/100)^4 - 1)
y_s_1q = 100*(exp(y_s/100)^4 - 1)
y_d_1q = y_s_1q - y_b_1q
```

Instead, we approximate this by transforming the deviation directly:

`y_d_1q ≈ 4*(y_s - y_b)`
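To get a feel for the size of the approximation error, the sketch below compares the two calculations for a single period, using the convention `y_d = y_s - y_b` stated in the text. The numbers are illustrative, not model output, and `annualize` is a local helper defined only for this example.

```julia
# Quarterly log growth rates in percent (illustrative numbers, not model output)
y_b = 0.50                 # baseline
y_s = 0.60                 # alternative scenario
y_d = y_s - y_b            # deviation from baseline

# Exact: transform each path, then take the difference
annualize(y) = 100 * (exp(y / 100)^4 - 1)
exact = annualize(y_s) - annualize(y_b)

# Approximation: transform the deviation directly
approx = 4 * y_d

gap = abs(exact - approx)
```

For deviations of this size the gap is small relative to the deviation itself, which is the rationale for working with the linear approximation.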

`DSGE.get_transform4q` — Method

`get_transform4q(transform::Function)`

Returns the 4-quarter transformation associated with the annualizing transformation.

`DSGE.lag` — Method

`series_lag_n = lag(series, n)`

Returns a particular data series lagged by `n` periods.

`DSGE.loggrowthtopct_4q` — Function

`loggrowthtopct_4q(y, data = fill(NaN, 3))`

Transform from log growth rates to 4-quarter percent change.

**Inputs**

- `y`: the data we wish to transform to aggregate 4-quarter percent change from log per-capita growth rates. `y` is either a vector of length `nperiods` or an `ndraws x nperiods` matrix.
- `data`: if `y = [y_t, y_{t+1}, ..., y_{t+nperiods-1}]`, then `data = [y_{t-3}, y_{t-2}, y_{t-1}]`. This is necessary to compute 4-quarter percent changes for the first three periods.
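The role of the `data` argument is easiest to see in a small sketch. The function `four_quarter_pct` below is a hypothetical vector-only illustration that assumes `y` holds quarterly log growth rates in percent and that the 4-quarter change is obtained by summing four consecutive quarters before converting; the package's implementation may differ.

```julia
# Hypothetical sketch: 4-quarter percent change from quarterly log growth rates
# (in percent). `data` holds the three periods preceding `y`.
function four_quarter_pct(y::Vector{Float64}, data::Vector{Float64} = fill(NaN, 3))
    full = vcat(data, y)
    # Sum the current and previous three quarters of log growth, then convert to percent
    return [100 * (exp(sum(full[t:t + 3]) / 100) - 1) for t in 1:length(y)]
end

y = [0.5, 0.5, 0.5, 0.5]
with_history = four_quarter_pct(y, [0.5, 0.5, 0.5])
without_history = four_quarter_pct(y)
```

Without the three preceding observations, the first three entries of the result are `NaN`, because each of them needs growth rates from before the start of `y`.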

`DSGE.loggrowthtopct_4q_percapita`

— Function`loggrowthtopct_4q_percapita(y, pop_growth, data = fill(NaN, 3))`

Transform from log per-capita growth rates to aggregate 4-quarter percent change.

**Note**

This should only be used for output, consumption, investment, and GDP deflator (inflation).

**Inputs**

- `y`: the data we wish to transform to aggregate 4-quarter percent change from log per-capita growth rates. `y` is either a vector of length `nperiods` or an `ndraws x nperiods` matrix.
- `pop_growth::Vector`: the length `nperiods` vector of log population growth rates.
- `data`: if `y = [y_t, y_{t+1}, ..., y_{t+nperiods-1}]`, then `data = [y_{t-3}, y_{t-2}, y_{t-1}]`. This is necessary to compute 4-quarter percent changes for the first three periods.

`DSGE.logleveltopct_4q`

— Function`logleveltopct_4q(y, data = fill(NaN, 4))`

Transform from log levels to 4-quarter percent change.

**Inputs**

- `y`: the data we wish to transform to 4-quarter percent change from log levels. `y` is either a vector of length `nperiods` or an `ndraws x nperiods` matrix.
- `data`: if `y = [y_t, y_{t+1}, ..., y_{t+nperiods-1}]`, then `data = [y_{t-4}, y_{t-3}, y_{t-2}, y_{t-1}]`. This is necessary to compute 4-quarter percent changes for the first four periods.
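For log levels, each 4-quarter change compares an observation with the one four quarters earlier, so four prior observations are needed. The sketch below shows this for the vector case, under the assumption that log levels are scaled by 100; `sketch_loglevel_4q` is a hypothetical name, not the DSGE.jl implementation.

```julia
# Hypothetical stand-in for illustration only (vector case).
function sketch_loglevel_4q(y::Vector{Float64}, data::Vector{Float64} = fill(NaN, 4))
    full = vcat(data, y)          # prepend the four quarters preceding y
    y_t4 = full[1:length(y)]      # observation four quarters before each element of y
    return 100 .* (exp.((y .- y_t4) ./ 100) .- 1)
end

# y_t = 2.0 is compared against y_{t-4} = 0.0
out = sketch_loglevel_4q([2.0], [0.0, 1.0, 1.0, 1.0])
```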

`DSGE.logleveltopct_4q_percapita`

— Function`logleveltopct_4q_percapita(y, pop_growth, data = fill(NaN, 4))`

Transform from per-capita log levels to 4-quarter aggregate percent change.

**Note**

This is usually applied to labor supply (hours worked), and probably shouldn't be used for any other observables.

**Inputs**

- `y`: the data we wish to transform to 4-quarter aggregate percent change from per-capita log levels. `y` is either a vector of length `nperiods` or an `ndraws x nperiods` matrix.
- `pop_growth::Vector`: the length `nperiods` vector of log population growth rates.
- `data`: if `y = [y_t, y_{t+1}, ..., y_{t+nperiods-1}]`, then `data = [y_{t-4}, y_{t-3}, y_{t-2}, y_{t-1}]`. This is necessary to compute 4-quarter percent changes for the first four periods.

`DSGE.prepend_data`

— Method`prepend_data(y, data)`

Prepends data necessary for running 4q transformations.

**Inputs:**

- `y`: `ndraws x t` array representing a timeseries for variable `y`
- `data`: vector representing a timeseries to prepend to `y`

`DSGE.datetoquarter`

— Method`datetoquarter(date::Date)`

Convert `string` in the form "YYqX", "YYYYqX", or "YYYY-qX" to a Date of the end of the indicated quarter. "X" is in `{1,2,3,4}` and the case of "q" is ignored.

Return an integer from the set `{1,2,3,4}`, corresponding to one of the quarters in a year given a Date object.
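The Date-to-quarter mapping matching the `datetoquarter(date::Date)` signature can be illustrated with Julia's standard `Dates` library. This sketch uses `Dates.quarterofyear` as a stand-in and does not call DSGE.jl itself; the date is an arbitrary example.

```julia
using Dates

d = Date(2020, 5, 15)    # a date in the interior of the second quarter
q = quarterofyear(d)     # integer in {1, 2, 3, 4}
```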

`DSGE.datetoymdvec`

— Method`datetoymdvec(dt)`

Converts a Date to a vector/matrix holding the year, month, and day.

`DSGE.format_dates!`

— Method`format_dates!(col, df)`

Change column `col` of dates in `df` from String to Date, and map any dates given in the interior of a quarter to the last day of the quarter.

`DSGE.get_quarter_ends`

— Method`get_quarter_ends(start_date::Date,end_date::Date)`

Returns an Array of quarter end dates between `start_date` and `end_date`.
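As a rough illustration of what such an array looks like, the sketch below builds quarter end dates with the standard `Dates` library. `sketch_quarter_ends` is a hypothetical function, and its treatment of the endpoints (every quarter end from the quarter containing `start_date` up to and including `end_date`) is an assumption; the boundary handling in DSGE.jl may differ.

```julia
using Dates

# Hypothetical stand-in: collect quarter end dates d with d <= end_date,
# starting from the end of the quarter that contains start_date.
function sketch_quarter_ends(start_date::Date, end_date::Date)
    ends = Date[]
    d = lastdayofquarter(start_date)
    while d <= end_date
        push!(ends, d)
        d = lastdayofquarter(d + Day(1))   # step into the next quarter
    end
    return ends
end

ends_2020 = sketch_quarter_ends(Date(2020, 1, 1), Date(2020, 12, 31))
```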

`DSGE.missing2nan`

— Method`missing2nan(a::Array)`

Convert all elements of Union{X, Missing.Missing} or Missing.Missing to type Float64.

`DSGE.missing_cond_vars!`

— Method`missing_cond_vars!(m, df; cond_type = :none, check_empty_columns = true)`

Make conditional period variables not in `cond_semi_names(m)` or `cond_full_names(m)` missing if necessary.

`DSGE.na2nan!`

— Method`na2nan!(df::Array)`

Convert all NAs in an Array to NaNs.

`DSGE.na2nan!`

— Method`na2nan!(df::DataFrame)`

Convert all NAs in a DataFrame to NaNs.

`DSGE.next_quarter`

— Function`next_quarter(q::TimeType = now())`

Returns a Date identifying the last day of the next quarter.

`DSGE.prev_quarter`

— Function`prev_quarter(q::TimeType = now())`

Returns a Date identifying the last day of the previous quarter.
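The quarter arithmetic described for `next_quarter` and `prev_quarter` can be sketched with the standard `Dates` library. The `sketch_` functions below are hypothetical stand-ins written for illustration and are not the DSGE.jl implementations.

```julia
using Dates

# Hypothetical stand-ins: move to the adjacent quarter, then take its last day.
sketch_next_quarter(q::TimeType = now()) =
    Date(lastdayofquarter(firstdayofquarter(q) + Month(3)))
sketch_prev_quarter(q::TimeType = now()) =
    Date(firstdayofquarter(q) - Day(1))

d = Date(2020, 2, 15)            # a date in 2020-Q1
nq = sketch_next_quarter(d)      # last day of 2020-Q2
pq = sketch_prev_quarter(d)      # last day of 2019-Q4
```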

`DSGE.quartertofloats`

— Method`quartertofloats(dt)`

Converts a Date to a floating point number based on the quarter.

`DSGE.reconcile_column_names`

— Method`reconcile_column_names(a::DataFrame, b::DataFrame)`

Adds columns of missings to `a` and `b` so that both have the same set of column names.

`DSGE.vinttodate`

— Method`function vinttodate(vint)`

Convert the string given by `data_vintage(m)`, which is in the format YYYYMMDD, to a Date object.
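Parsing a string in this format can be illustrated with the standard `Dates` library. The sketch below does not call `vinttodate`; the vintage string is a made-up example, and the fixed-width format `yyyymmdd` is taken from the description above.

```julia
using Dates

vint = "20200131"                       # illustrative vintage string
d = Date(vint, dateformat"yyyymmdd")    # fixed-width year, month, day
```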