Morpho 1
Introduction
Morpho is a python interface to the Stan/PyStan Markov Chain Monte Carlo package.
Morpho is intended as a meta-analysis tool to fit or generate data and to organize the inflow and outflow of data and models.
For more information, also see:
Stan: http://mc-stan.org
Install
### Dependencies ###
The following dependencies should be installed (via a package manager) before installing morpho:
- python (2.7.x; 3.x not supported)
- python-pip
- git
- python-matplotlib
Morpho reads and saves files in either R or ROOT format. If you would like to use ROOT, install root-system or see https://root.cern (and ensure that morpho and ROOT use the same version of python).
### Virtual environment-based installation ###
We recommend installing morpho using pip inside a python virtual environment. Doing so will automatically install dependencies beyond the four listed above, including PyStan 2.15.
If necessary, install [virtualenv](https://virtualenv.pypa.io/en/stable/), then execute:
```bash
virtualenv ~/path/to/the/virtualenvironment
source ~/path/to/the/virtualenvironment/bin/activate  # Activate the environment
# Use "deactivate" to exit the environment
pip install -U pip  # Update pip to >= 7.0.0
cd ~/path/to/morpho
pip install .
pip install .[all]
```
### Docker installation ###
If you would like to modify your local installation of morpho (to add features or resolve any bugs), we recommend you use a [Docker container](https://docs.docker.com/get-started/) instead of a python virtual environment. To do so:
1. Install Docker: https://docs.docker.com/engine/installation/.
2. Clone and pull the latest master version of morpho.
3. Inside the morpho folder, execute `docker-compose run morpho`. A new terminal prompt (for example, `root@413ab10d7a8f:`) should appear. You may make changes to morpho either inside or outside of the Docker container. If you wish to work outside of the container, move morpho to the `morpho_share` directory that is mounted under the `/host` folder created by docker-compose.
4. You can remove the container image using `docker rmi morpho_morpho`.

If you develop new features or identify bugs, please open a GitHub issue.
Running Morpho
Once the relevant data, model and configuration files are at your disposal, run morpho by executing:
```bash
morpho --config /path/to/json_or_yaml_config_file --other_options
```
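You can test morpho using the example in the morpho_test directory (run from inside `morpho/examples`, since the paths in the example configuration are relative):
```bash
morpho --config morpho_test/scripts/morpho_linear_fit.yaml
```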
An Example File
The configuration file format lets the user execute Stan using standardized scripts. Let us now take apart an example file to illustrate how morpho functions. You can find the example file in
`morpho/examples/morpho_test/scripts/morpho_linear_fit.yaml`.
Let us start with the initialization portion of the configuration:
```yaml
morpho:
  do_preprocessing: False
  do_stan: True
  do_postprocessing: False
  do_plots: True
```
Under the morpho block, you select which processing stages are run. In this case, morpho will run the main Stan fit and then produce plots at the end of processing.
Next comes the main Stan configuration block, where the running conditions, data, and parameters are fed into the Stan model.
```yaml
stan:
  name: "morpho_test"
  model:
    file: "./morpho_test/models/morpho_linear_fit.stan"
    function_file: None
    cache: "./morpho_test/cache"
  data:
    files:
      - name: "./morpho_test/data/input.data"
        format: "R"
    parameters:
      - N: 30
  run:
    algorithm: "NUTS"
    iter: 4000
    warmup: 1000
    chain: 12
    n_jobs: 2
    init:
      - slope: 2.0
        intercept: 1.0
        sigma: 1.0
  output:
    name: "./morpho_test/results/morpho_linear_fit"
    format: "root"
    tree: "morpho_test"
    inc_warmup: False
    branches:
      - variable: "slope"
        root_alias: "a"
      - variable: "intercept"
        root_alias: "b"
```
The model block loads your Stan model file (for more on Stan models, see the PyStan or Stan documentation). The compiled code can be cached to reduce running time. It is also possible to load external functions located in separate files through the function_file entry.
The next block, the data block, reads in data. Supported file formats include R and ROOT. One can also set values directly using the parameters entry, as we do for the variable N.
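An R-format data file such as input.data holds variable assignments written in R's dump syntax. The sketch below is a minimal Python reader that shows roughly what such a file contains. It is not morpho's own reader. It assumes a simplified file with one `name <- value` or `name <- c(...)` assignment per line. The function name `read_rdump` is hypothetical, and the sample values are made up.

```python
import re

# Hypothetical, simplified reader for R dump-style data (format: "R").
# Assumes one assignment per line, either a scalar or a c(...) vector;
# multi-line vectors, structure(), and strings are NOT handled.
ASSIGN = re.compile(r'^\s*"?(\w+)"?\s*<-\s*(.+?)\s*$')


def _number(text):
    """Parse an R numeric literal, keeping integers as int."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def read_rdump(text):
    """Return a dict mapping variable names to numbers or lists of numbers."""
    data = {}
    for line in text.splitlines():
        match = ASSIGN.match(line)
        if not match:
            continue  # skip blank lines and anything this sketch does not understand
        name, rhs = match.groups()
        if rhs.startswith("c(") and rhs.endswith(")"):
            data[name] = [_number(tok) for tok in rhs[2:-1].split(",") if tok.strip()]
        else:
            data[name] = _number(rhs)
    return data


example = 'x <- c(1, 2, 3)\ny <- c(3.1, 5.0, 6.9)\n'  # made-up contents
print(read_rdump(example))
```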
The next block, the run block, controls how Stan is run (number of chains, warmup, algorithm, etc.). Initial values can also be set here. This block feeds directly into PyStan.
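The sketch below shows how the run block could map onto keyword arguments of PyStan 2's `StanModel.sampling()`. The function name `run_block_to_sampling_kwargs` is hypothetical. Two details are assumptions, not a description of morpho's internals: that morpho's `chain` key corresponds to PyStan's `chains` argument, and that a single init dictionary is repeated for every chain.

```python
# Hypothetical mapping from a morpho-style "run" block to PyStan 2
# StanModel.sampling() keyword arguments. Missing keys fall back to
# PyStan 2's defaults (iter=2000, warmup=iter//2, chains=4, n_jobs=-1).


def run_block_to_sampling_kwargs(run, data):
    iter_ = run.get("iter", 2000)
    kwargs = {
        "data": data,
        "algorithm": run.get("algorithm", "NUTS"),
        "iter": iter_,
        "warmup": run.get("warmup", iter_ // 2),
        "chains": run.get("chain", 4),  # assumption: morpho "chain" -> PyStan "chains"
        "n_jobs": run.get("n_jobs", -1),
    }
    init = run.get("init")
    if init is not None:
        # PyStan accepts a list with one dict of initial values per chain;
        # here a single dict is repeated for every chain (an assumption).
        if len(init) == 1:
            init = init * kwargs["chains"]
        kwargs["init"] = init
    return kwargs


run = {
    "algorithm": "NUTS",
    "iter": 4000,
    "warmup": 1000,
    "chain": 12,
    "n_jobs": 2,
    "init": [{"slope": 2.0, "intercept": 1.0, "sigma": 1.0}],
}
kwargs = run_block_to_sampling_kwargs(run, data={"N": 30})
print(kwargs["chains"])  # 12
# With PyStan 2 installed, one would then call
# pystan.StanModel(file="...").sampling(**kwargs)
```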
The last block within the stan block is the output block. In this example, we save to a ROOT file and keep two variables, slope and intercept, stored under the aliases a and b.
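The branches list expresses a simple selection-and-rename step: each listed Stan variable is kept and written out under its root_alias. The sketch below shows that step on plain Python lists. It does not write a ROOT file. The function name is hypothetical, and so is the fallback to the Stan name when no alias is given. The numbers are placeholders, not fit results.

```python
# Hypothetical illustration of the output "branches" mapping: keep only the
# listed Stan variables and store each one under its root_alias.


def apply_branch_aliases(samples, branches):
    out = {}
    for branch in branches:
        name = branch["variable"]
        alias = branch.get("root_alias", name)  # assumed fallback: keep the Stan name
        out[alias] = samples[name]
    return out


samples = {"slope": [2.01, 1.98], "intercept": [0.97, 1.03], "sigma": [1.1, 0.9]}
branches = [
    {"variable": "slope", "root_alias": "a"},
    {"variable": "intercept", "root_alias": "b"},
]
print(apply_branch_aliases(samples, branches))  # sigma is not kept
```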
Since the configuration file also enables plotting, we can set up the plots as well. In our example, we have:
```yaml
plot:
  which_plot:
    - method_name: histo
      module_name: histo
      title: "histo"
      input_file_name: "./morpho_test/results/morpho_linear_fit.root"
      input_tree: "morpho_test"
      output_path: ./morpho_test/results/
      data:
        - a
```
This produces a PDF histogram of the variable a from the results stored in the ROOT file.
The flow is thus as follows. Morpho is told to execute Stan and its plotting features. The Stan execution reads in external data and sets the running conditions in much the same way as PyStan does. Results are then saved to the results folder (in this case, as ROOT files). Finally, plots are produced to check the quality of the results.
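The stage order can be sketched as a small dispatcher that reads the flags in the morpho block and calls the enabled stages in sequence. This is a hypothetical illustration. The handler functions are stand-ins for morpho's real processors, and it is assumed that a missing flag means the stage is off.

```python
# Hypothetical sketch of the stage order: preprocessing -> stan ->
# postprocessing -> plot, each gated by its flag in the morpho block.

STAGES = [
    ("do_preprocessing", "preprocessing"),
    ("do_stan", "stan"),
    ("do_postprocessing", "postprocessing"),
    ("do_plots", "plot"),
]


def run_pipeline(config, handlers):
    """Call handlers[section](config[section]) for each enabled stage, in order."""
    flags = config.get("morpho", {})
    ran = []
    for flag, section in STAGES:
        if flags.get(flag, False):  # assumption: a missing flag disables the stage
            handlers[section](config.get(section, {}))
            ran.append(section)
    return ran


config = {
    "morpho": {"do_preprocessing": False, "do_stan": True,
               "do_postprocessing": False, "do_plots": True},
    "stan": {"name": "morpho_test"},
    "plot": {"which_plot": []},
}
handlers = dict((name, lambda block: None) for _, name in STAGES)  # stand-in processors
print(run_pipeline(config, handlers))  # ['stan', 'plot']
```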
Preprocessing
Preprocessing functions are applied to data before the fitter is run, typically to put the data into the form the fit requires.
Preprocessing is enabled with a flag at the beginning of the configuration file. For example:
```yaml
morpho:
  do_preprocessing: true
```
Later in the configuration file, you can set up the commands to preprocess the data:
```yaml
preprocessing:
  which_pp:
    - method_name: bootstrapping
      module_name: resampling
      input_file_name: ./my_spectrum.root
      input_tree: input
      output_file_name: ./my_fit_data.root
      output_tree: bootstrapped_data
      option: "RECREATE"
      number_data: 5000
```
In the above example, 5000 data points are randomly sampled from the ROOT file “my_spectrum.root” (tree “input”) and saved to a new data file, “./my_fit_data.root”, with tree name “bootstrapped_data”.
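The sketch below shows the resampling idea in plain Python. It assumes sampling with replacement, which is what bootstrapping conventionally means. Plain lists stand in for the ROOT trees that morpho reads and writes, and `bootstrap` is a hypothetical name, not morpho's resampling module.

```python
import random

# Minimal bootstrap sketch: draw number_data points WITH replacement from an
# existing sample. Lists stand in for ROOT trees.


def bootstrap(values, number_data, seed=None):
    if not values:
        raise ValueError("cannot resample an empty data set")
    rng = random.Random(seed)  # seeded generator for reproducible resamples
    return [rng.choice(values) for _ in range(number_data)]


spectrum = [18500.0 + 0.5 * i for i in range(200)]  # made-up stand-in for the input tree
resampled = bootstrap(spectrum, number_data=5000, seed=1)
print(len(resampled))  # 5000
```

Because the draw is with replacement, number_data may exceed the size of the input sample, and individual points can appear more than once.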
Postprocessing
Postprocessing functions are applied to data after the fitter has run. Typically this is done to examine the parameter information and check for convergence.
Postprocessing is enabled with a flag at the beginning of the configuration file. For example:
```yaml
morpho:
  do_postprocessing: true
```
Later in the configuration file, you can set up the commands to postprocess the data. For example, to reduce the data into bins:
```yaml
postprocessing:
  which_pp:
    - method_name: general_data_reducer
      module_name: general_data_reducer
      input_file_name: ./my_spectrum.root
      input_file_format: root
      input_tree: spectrum
      data:
        - Kinetic_Energy
      minX:
        - 18500.
      maxX:
        - 18600.
      nBinHisto:
        - 1000
      output_file_name: ./my_binned_data.root
      output_file_format: root
      output_tree: bootstrapped_data
      option: "RECREATE"
```
In the above example, the Kinetic_Energy branch of the spectrum tree in my_spectrum.root is binned into a 1000-bin histogram spanning 18500 to 18600.
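The sketch below shows the fixed-width binning that this kind of reduction performs: nBinHisto equal bins between minX and maxX. It is a minimal illustration with a hypothetical `bin_data` function, not morpho's implementation. Values outside [minX, maxX) are simply dropped, because the text above does not say how morpho treats underflow or overflow.

```python
# Minimal fixed-width binning sketch (hypothetical, not morpho's code).
# Returns bin centers and counts; out-of-range values are dropped.


def bin_data(values, min_x, max_x, n_bins):
    width = (max_x - min_x) / float(n_bins)
    centers = [min_x + (i + 0.5) * width for i in range(n_bins)]
    counts = [0] * n_bins
    for v in values:
        if min_x <= v < max_x:
            idx = int((v - min_x) / width)
            counts[min(idx, n_bins - 1)] += 1  # guard against floating-point round-up
    return centers, counts


energies = [18500.0, 18505.0, 18549.9, 18550.0, 18599.9, 18600.0, 18400.0]  # made up
centers, counts = bin_data(energies, 18500.0, 18600.0, 2)
print(centers)  # [18525.0, 18575.0]
print(counts)   # [3, 2]
```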
Plots
Plotting is a useful set of routines for making quick plots and diagnostic tests, usually after the main Stan executable has been run. It is enabled with a flag in the morpho block:
```yaml
morpho:
  do_plots: true
```
Later in the configuration file, you can set up the commands to plot data after the fitter is complete.
```yaml
plot:
  which_plot:
    - method_name: histo
      title: "histo"
      input_file_name: "./morpho_test/results/morpho_linear_fit.root"
      input_tree: "morpho_test"
      output_path: ./morpho_test/results/
      data:
        - a
```
In the above example, a histogram of the variable a is made from the ROOT file and saved to ./morpho_test/results/histo_a.pdf.
We have plotting schemes that cover a number of functions:
- Plotting contours, densities, and matrices (often to look for correlations).
- Time series to study convergence.
Example Script
The following are example yaml scripts for important Preprocessing, Postprocessing, and Plot routines in Morpho 1. The format of the yaml script for other methods can be obtained from the documentation for that method.
Preprocessing
“do_preprocessing: true” must be set in the morpho dictionary. The dictionaries below should be placed as entries of the “which_pp” list inside the “preprocessing” dictionary.
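Each entry names both a module_name and a method_name. One plausible reading, sketched below as an assumption rather than a description of morpho's internals, is that the module is imported and the method is then looked up by name. Python's importlib can do this. The function `resolve` and its `package` prefix are hypothetical. The stdlib math module stands in for a morpho module so that the example runs on its own.

```python
import importlib


def resolve(entry, package=None):
    """Return the callable named by an entry's module_name/method_name.

    `package` is a hypothetical prefix (e.g. a processor subpackage);
    morpho's real lookup may differ.
    """
    module_path = entry["module_name"]
    if package:
        module_path = package + "." + module_path
    module = importlib.import_module(module_path)
    return getattr(module, entry["method_name"])


func = resolve({"method_name": "sqrt", "module_name": "math"})
print(func(16.0))  # 4.0
```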
bootstrapping
Resamples the contents of a tree. Instead of regenerating a fake data set for every sample, one can generate a larger data set once and then extract subsets from it.
```yaml
- method_name: "boot_strapping"
  module_name: "resampling"
  input_file_name: "input.root"    # Name of file to access; must be a ROOT file
  input_tree: "tree_name"          # Name of tree to access
  output_file_name: "output.root"  # Name of the output file (default: same as input_file_name)
  output_tree: "tree_name"         # Output tree name (default: same as input_tree)
  number_data: int                 # Number of sub-samples the user wishes to extract
  option: "RECREATE"               # Option for saving the ROOT file (default: RECREATE)
```
Postprocessing
“do_postprocessing: true” must be set in the morpho dictionary. The dictionaries below should be placed as entries of the “which_pp” list inside the “postprocessing” dictionary.
general_data_reducer
Transform a function defining a spectrum into a histogram of binned data points.
```yaml
- method_name: "general_data_reducer"
  module_name: "general_data_reducer"
  input_file_name: "input.root"    # Path to the ROOT file that contains the raw data
  input_file_format: "root"        # Format of the input file (currently only root is supported)
  input_tree: "spectrum"           # Name of the ROOT tree containing the data of interest
  data: ["KE"]                     # Optional list of names of the branches to be binned
  minX: [18500.]                   # Optional list of minimum x-axis values of the data to be binned
  maxX: [18600.]                   # Optional list of maximum x-axis values of the data to be binned
  nBinHisto: [50]                  # List of desired numbers of bins in each histogram
  output_file_name: "out.root"     # Path to the file where the binned data will be saved
  output_file_format: "root"       # Format of the output file
  output_file_option: "RECREATE"   # RECREATE erases and recreates the output file;
                                   # UPDATE opens the file (creating it if it does not exist) and updates it
```
Plot
“do_plots: true” must be set in the morpho dictionary. The dictionaries below should be placed as entries of the “which_plot” list inside the “plot” dictionary.
contours
Create a matrix of contour plots from a StanFit object.
```yaml
- method_name: "contours"
  module_name: "contours"
  read_cache_name: "cache_name_file.txt"         # File containing the path to the Stan model cache
  input_fit_name: "analysis_fit.pkl"             # Pickle file containing the StanFit object
  output_path: "./results/"                      # Directory to save results in
  result_names: ["param1", "param2", "param3"]   # Names of parameters to plot
  output_format: "pdf"
```
spectra
Plot a 1D histogram from two lists of data: the x points and the corresponding bin contents.
```yaml
- method_name: "spectra"
  module_name: "histo"
  title: "histo"
  input_file_name: "input.root"
  input_tree: "tree_name"
  output_path: "output.root"
  data:
    - param_name
```
histo2D
Plot a 2D histogram using 2 lists of data
```yaml
- method_name: "histo2D"
  module_name: "histo"
  input_file_name: "input.root"
  input_tree: "tree_name"
  root_plot_option: "contz"
  data:
    - list_x_branch
    - list_y_branch
```
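A 2D histogram of two branches counts how many (x, y) pairs fall in each cell of a regular grid. The root_plot_option key suggests that ROOT does the counting and the drawing (here with the "contz" option). The sketch below shows only the counting, using made-up points and a hypothetical `histo2d` function.

```python
# Minimal 2D histogram counting sketch (hypothetical, not morpho's code).
# grid[j][i] counts points in x-bin i and y-bin j; out-of-range points are dropped.


def histo2d(xs, ys, x_range, y_range, nx, ny):
    (x0, x1), (y0, y1) = x_range, y_range
    wx, wy = (x1 - x0) / float(nx), (y1 - y0) / float(ny)
    grid = [[0] * nx for _ in range(ny)]
    for x, y in zip(xs, ys):
        if x0 <= x < x1 and y0 <= y < y1:
            i = min(int((x - x0) / wx), nx - 1)
            j = min(int((y - y0) / wy), ny - 1)
            grid[j][i] += 1
    return grid


xs = [0.1, 0.2, 0.9, 0.6]  # made-up x branch
ys = [0.1, 0.3, 0.9, 0.2]  # made-up y branch
print(histo2d(xs, ys, (0.0, 1.0), (0.0, 1.0), 2, 2))  # [[2, 1], [0, 1]]
```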
histo2D_divergence
Plot a 2D histogram with divergence indicated by point color
```yaml
- method_name: "histo2D_divergence"
  module_name: "histo"
  input_file_name: "input.root"
  input_tree: "tree_name"
  root_plot_option: "contz"
  data:
    - list_x_branch
    - list_y_branch
```
aposteriori_distribution
Plot a grid of 2D histograms
```yaml
- method_name: "aposteriori_distribution"
  module_name: "histo"
  input_file_name: "input.root"
  input_tree: "tree_name"
  root_plot_option: "cont"
  output_path: output.root
  title: "aposteriori_plots"
  output_format: pdf
  output_width: 12000
  output_height: 1100
  data:
    - param1
    - param2
    - param3
```
correlation_factors
Plot a grid of correlation factors
```yaml
- method_name: "correlation_factors"
  module_name: "histo"
  input_file_name: "input.root"
  input_tree: "tree_name"
  root_plot_option: "cont"
  output_path: output.root
  title: "aposteriori_plots"
  output_format: pdf
  output_width: 12000
  output_height: 1100
  data:
    - param1
    - param2
    - param3
```
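A correlation-factor grid displays one number for each pair of parameters. The sketch below assumes that number is the Pearson correlation coefficient between the parameters' posterior samples, which is the usual choice; the text above does not say which coefficient morpho uses. The function names and sample values are made up for illustration.

```python
import math

# Pearson correlation matrix sketch (hypothetical helper names, made-up samples).


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / float(n), sum(ys) / float(n)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(samples, names):
    """Return a len(names) x len(names) grid of pairwise correlation factors."""
    return [[pearson(samples[a], samples[b]) for b in names] for a in names]


samples = {
    "param1": [1.0, 2.0, 3.0, 4.0],
    "param2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with param1
    "param3": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with param1
}
matrix = correlation_matrix(samples, ["param1", "param2", "param3"])
print(matrix[0])  # [1.0, 1.0, -1.0]
```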