------------------------------------------------------------
Random Forest Model Objects for Pulmonary Toxicity Risk Assessment
Jeremy M. Gernand
15 April 2013
------------------------------------------------------------
http://nanohub.org/resources/17539
------------------------------------------------------------

This download contains MATLAB TreeBagger objects (random forest models) based on a meta-analysis of published pulmonary nanoparticle toxicity experiments. The five model objects are contained in a single MATLAB .mat file, "NanoToxRandomForestModels.mat".

The models were created with MATLAB R2010a and have also been tested with MATLAB R2012a.

The designations, input parameters, input ranges, and outputs of each model are described below.

All model outputs describe the predicted measurement taken in bronchoalveolar lavage (BAL) fluid from a rodent (rat or mouse) exposed to the nanoparticles by inhalation, intratracheal instillation, or aspiration. All model outputs are in units of "fold of control": the measured response in the exposed group divided by the measured response in the control group.

The RF models are valid only within the specified input ranges; they do not extrapolate. Given an input outside its range, a model still returns a prediction, but that prediction is identical to the prediction at the nearest boundary of the range.

------------------------------------------------------------
Input descriptions follow this format:

  Input Description (units) [Minimum - Maximum]

RF_CNT_PMN
INPUTS AND VALID RANGES:
  Total Dose, mass (ug/kg) [0 - 6,291]
  Post Exposure, Recovery (days) [1 - 90]
  Median Diameter (nm) [1 - 49]
  Median Length (nm) [320 - 5,900]
  Dose Cobalt (ug/kg) [0 - 3,335]
  Aggregation, MMAD* (nm) [1,670 - 4,200]
OUTPUTS:
  PMN (fold of control) -- the multiple change in PMN counts from the control group to the exposed group. PMN denotes polymorphonuclear neutrophils. For carbon nanotubes.
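Because out-of-range inputs silently return the boundary prediction, it can help to flag them before calling the model. The following is a minimal sketch, not part of the distributed models: the variable names (cntPmnMin, cntPmnMax, xAtBoundary) and the example exposure values are hypothetical, and the ranges are copied from the RF_CNT_PMN table above. Known inputs are clamped explicitly because MATLAB's min/max ignore NaN and would otherwise replace a missing input with a boundary value.

```matlab
% Valid ranges for RF_CNT_PMN, copied from the table above, in input order:
% [Total Dose (ug/kg), Recovery (days), Median Diameter (nm),
%  Median Length (nm), Dose Cobalt (ug/kg), Aggregation MMAD (nm)]
cntPmnMin = [0,    1,  1,  320,    0, 1670];
cntPmnMax = [6291, 90, 49, 5900, 3335, 4200];

% Illustrative (made-up) exposure: dose exceeds its range, cobalt unknown
x = [8000, 28, 15, 1500, NaN, 2000];

% NaN marks a missing input, so it is excluded from the range check
known = ~isnan(x);
outOfRange = known & (x < cntPmnMin | x > cntPmnMax);
if any(outOfRange)
    fprintf('Input(s) %s outside the RF_CNT_PMN valid range.\n', ...
            mat2str(find(outOfRange)));
end

% Clamp only the known inputs (min/max would silently replace NaN).
% Per this README, the model's prediction for x equals its prediction
% for xAtBoundary.
xAtBoundary = x;
xAtBoundary(known) = min(max(x(known), cntPmnMin(known)), cntPmnMax(known));
```

The same pattern applies to the other four models once their ranges are substituted.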

RF_CNT_LDH
INPUTS AND VALID RANGES:
  Total Dose, mass (ug/kg) [0 - 8,889]
  Post Exposure, Recovery (days) [1 - 90]
  Median Diameter (nm) [1 - 49]
  Median Length (nm) [320 - 5,900]
  Dose Cobalt (ug/kg) [0 - 25,000]
  Aggregation, MMAD* (nm) [1,670 - 4,200]
OUTPUTS:
  LDH (fold of control) -- the multiple change in LDH concentration from the control group to the exposed group. LDH denotes lactate dehydrogenase. For carbon nanotubes.

RF_TiO2_LDH
INPUTS AND VALID RANGES:
  Total Dose, mass (ug/kg) [0 - 3.87E6]
  Post Exposure, Recovery (days) [0 - 2]
  Avg. Primary Particle Size (nm) [3.5 - 1,000]
  Aggregation, MMAD (nm) [18 - 1,400]
  Purity (%) [88 - 100]
OUTPUTS:
  LDH (fold of control) -- the multiple change in LDH concentration from the control group to the exposed group. LDH denotes lactate dehydrogenase. For titanium dioxide nanoparticles.

RF_TiO2_TP
INPUTS AND VALID RANGES:
  Total Dose, mass (ug/kg) [0 - 3.87E6]
  Post Exposure, Recovery (days) [0 - 2]
  Avg. Primary Particle Size (nm) [3.5 - 1,000]
  Aggregation, MMAD (nm) [18 - 1,400]
  Purity (%) [88 - 100]
OUTPUTS:
  Total Protein (fold of control) -- the multiple change in total protein concentration from the control group to the exposed group. For titanium dioxide nanoparticles.

RF_MetOx_LDH
INPUTS AND VALID RANGES:
  Total Dose, mass (ug/kg) [0 - 16,543]
  Post Exposure, Recovery (days) [1 - 90]
  Aggregation, MMAD (nm) [2,800 - 3,300]
  Purity (%) [90 - 100]
  Gibbs Free Energy (kJ/mol) [-856 - -321]
  Avg. Primary Particle Size (nm) [90 - 452]
OUTPUTS:
  LDH (fold of control) -- the multiple change in LDH concentration from the control group to the exposed group. LDH denotes lactate dehydrogenase. For metal oxide nanoparticles, including titanium dioxide, magnesium oxide, zinc oxide, and silicon dioxide.
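The models expect their inputs in different orders: RF_MetOx_LDH, for example, places MMAD before purity and particle size last, unlike the TiO2 models. One way to avoid ordering mistakes is to describe an exposure once by name and assemble each model's input row from the documented order. This is a hypothetical sketch: the field names (dose, recovery, size, mmad, purity, gibbs) and the exposure values are illustrative assumptions, not part of the .mat file.

```matlab
% Documented input order of each model, using hypothetical field names
order.RF_TiO2_LDH  = {'dose', 'recovery', 'size', 'mmad', 'purity'};
order.RF_TiO2_TP   = order.RF_TiO2_LDH;   % same inputs as RF_TiO2_LDH
order.RF_MetOx_LDH = {'dose', 'recovery', 'mmad', 'purity', 'gibbs', 'size'};

% Illustrative (made-up) exposure; MMAD and Gibbs free energy unknown
exposure = struct('dose', 500, 'recovery', 1, 'size', 100, 'purity', 99);

models = fieldnames(order);
rows = struct();
for m = 1:numel(models)
    names = order.(models{m});
    x = NaN(1, numel(names));   % inputs not given in exposure stay NaN
    for k = 1:numel(names)
        if isfield(exposure, names{k})
            x(k) = exposure.(names{k});
        end
    end
    rows.(models{m}) = x;
end
% rows.RF_TiO2_LDH  holds [500 1 100 NaN 99]
% rows.RF_MetOx_LDH holds [500 1 NaN 99 NaN 100]
```

Missing properties become NaN, which the models accept as described in the usage notes.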

*MMAD is Mass Mode Aerodynamic Diameter
------------------------------------------------------------
To use these models, call the MATLAB function "predict" with the model object and a row vector of input parameters, as follows (the inputs x1, x2, ... must exactly match the number and order of the inputs listed above for that model):

  >> y = predict(RF_CNT_PMN,[x1 x2 x3 x4 x5 x6]);

To generate a prediction when any input value is missing, use NaN (MATLAB's designation for 'Not a Number') in its place. If NaN is used for all input parameters, the RF model object returns the overall mean response across all exposed groups.
------------------------------------------------------------
Further information on these models, as well as results, can be found in:

Gernand J. and Casman E. "Selecting Nanoparticle Properties to Mitigate Risks to Workers and the Public – A Machine Learning Modeling Framework to Compare Pulmonary Toxicity Risks of Nanomaterials." Proc. of IMECE2013, No. 62687.
------------------------------------------------------------
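Since "predict" treats each row of its input matrix as a separate observation, a whole dose-response curve can be obtained in one call. The sketch below assumes that loading "NanoToxRandomForestModels.mat" yields variables named after the model designations (e.g. RF_CNT_PMN) and that the Statistics Toolbox providing TreeBagger is installed; the fixed input values are illustrative only. The load and prediction are skipped if the file is not on the MATLAB path.

```matlab
% Sweep Total Dose across the RF_CNT_PMN valid range
nPoints = 50;
doses = linspace(0, 6291, nPoints)';   % column vector, one dose per scenario
% Remaining inputs held fixed (illustrative values); Dose Cobalt unknown -> NaN
fixed = [28, 15, 1500, NaN, 2000];
X = [doses, repmat(fixed, nPoints, 1)];   % 50 rows x 6 inputs, README order

if exist('NanoToxRandomForestModels.mat', 'file') == 2
    % Assumes the stored variable names match the model designations
    S = load('NanoToxRandomForestModels.mat');
    y = predict(S.RF_CNT_PMN, X);   % PMN (fold of control), one value per row
end
```

Building X explicitly keeps the column order visible, which matters because predict cannot detect swapped inputs.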