<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
   <title>Statistical Computing Matters</title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/" />
   <link rel="self" type="application/atom+xml" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/atom.xml" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173</id>
   <updated>2009-11-20T19:48:20Z</updated>
   <subtitle>Suggestions and comments about obscure and useful software.</subtitle>
   <generator uri="http://www.sixapart.com/movabletype/">Movable Type 3.31</generator>

<entry>
   <title>ICPSR online web survey analysis tool</title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007414" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173.7414</id>
   
   <published>2009-11-20T19:30:12Z</published>
   <updated>2009-11-20T19:48:20Z</updated>
   
   <summary>Like many universities, Tufts has a subscription membership to the ICPSR. Over the years, a typical use case would be for a researcher to download data for local analysis. In this way one would have maximal access to raw data...</summary>
   <author>
      <name>Durwood Marshall</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://blogs.uit.tufts.edu/statisticalcomputingmatters/">
      <![CDATA[Like many universities, Tufts has a subscription membership to the <strong><a href="http://www.icpsr.umich.edu/icpsrweb/ICPSR/index.jsp">ICPSR</a></strong>.  Over the years, a typical use case has been for a researcher to download data for local analysis, which gives maximal access to the raw data and free choice of analysis tools.  Recently, <strong>ICPSR</strong> has provided a web front-end tool, Survey Documentation and Analysis (<strong><a href="http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/sda.jsp">SDA</a></strong>), for online analysis suited to routine statistical queries.  Many datasets in the archive have been prepared for use with this tool.  <strong>SDA</strong> is available to all Tufts faculty, staff and students.  Using this interface to support class teaching activities should lower the barrier to analysis that the traditional download-and-analyze approach imposes.]]>
      
   </content>
</entry>
<entry>
   <title>Mixed Model framework for Gene Array analysis</title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007328" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173.7328</id>
   
   <published>2009-11-10T19:28:13Z</published>
   <updated>2009-11-10T20:04:23Z</updated>
   
   <summary>An excellent recent article caught my attention: it solves, in a unified manner, a class of gene array analysis problems using the workhorse methodology of statistical Mixed Models. The approach is presented for a two-group time course...</summary>
   <author>
      <name>Durwood Marshall</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://blogs.uit.tufts.edu/statisticalcomputingmatters/">
      <![CDATA[An excellent recent <strong><a href="http://www.bepress.com/sagmb/vol8/iss1/art47/?sending=10787">article</a></strong> caught my attention: it solves, in a unified manner, a class of gene array analysis problems using the workhorse methodology of statistical Mixed Models.  The approach is presented for a two-group time course experiment, and extension to more complex experimental designs is discussed.  The featured method is compared with several competing approaches, including in a simulation study, and is implemented in <strong>SAS</strong> software.  However, any statistical package supporting Spectral Decomposition and Mixed Models could be used.]]>
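<![CDATA[<p>The article's own SAS implementation is not reproduced here, but the spectral decomposition step it relies on is easy to illustrate.  The Python sketch below (assuming NumPy; the function names and variance values are hypothetical) builds the covariance implied by a random subject intercept over a short time course (compound symmetry), decomposes it as Sigma = Q diag(lam) Q^T, and uses the result to rotate the repeated measures into uncorrelated components.  This is one way such a decomposition can be used alongside mixed models.</p>
<pre>
```python
import numpy as np


def compound_symmetry(n_times, var_subject, var_error):
    """Covariance of n_times repeated measures that share a random subject intercept."""
    return var_subject * np.ones((n_times, n_times)) + var_error * np.eye(n_times)


def spectral_decomposition(sigma):
    """Eigenvalues (ascending) and orthonormal eigenvectors of a symmetric matrix."""
    lam, q = np.linalg.eigh(sigma)
    return lam, q


def whitening_matrix(sigma):
    """Sigma^(-1/2) = Q diag(1 / sqrt(lam)) Q^T, built from the spectral decomposition."""
    lam, q = spectral_decomposition(sigma)
    return q @ np.diag(1.0 / np.sqrt(lam)) @ q.T


# Hypothetical example: 4 time points, subject variance 2, residual variance 1.
sigma = compound_symmetry(4, var_subject=2.0, var_error=1.0)
lam, q = spectral_decomposition(sigma)
print(np.round(lam, 6))  # three eigenvalues of 1 and one of 1 + 4 * 2 = 9

w = whitening_matrix(sigma)
# W Sigma W^T is the identity, so the rotated measures are uncorrelated.
print(np.allclose(w @ sigma @ w.T, np.eye(4)))
```
</pre>
<p>Because compound symmetry has only two distinct eigenvalues, the decomposition separates a between-subject direction (the average over time points) from within-subject contrasts.  This is a generic illustration, not the authors' algorithm.</p>]]>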
      
   </content>
</entry>
<entry>
   <title>Epi - Statistical tools for Epidemiology</title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007293" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173.7293</id>
   
   <published>2009-11-06T16:44:55Z</published>
   <updated>2009-11-06T17:00:56Z</updated>
   
   <summary>Epi has been around for a long time, starting back in the days of DOS! Over the years Epi has matured into a suite of tools (Pepi) focused on public health and epidemiology. WINPEPI software...</summary>
   <author>
      <name>Durwood Marshall</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://blogs.uit.tufts.edu/statisticalcomputingmatters/">
      <![CDATA[<strong>Epi</strong> has been around for a long time, starting back in the days of DOS! Over the years Epi has matured into a suite of tools (<strong>Pepi</strong>) focused on public health and epidemiology. <strong><a href="http://www.brixtonhealth.com/">WINPEPI</a></strong> software for the Windows platform is free and offers a broad mix of procedures that general statistics users might consider as an alternative to more costly options.  <strong>Epi</strong> is like the Energizer Bunny: it keeps going and going...]]>
      
   </content>
</entry>
<entry>
   <title> MCA - a thing of the past? </title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007254" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173.7254</id>
   
   <published>2009-10-30T19:32:45Z</published>
   <updated>2009-10-30T20:03:31Z</updated>
   
   <summary>All-pairwise Multiple Comparisons (MCA) is a well-known collection of procedures for comparing and ordering group means, a common research task. Classical methods are built around controlling Type I errors, which can occur only if the null hypothesis of no difference is exactly true. Modern alternatives can be found...</summary>
   <author>
      <name>Durwood Marshall</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://blogs.uit.tufts.edu/statisticalcomputingmatters/">
      <![CDATA[All-pairwise Multiple Comparisons (MCA) is a well-known collection of procedures for comparing and ordering group means, a common research task. Classical methods are built around controlling Type I errors, which can occur only if the null hypothesis of no difference is exactly true. Modern alternatives can be found in the Bayesian paradigm, which sets aside the Type I error notion altogether.  In particular, for problems that can be cast in a hierarchical modeling framework, a principled Bayesian approach relies on partial pooling and shrinkage.  Technical arguments supporting this approach have been around for some time.  An excellent working paper by <strong><a href="http://www.stat.columbia.edu/~gelman/research/unpublished/multiple2f.pdf">Andrew Gelman</a></strong> on the topic presents an overview, simulation results and examples demonstrating the benefits in an applied setting, along with suggestions for implementation in <strong><a href="http://www.r-project.org">R</a></strong> and other software.
]]>
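<![CDATA[<p>To make partial pooling and shrinkage concrete, here is a minimal Python sketch of a normal hierarchical model.  It assumes the population mean mu and the between-group standard deviation tau are known (a full Bayesian analysis would estimate them too, for example in R).  Each group's estimate becomes a precision-weighted average of its own mean and mu, so noisier groups are pulled further toward mu.  The class name, group means and standard errors are hypothetical.</p>
<pre>
```python
import math
from dataclasses import dataclass


@dataclass
class GroupEstimate:
    name: str
    mean: float       # observed group mean
    std_error: float  # standard error of that mean


def partial_pool(groups, mu, tau):
    """Posterior mean and sd of each group effect in a normal hierarchical model
    with known population mean mu and between-group sd tau."""
    pooled = {}
    for g in groups:
        data_precision = 1.0 / g.std_error ** 2
        prior_precision = 1.0 / tau ** 2
        post_precision = data_precision + prior_precision
        post_mean = (data_precision * g.mean + prior_precision * mu) / post_precision
        pooled[g.name] = (post_mean, math.sqrt(1.0 / post_precision))
    return pooled


# Hypothetical data: group C is measured far less precisely than A and B.
groups = [
    GroupEstimate("A", 10.0, 1.0),
    GroupEstimate("B", 14.0, 1.0),
    GroupEstimate("C", 20.0, 4.0),
]
for name, (m, s) in partial_pool(groups, mu=12.0, tau=2.0).items():
    print(f"{name}: {m:.2f} (sd {s:.2f})")
```
</pre>
<p>Group C's raw mean of 20 is pulled all the way to 13.6 because its standard error is large, while A and B move only slightly toward 12 (to 10.4 and 13.6).  Comparisons made on these shrunken estimates are automatically more conservative for poorly measured groups.</p>]]>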
      
   </content>
</entry>
<entry>
   <title>An improved Spatial Scan Statistic</title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007232" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173.7232</id>
   
   <published>2009-10-27T16:07:07Z</published>
   <updated>2009-10-27T16:33:05Z</updated>
   
   <summary>Spatial scan statistics have been an important class of tools for cluster detection in spatial data. These are often used in support of surveillance and detection activities in public health and other fields. A common limitation of popular spatial scan...</summary>
   <author>
      <name>Durwood Marshall</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://blogs.uit.tufts.edu/statisticalcomputingmatters/">
      <![CDATA[Spatial scan statistics have been an important class of tools for cluster detection in spatial data.  They are often used in support of surveillance and detection activities in public health and other fields.  A common limitation of popular spatial scan statistics is that they do not account for uncertainty in the measure of interest.  In a recent <strong><a href="http://www.amstat.org">JASA</a></strong> article (Sept. 2009), <strong>Weighted Normal Spatial Scan Statistic for Heterogeneous Population Data</strong>, the authors offer a more general solution to this problem.  Weights based on local variance measures, or on proxies such as sample size, can be constructed for use in a weighted likelihood approach.  Extensions to non-Gaussian probability models are also addressed.  The case studies and power simulations provided suggest excellent performance.  Their solution has been implemented in the freely available software <strong><a href="http://www.Satscan.org">SaTScan</a></strong>.]]>
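<![CDATA[<p>The core of a weighted normal likelihood can be sketched in a few lines.  The Python below is a simplified illustration, not the JASA authors' exact statistic or SaTScan's implementation.  It assumes each location's value x_i is normal with variance sigma^2 / w_i, where w_i is a hypothetical weight such as a sample size, and computes the log likelihood ratio for one candidate window whose mean differs from the rest of the map.  A real scan maximizes this over many candidate windows and judges significance by Monte Carlo replication.</p>
<pre>
```python
import math


def weighted_mean(values, weights):
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def sigma2_mle(values, weights, means):
    """MLE of sigma^2 when x_i ~ N(mean_i, sigma^2 / w_i)."""
    return sum(w * (x - m) ** 2 for x, w, m in zip(values, weights, means)) / len(values)


def window_llr(values, weights, in_window):
    """Log likelihood ratio: separate means inside/outside the window vs. one common mean.

    Both the window and its complement must contain at least one location.
    """
    n = len(values)
    mu0 = weighted_mean(values, weights)
    x_in = [x for x, z in zip(values, in_window) if z]
    w_in = [w for w, z in zip(weights, in_window) if z]
    x_out = [x for x, z in zip(values, in_window) if not z]
    w_out = [w for w, z in zip(weights, in_window) if not z]
    mu_in, mu_out = weighted_mean(x_in, w_in), weighted_mean(x_out, w_out)
    means1 = [mu_in if z else mu_out for z in in_window]
    s0 = sigma2_mle(values, weights, [mu0] * n)
    s1 = sigma2_mle(values, weights, means1)
    # At the MLEs every term except the variance term cancels: (n / 2) * log(s0 / s1).
    return 0.5 * n * math.log(s0 / s1)


# Hypothetical data: six locations, weights standing in for sample sizes.
values = [5.0, 5.2, 4.8, 5.1, 8.0, 8.4]
weights = [10, 12, 8, 11, 9, 10]
cluster = [False, False, False, False, True, True]
other = [True, True, False, False, False, False]
print(round(window_llr(values, weights, cluster), 3), round(window_llr(values, weights, other), 3))
```
</pre>
<p>The window covering the two high-valued locations scores far higher than an arbitrary window.  Down-weighting imprecise locations keeps a single noisy value from dominating the statistic.</p>]]>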
      
   </content>
</entry>
<entry>
   <title>mixAK: New data clustering options</title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007019" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173.7019</id>
   
   <published>2009-10-06T20:15:55Z</published>
   <updated>2009-10-06T20:52:19Z</updated>
   
   <summary>Cluster analysis (and related tools) is often used to investigate structure (clustering) in multidimensional data sets. One approach to modeling such data is the Gaussian mixture model. mixAK is a new R package for Bayesian estimation of multivariate normal mixtures allowing for...</summary>
   <author>
      <name>Durwood Marshall</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://blogs.uit.tufts.edu/statisticalcomputingmatters/">
      <![CDATA[Cluster analysis (and related tools) is often used to investigate structure (clustering) in multidimensional data sets.  One approach to modeling such data is the Gaussian mixture model.  <strong>mixAK</strong> is a new <strong><a href="http://www.R-project.org">R</a></strong> package for Bayesian estimation of multivariate normal mixtures that supports selection of the number of mixture components and density estimation, and optionally handles interval-censored multivariate data.  Author Arnost Komarek's journal article in <strong>Computational Statistics and Data Analysis</strong> (Volume 53, Issue 12, October 2009) presents the underlying theory and applications of the approach, using reversible jump MCMC (RJ-MCMC) estimation.  Selection of the number of mixture components is aided by the Deviance Information Criterion (<strong>DIC</strong>) and Penalized Expected Deviance (<strong>PED</strong>) measures.]]>
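<![CDATA[<p>As a rough illustration of how DIC is computed from posterior draws, here is a minimal Python sketch (assuming NumPy).  It uses a deliberately simple one-parameter model, a normal mean with known variance and a flat prior, rather than a mixture.  mixAK's own DIC and PED calculations for mixtures are more involved and are not reproduced here.  DIC = Dbar + pD, where Dbar is the posterior mean deviance and pD = Dbar - D(theta_bar) is the effective number of parameters.</p>
<pre>
```python
import numpy as np


def deviance(y, mu, sigma):
    """-2 * log likelihood of y under N(mu, sigma^2)."""
    n = len(y)
    return n * np.log(2 * np.pi * sigma ** 2) + np.sum((y - mu) ** 2) / sigma ** 2


def dic(y, mu_draws, sigma):
    """DIC and pD from posterior draws of mu (sigma treated as known)."""
    d_bar = np.mean([deviance(y, mu, sigma) for mu in mu_draws])
    d_hat = deviance(y, np.mean(mu_draws), sigma)
    p_d = d_bar - d_hat
    return d_bar + p_d, p_d


rng = np.random.default_rng(0)
sigma = 1.0
y = rng.normal(3.0, sigma, size=50)
# With a flat prior the posterior of mu is N(mean(y), sigma^2 / n); we sample it
# directly here instead of running MCMC, to keep the example short.
mu_draws = rng.normal(y.mean(), sigma / np.sqrt(len(y)), size=20000)
dic_value, p_d = dic(y, mu_draws, sigma)
print(round(p_d, 2))  # close to 1, the single free parameter
```
</pre>
<p>In a mixture setting the same comparison is made across candidate numbers of components, with lower DIC preferred.</p>]]>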
      
   </content>
</entry>
<entry>
   <title>Survey weights and new ANES suggestions</title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006806" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173.6806</id>
   
   <published>2009-09-17T19:47:09Z</published>
   <updated>2009-09-17T21:33:17Z</updated>
   
   <summary>Many large surveys are structured as complex sample designs that reflect various stratification considerations. Statistics calculated from such designs must be weighted to reflect the general population of interest. A clear discussion and set of recommendations by four prominent researchers...</summary>
   <author>
      <name>Durwood Marshall</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://blogs.uit.tufts.edu/statisticalcomputingmatters/">
      <![CDATA[Many large surveys are structured as complex sample designs that reflect various stratification considerations.  Statistics calculated from such designs must be weighted to reflect the general population of interest.  A clear discussion and set of recommendations by four prominent researchers on calculating and applying weights for <strong><a href="http://www.electionstudies.org">ANES</a></strong> datasets can be found in the Sept. 2009 Technical Report nes012427, <strong>Computing Weights for American National Election Study Survey Data</strong>.  The <strong><a href="ftp://ftp.electionstudies.org/ftp/nes/bibliography/documents/nes012427.pdf">report</a></strong> is in the Reference Library section of the ANES website.
Recommendations are given for single cross-sectional, two-wave panel and multi-wave panel designs, along with nonresponse and poststratification weighting.  The discussion is general enough to apply to other large studies, such as Census data and similar surveys.
]]>
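<![CDATA[<p>Poststratification, one of the adjustments the report covers, can be sketched simply.  Within each cell (for example an age group), respondents' base weights are scaled so that the weighted count matches a known population total.  The Python below is a minimal illustration with hypothetical cells, weights and totals.  It is not the report's procedure, which also deals with nonresponse and panel attrition.</p>
<pre>
```python
from collections import defaultdict


def poststratify(respondents, population_totals):
    """Scale each respondent's base weight so weighted cell counts match population totals.

    respondents: list of dicts with keys 'cell', 'base_weight' and 'y'.
    population_totals: dict mapping cell -> known population count.
    """
    weighted_count = defaultdict(float)
    for r in respondents:
        weighted_count[r["cell"]] += r["base_weight"]
    adjusted = []
    for r in respondents:
        factor = population_totals[r["cell"]] / weighted_count[r["cell"]]
        adjusted.append({**r, "weight": r["base_weight"] * factor})
    return adjusted


def weighted_mean(rows, key="y"):
    total_w = sum(r["weight"] for r in rows)
    return sum(r["weight"] * r[key] for r in rows) / total_w


# Hypothetical sample: younger adults are under-represented relative to the population.
sample = [
    {"cell": "18-39", "base_weight": 1.0, "y": 1},
    {"cell": "40+", "base_weight": 1.0, "y": 0},
    {"cell": "40+", "base_weight": 1.0, "y": 0},
    {"cell": "40+", "base_weight": 1.0, "y": 1},
]
population = {"18-39": 500.0, "40+": 500.0}
weighted = poststratify(sample, population)
print(weighted_mean(weighted))  # 0.5 * 1 + 0.5 * (1/3), about 0.667, vs. 0.5 unweighted
```
</pre>
<p>The single young respondent now stands for half the population, which moves the estimate from 0.5 to about 0.667.  Large adjustment factors like this one are also why weighting increases variance.</p>]]>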
      
   </content>
</entry>
<entry>
   <title>Areal and point source spatial data models</title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006762" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173.6762</id>
   
   <published>2009-09-09T16:29:22Z</published>
   <updated>2009-09-09T22:07:50Z</updated>
   
   <summary>Researchers using spatial data are often faced with a mix of areal data at several levels of scale and aggregation, together with point-referenced data. Classical geospatial regressions do not handle this mix very well, and ordinary regressions do even worse....</summary>
   <author>
      <name>Durwood Marshall</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://blogs.uit.tufts.edu/statisticalcomputingmatters/">
      <![CDATA[Researchers using spatial data are often faced with a mix of areal data at several levels of scale and aggregation, together with point-referenced data. Classical geospatial regressions do not handle this mix very well, and ordinary regressions do even worse.  A unified treatment is the topic of a recent article, "Reparameterized and Marginalized Posterior and Predictive Sampling for Complex Bayesian Geostatistical Models", in Volume 18, Number 2 of <strong><a href="http://pubs.amstat.org/loi/jcgs">JCGS</a></strong>.  In short, the authors cleverly reparameterize and recast the problem so that efficient MCMC samplers can handle the Bayesian estimation task.  The article's supplemental materials provide the <strong><a href="http://www.r-project.org">R</a></strong> and <strong><a href="http://mathstat.helsinki.fi/openbugs/Home.html">OpenBUGS</a></strong> code for the estimation tasks outlined.]]>
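<![CDATA[<p>One idea behind marginalization in such models is that a Gaussian spatial random effect can be integrated out analytically.  If y = X beta + w + e with w ~ N(0, sigma^2 H(phi)) and e ~ N(0, tau^2 I), then marginally y ~ N(X beta, sigma^2 H(phi) + tau^2 I), so a sampler need only visit (beta, sigma^2, tau^2, phi) instead of every spatial effect.  The Python sketch below (assuming NumPy, and an exponential correlation H_ij = exp(-d_ij / phi) chosen for illustration) evaluates that marginal log likelihood for point-referenced data.  It is not the authors' R or OpenBUGS code and does not cover the areal part of their models.</p>
<pre>
```python
import numpy as np


def exponential_correlation(coords, phi):
    """H_ij = exp(-d_ij / phi) for Euclidean distances d_ij between sites."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt(np.sum(diff ** 2, axis=-1))
    return np.exp(-d / phi)


def marginal_loglik(y, X, beta, sigma2, tau2, phi, coords):
    """log N(y | X beta, sigma2 * H(phi) + tau2 * I): the spatial effect w is integrated out."""
    n = len(y)
    cov = sigma2 * exponential_correlation(coords, phi) + tau2 * np.eye(n)
    chol = np.linalg.cholesky(cov)           # cov = L L^T
    z = np.linalg.solve(chol, y - X @ beta)  # z = L^-1 (y - X beta)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + z @ z)


# Hypothetical point-referenced data: 30 sites with an intercept and one covariate.
rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 10.0, size=(30, 2))
X = np.column_stack([np.ones(30), coords[:, 0]])
beta = np.array([1.0, 0.5])
y = X @ beta + rng.normal(0.0, 1.0, size=30)
print(marginal_loglik(y, X, beta, sigma2=1.0, tau2=0.5, phi=2.0, coords=coords))
```
</pre>
<p>Setting sigma2 to 0 reduces this to an ordinary independent-errors likelihood, which is a handy sanity check.  The Cholesky factor is the main cost, growing with the cube of the number of sites.</p>]]>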
      
   </content>
</entry>
<entry>
   <title>SPSS resources</title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006755" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173.6755</id>
   
   <published>2009-09-08T20:49:10Z</published>
   <updated>2009-09-08T20:54:55Z</updated>
   
   <summary>SPSS software has an extensive tutorial built into the product, and most first-time users will benefit from using it. Additional SPSS resources can be found here....</summary>
   <author>
      <name>Durwood Marshall</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://blogs.uit.tufts.edu/statisticalcomputingmatters/">
      <![CDATA[<strong><a href="http://www.spss.com">SPSS</a></strong> software has an extensive tutorial built into the product, and most first-time users will benefit from using it.  Additional SPSS resources can be found <strong><a href="http://www.spsstools.net/spss.htm">here</a></strong>.]]>
      
   </content>
</entry>
<entry>
   <title>R available on Tufts Linux Cluster</title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006741" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173.6741</id>
   
   <published>2009-09-03T20:00:54Z</published>
   <updated>2009-09-03T20:09:07Z</updated>
   
   <summary>Elsewhere on this Blog I mention various bits and pieces of R software. Now that the fall semester is upon us, we have added many new R bioinformatics packages to the baseline R installation on our research Linux cluster. This...</summary>
   <author>
      <name>Durwood Marshall</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://blogs.uit.tufts.edu/statisticalcomputingmatters/">
      <![CDATA[Elsewhere on this Blog I mention various bits and pieces of <strong><a href="http://www.r-project.org">R</a></strong> software.  Now that the fall semester is upon us, we have added many new R bioinformatics packages to the baseline R installation on our research Linux <strong><a href="http://go.tufts.edu/cluster">cluster</a></strong>.  This option provides a scalable solution for those needing additional computing power.]]>
      
   </content>
</entry>
<entry>
   <title>Bayes Software...the next big effort</title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006504" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173.6504</id>
   
   <published>2009-07-14T16:51:59Z</published>
   <updated>2009-07-14T17:17:40Z</updated>
   
   <summary>Historically, Bayesian solutions were computed as needed in general-purpose programming languages (Fortran, C, Java, etc.) and later in high-level environments like Matlab, Gauss, SAS/IML and others. Then WinBUGS came along and offered a higher-level interface, similar to what Matlab did for linear algebra syntax...</summary>
   <author>
      <name>Durwood Marshall</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://blogs.uit.tufts.edu/statisticalcomputingmatters/">
      <![CDATA[Historically, Bayesian solutions were computed as needed in general-purpose programming languages (Fortran, C, Java, etc.) and later in high-level environments like Matlab, Gauss, SAS/IML and others.  Then <a href="http://www.mrc-bsu.cam.ac.uk/bugs/"><strong>WinBUGS</strong></a> came along and offered a higher-level interface, similar to what Matlab did for linear algebra syntax and functionality, but closer in spirit to the notation statisticians use to depict multilevel probability models.  While all of these still have their pros and cons, we now find an explosion of Bayesian solutions implemented in <a href="http://www.r-project.org"><strong>R</strong></a>, with the benefit of object orientation.  The "<strong>CRAN Task View: Bayesian Inference</strong>" page on the R site, maintained by Jong Hee Park, lists 60+ packages with numerous solutions to many standard statistical modeling problems.  Of the many listed, note the package <strong>BAS</strong> for Bayesian Model Averaging in linear models, using stochastic or deterministic sampling without replacement from posterior distributions.  Prior distributions on coefficients are based on Zellner's g-prior or on mixtures of g-priors corresponding to the Zellner-Siow Cauchy priors or the Liang et al. hyper-g priors.  With the stochastic search capability, model specification searches that would have been impractical a few years ago are now easy.]]>
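<![CDATA[<p>For readers curious about what Bayesian Model Averaging with a Zellner g-prior involves, here is a minimal Python sketch (assuming NumPy).  It enumerates every subset of a few predictors, which is feasible only for small problems; sampling without replacement, as in BAS, is what makes larger searches practical.  Each model's Bayes factor against the intercept-only model has the closed form BF = (1 + g)^((n - 1 - p)/2) / (1 + g (1 - R^2))^((n - 1)/2).  The choices g = n (unit information) and equal prior model probabilities are assumptions made for illustration, and this is not the BAS package's interface.</p>
<pre>
```python
import itertools

import numpy as np


def r_squared(y, X):
    """R^2 of an OLS fit of y on X with an intercept (X may have zero columns)."""
    yc = y - y.mean()
    if X.shape[1] == 0:
        return 0.0
    Xc = X - X.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ coef
    return 1.0 - (resid @ resid) / (yc @ yc)


def bma_g_prior(y, X, g=None):
    """Posterior model probabilities and inclusion probabilities by full enumeration."""
    n, k = X.shape
    g = float(n) if g is None else g
    models, log_bf = [], []
    for size in range(k + 1):
        for subset in itertools.combinations(range(k), size):
            r2 = r_squared(y, X[:, list(subset)])
            log_bf.append(0.5 * (n - 1 - size) * np.log1p(g)
                          - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2)))
            models.append(subset)
    log_bf = np.array(log_bf)
    post = np.exp(log_bf - log_bf.max())  # uniform model prior; subtract max for stability
    post /= post.sum()
    inclusion = np.array([sum(p for p, m in zip(post, models) if j in m) for j in range(k)])
    return models, post, inclusion


rng = np.random.default_rng(42)
n = 100
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + rng.normal(size=n)  # only the first predictor matters
models, post, inclusion = bma_g_prior(y, X)
print(np.round(inclusion, 3))
```
</pre>
<p>With three predictors there are only 8 models.  With 30 there are over a billion, which is where stochastic search earns its keep.</p>]]>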
      
   </content>
</entry>
<entry>
   <title>Inference for R and MS Excel</title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006301" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173.6301</id>
   
   <published>2009-05-21T20:26:35Z</published>
   <updated>2009-05-21T20:38:08Z</updated>
   
   <summary>The widespread availability of Microsoft Excel has created a less than desirable environment for statistical computing. In my opinion the Excel statistics add-in leaves much to be desired relative to real statistics packages. One solution for extending the usefulness of...</summary>
   <author>
      <name>Durwood Marshall</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://blogs.uit.tufts.edu/statisticalcomputingmatters/">
      <![CDATA[The widespread availability of Microsoft Excel has created a less than desirable environment for statistical computing.  In my opinion the Excel statistics add-in leaves much to be desired relative to real statistics packages.  One way to extend the usefulness of Excel is to abandon the Excel statistics add-in in favor of <strong><a href="http://www.InferenceforR.com">InferenceforR</a></strong>.  This product allows <strong><a href="http://www.r-project.org">R</a></strong> to be used within Excel.  See the following <strong><a href="http://screencasts.bluereference.com/Special/TurboChargeStockAnalysis/">screencast</a></strong> for a slick presentation.]]>
      
   </content>
</entry>
<entry>
   <title>Numerical routines for Java development</title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006250" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173.6250</id>
   
   <published>2009-05-06T17:04:56Z</published>
   <updated>2009-05-06T17:11:17Z</updated>
   
   <summary>If you want to save time and improve the accuracy of your programs, don&apos;t reinvent the wheel; consider using javanumerics. A large variety of statistical and mathematical classes are available. Note that not all options are free....</summary>
   <author>
      <name>Durwood Marshall</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://blogs.uit.tufts.edu/statisticalcomputingmatters/">
      <![CDATA[If you want to save time and improve the accuracy of your programs, don't reinvent the wheel; consider using <a href="http://math.nist.gov/javanumerics/">javanumerics</a>.  A large variety of statistical and mathematical classes are available.  Note that not all options are free.]]>
      
   </content>
</entry>
<entry>
   <title>Specialized Statistical Resources</title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006191" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173.6191</id>
   
   <published>2009-04-29T15:55:08Z</published>
   <updated>2009-04-29T16:03:07Z</updated>
   
   <summary>Often researchers will need access to functionality that isn&apos;t found in commercial statistics packages. These needs vary quite a bit and are met with specialized solutions from the statistical community. These solutions are often cutting edge, reflecting new statistical research....</summary>
   <author>
      <name>Durwood Marshall</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://blogs.uit.tufts.edu/statisticalcomputingmatters/">
      <![CDATA[Often researchers will need access to functionality that isn't found in commercial statistics packages. These needs vary quite a bit and are met with specialized solutions from the statistical community. These solutions are often cutting edge, reflecting new statistical research. Most stats packages allow some form of macro or code authorship. This works to a point and often provides a just-in-time solution. Well-known examples include Matlab's scripting language, <strong>SAS IML</strong>, <strong>GAUSS</strong>, <strong>Stata</strong>, <strong>Splus</strong> and <strong>R</strong>. Others will seek stand-alone solutions in one form or another. These range from public domain C, C++, Fortran, and Java research subroutines to stand-alone programs with various user interfaces. The goal of this blog is to list references and short descriptions of solutions that may offer additional insight into your research and the statistical methods involved, and maybe even save you some time. A dozen or so topics come to mind, and I hope to address them shortly. These posts are not intended as statistical guidance or endorsement. Most problems are best addressed with the advice of an experienced practitioner in the relevant field.]]>
      
   </content>
</entry>
<entry>
   <title>Statistical Power calculations</title>
   <link rel="alternate" type="text/html" href="http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006020" />
   <id>tag:blogs.uit.tufts.edu,2009:/statisticalcomputingmatters//173.6020</id>
   
   <published>2009-04-13T19:20:25Z</published>
   <updated>2009-04-13T20:06:57Z</updated>
   
   <summary>Statistical power calculations are often needed at various stages of planning to establish sample sizes. Elsewhere on this Blog I mention PiFace as a power calculation tool. However, SAS users may find the following three SAS macros of interest. UnifyPow...</summary>
   <author>
      <name>Durwood Marshall</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://blogs.uit.tufts.edu/statisticalcomputingmatters/">
      <![CDATA[Statistical power calculations are often needed at various stages of planning to establish sample sizes.
Elsewhere on this Blog I mention <strong>PiFace</strong> as a power calculation tool.  However, <strong>SAS</strong> users may find the following three <strong>SAS</strong> macros of interest.
<a href="http://www.bio.ri.ccf.org/power.html">UnifyPow</a> is an extensive collection of power calculators implemented as a <strong>SAS</strong> macro.  A SAS proceedings <a href="http://www2.sas.com/proceedings/sugi22/STATS/PAPER287.PDF"><strong>paper</strong></a> about UnifyPow discusses its broad generality.  The second macro, <a href="http://www.math.yorku.ca/SCS/sasmac/rpower.html">rpower</a>, addresses the retrospective aspect of the issue.  The third macro, <a href="http://www.nesug.org/proceedings/nesug08/sa/sa12.pdf">glimmixsamplesize</a>, is designed to exploit the generality of <strong>SAS's</strong> Proc Glimmix for generalized linear mixed models.  Together, these macros substantially increase the number of settings that can be addressed for power calculations.]]>
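<![CDATA[<p>For orientation, the simplest case these tools cover has a closed-form normal-approximation answer: comparing two means with equal group sizes needs n per group = 2 (z_(1 - alpha/2) + z_(1 - beta))^2 sigma^2 / delta^2.  The Python sketch below (standard library only) computes that sample size and the corresponding power.  It is an illustrative approximation, not the method used by UnifyPow, rpower or the GLIMMIX macro, and an exact t-test calculation gives a slightly larger n.</p>
<pre>
```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample z test of a mean difference delta."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


def power_two_sample(delta, sigma, n, alpha=0.05):
    """Approximate power, ignoring the negligible chance of rejecting in the wrong tail."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = abs(delta) / (sigma * math.sqrt(2 / n))
    return STD_NORMAL.cdf(noncentrality - z_alpha)


n = n_per_group(delta=0.5, sigma=1.0)
print(n, round(power_two_sample(0.5, 1.0, n), 3))  # 63 per group, power just above 0.80
```
</pre>
<p>Detecting a half-standard-deviation difference with 80% power at alpha = 0.05 thus takes about 63 subjects per group.  Clustered, longitudinal or non-normal outcomes are where macros such as glimmixsamplesize become necessary.</p>]]>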
      
   </content>
</entry>

</feed>
