<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
   <channel>
      <title>Statistical Computing Matters</title>
      <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/</link>
      <description>Suggestions and comments about obscure and useful software.</description>
      <language>en</language>
      <copyright>Copyright 2009</copyright>
      <lastBuildDate>Fri, 20 Nov 2009 14:30:12 -0500</lastBuildDate>
      <generator>http://www.sixapart.com/movabletype/</generator>
      <docs>http://blogs.law.harvard.edu/tech/rss</docs> 

            <item>
         <title>ICPSR online web survey analysis tool</title>
         <description><![CDATA[Like many universities, Tufts has a subscription membership to the <strong><a href="http://www.icpsr.umich.edu/icpsrweb/ICPSR/index.jsp">ICPSR</a></strong>.  Over the years, the typical use case has been for a researcher to download data for local analysis, which gives maximal access to the raw data and a free choice of analysis tools.  Recently, <strong>ICPSR</strong> has provided a web front-end tool, Survey Documentation and Analysis (<strong><a href="http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/sda.jsp">SDA</a></strong>), for online analysis suitable for routine statistical queries.  Many datasets in the archive have been prepared for use with this tool.  <strong>SDA</strong> is available to all Tufts faculty, staff and students.  Using this interface to support class teaching could lower the barrier to analysis, since students would not first need to download data and learn a statistics package.]]></description>
         <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007414</link>
         <guid>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007414</guid>
        
        
         <pubDate>Fri, 20 Nov 2009 14:30:12 -0500</pubDate>
      </item>
            <item>
         <title>Mixed Model framework for gene array analysis</title>
         <description><![CDATA[An excellent recent <strong><a href="http://www.bepress.com/sagmb/vol8/iss1/art47/?sending=10787">article</a></strong> caught my attention: it solves a class of gene array analysis problems in a unified manner using the workhorse methodology of statistical mixed models.  The approach is presented for a two-group time course experiment, and extensions to more complex experimental designs are discussed.  The featured method is compared with several competing approaches, including in a simulation study, and is implemented in <strong>SAS</strong> software.  However, any statistical package supporting spectral decomposition and mixed models may be used.]]></description>
         <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007328</link>
         <guid>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007328</guid>
        
        
         <pubDate>Tue, 10 Nov 2009 14:28:13 -0500</pubDate>
      </item>
            <item>
         <title>Epi - Statistical tools for Epidemiology</title>
         <description><![CDATA[<strong>Epi</strong> has been around for a long time, starting back in the days of DOS!  Over the years Epi has matured into a suite of tools (<strong>Pepi</strong>) focused on public health and epidemiology.  <strong><a href="http://www.brixtonhealth.com/">WINPEPI</a></strong> software for the Windows platform is free and offers a broad mix of tools that general statistics users might consider as an alternative to more costly options.  <strong>Epi</strong> is like the Energizer Bunny: it keeps going and going...]]></description>
         <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007293</link>
         <guid>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007293</guid>
        
        
         <pubDate>Fri, 06 Nov 2009 11:44:55 -0500</pubDate>
      </item>
            <item>
         <title>MCA - a thing of the past?</title>
         <description><![CDATA[All-pairwise multiple comparisons (MCA) are a well-known collection of procedures for ordering group means, a common research task.  Classical methods control the Type 1 error rate, which is computed under the assumption that the null hypothesis of no differences is true.  Modern alternatives can be found in the Bayesian paradigm, which abandons the Type 1 error notion.  In particular, for problems that can be cast in the hierarchical modeling framework, a principled Bayesian approach relies on partial pooling and shrinkage.  Technical arguments supporting this approach have been around for some time.  An excellent working paper by <strong><a href="http://www.stat.columbia.edu/~gelman/research/unpublished/multiple2f.pdf">Andrew Gelman</a></strong> on the topic presents an overview, simulation results and examples demonstrating the benefits in an applied setting.  The paper also suggests <strong><a href="http://www.r-project.org">R</a></strong> and other software for implementation.
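
The idea behind partial pooling can be sketched with a simple empirical-Bayes version of the normal hierarchical model. This is a hedged illustration in Python (3.8 or later, standard library only), not the method or code from the Gelman paper: it assumes each group mean comes with a known standard error and plugs in a method-of-moments estimate of the between-group variance tau2. The function name and the data are hypothetical.

```python
from statistics import fmean, variance

def partial_pool(means, ses):
    """Empirical-Bayes shrinkage of group means toward their grand mean.

    Assumes a normal-normal hierarchical model: each observed group mean is
    the true group effect plus noise with a known standard error, and the
    true effects vary around a common mean with between-group variance tau2.
    """
    grand = fmean(means)
    # Method-of-moments estimate of tau2: spread of the observed means minus
    # the average sampling variance, truncated at zero.
    tau2 = max(0.0, variance(means) - fmean([s * s for s in ses]))
    pooled = []
    for y, s in zip(means, ses):
        total = s * s + tau2
        # Fraction of the way each estimate is pulled to the grand mean;
        # with tau2 == 0 every group collapses to the grand mean (complete pooling).
        shrink = s * s / total if total > 0 else 1.0
        pooled.append(grand + (1.0 - shrink) * (y - grand))
    return pooled, tau2

# Hypothetical data: three group means, each with standard error 1.
estimates, tau2 = partial_pool([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
print(tau2, estimates)
```

Here tau2 comes out as 3 and the raw means 10, 12 and 14 are pulled to 10.5, 12 and 13.5, so the largest pairwise difference shrinks from 4 to 3. Noisier groups, or less spread between groups, produce a stronger pull, which is how partial pooling guards against spurious differences without a Type 1 error correction. A fully Bayesian fit would also account for uncertainty in tau2 rather than plugging in a point estimate.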
]]></description>
         <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007254</link>
         <guid>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007254</guid>
        
        
         <pubDate>Fri, 30 Oct 2009 14:32:45 -0500</pubDate>
      </item>
            <item>
         <title>An improved Spatial Scan Statistic</title>
         <description><![CDATA[Spatial scan statistics have been an important class of tools for cluster detection in spatial data.  They are often used in support of surveillance and detection activities in public health and other fields.  A common limitation of popular spatial scan statistics is that they do not account for uncertainty in the measure of interest.  In a September 2009 <strong><a href="http://www.amstat.org">JASA</a></strong> article, <strong>Weighted Normal Spatial Scan Statistic for Heterogeneous Population Data</strong>, the authors offer a more general solution to this problem.  Weights based on local variance measures, or on proxies such as sample size, can be created for use in a weighted likelihood approach.  Extensions to non-Gaussian probability models are addressed.  The case studies and power simulations provided suggest excellent performance.  Their solution has been implemented in the freely available software <strong><a href="http://www.Satscan.org">SaTScan</a></strong>.]]></description>
         <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007232</link>
         <guid>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007232</guid>
        
        
         <pubDate>Tue, 27 Oct 2009 11:07:07 -0500</pubDate>
      </item>
            <item>
         <title>mixAK: New data clustering options</title>
         <description><![CDATA[Cluster analysis and related tools are often used to investigate structure (clustering) in multidimensional data sets.  One approach to modeling such data is the Gaussian mixture model.  <strong>mixAK</strong> is a new <strong><a href="http://www.R-project.org">R</a></strong> package for Bayesian estimation of multivariate normal mixtures that allows for selection of the number of mixture components, density estimation and, optionally, interval-censored multivariate data.  Author Arnost Komarek's article in <strong>Computational Statistics and Data Analysis</strong>, Volume 53, Issue 12, October 2009, presents the underlying theory and applications of the new approach using reversible jump MCMC (RJ-MCMC) estimation.  The selection of the number of mixture components is aided by the Deviance Information Criterion (<strong>DIC</strong>) and Penalized Expected Deviance (<strong>PED</strong>) measures.]]></description>
         <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007019</link>
         <guid>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#007019</guid>
        
        
         <pubDate>Tue, 06 Oct 2009 15:15:55 -0500</pubDate>
      </item>
            <item>
         <title>Survey weights and new ANES suggestions</title>
         <description><![CDATA[Many large surveys use complex sample designs that reflect various stratification considerations.  Statistics calculated from such designs must be weighted to reflect the general population of interest.  A clear discussion and set of recommendations by four prominent researchers for calculating and applying weights to <strong><a href="http://www.electionstudies.org">ANES</a></strong> datasets can be found in the September 2009 Technical Report nes012427, <strong>Computing Weights for American National Election Study Survey Data</strong>.  The <strong><a href="ftp://ftp.electionstudies.org/ftp/nes/bibliography/documents/nes012427.pdf">report</a></strong> is available in the Reference Library section of the ANES website.
The recommendations cover single cross-sectional, two-wave panel and multi-wave panel designs, along with nonresponse and poststratification weighting.  The discussion is general enough to apply to other large studies, such as Census data and similar surveys.
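
The report describes its own procedures in detail; as a rough illustration only, here is a minimal, generic sketch of cell-based poststratification in Python (standard library only), not the method recommended in the report. Each respondent is weighted by the population share of their cell divided by that cell's share of the sample. The function names, cell labels and data are hypothetical, and the population shares are assumed to be known (for example, from Census totals).

```python
from collections import Counter

def poststratification_weights(cells, population_shares):
    """Weight each respondent by (population share of cell) / (sample share of cell).

    cells: the poststratification cell of each respondent (e.g. a sex-by-age label).
    population_shares: known population proportion for every cell.
    """
    n = len(cells)
    sample_counts = Counter(cells)
    return [population_shares[c] / (sample_counts[c] / n) for c in cells]

def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical sample: women are over-represented (3 of 4) relative to a 50/50 population.
cells = ["F", "F", "F", "M"]
voted = [1, 1, 1, 0]
w = poststratification_weights(cells, {"F": 0.5, "M": 0.5})
print(w, sum(voted) / len(voted), weighted_mean(voted, w))
```

Each woman receives a weight of 2/3 and the man a weight of 2, so the weights sum to the sample size and the estimated turnout moves from 0.75 unweighted to 0.5 weighted. Real weighting schemes, including those discussed in the report, also adjust for nonresponse and usually balance on several variables at once; standard errors should come from survey software that understands the sample design.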

]]></description>
         <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006806</link>
         <guid>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006806</guid>
        
        
         <pubDate>Thu, 17 Sep 2009 14:47:09 -0500</pubDate>
      </item>
            <item>
         <title>Areal and point source spatial data models</title>
         <description><![CDATA[Researchers using spatial data are often faced with a mix of data obtained at several levels of scale and aggregation, along with point-referenced data.  Classical geospatial regressions do not handle this mix very well, and standard ordinary regressions do even worse.  A unified treatment is the topic of a recent article, "Reparameterized and Marginalized Posterior and Predictive Sampling for Complex Bayesian Geostatistical Models", in Volume 18, Number 2 of <strong><a href="http://pubs.amstat.org/loi/jcgs">JCGS</a></strong>.  In short, the authors cleverly reparameterize and recast the problem so that efficient MCMC samplers can carry out the Bayesian estimation.  The article's supplemental materials provide the <strong><a href="http://www.r-project.org">R</a></strong> and <strong><a href="http://mathstat.helsinki.fi/openbugs/Home.html">OpenBUGS</a></strong> code for the estimation tasks outlined.]]></description>
         <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006762</link>
         <guid>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006762</guid>
        
        
         <pubDate>Wed, 09 Sep 2009 11:29:22 -0500</pubDate>
      </item>
            <item>
         <title>SPSS resources</title>
         <description><![CDATA[<strong><a href="http://www.spss.com">SPSS</a></strong> software has an extensive tutorial built into the product, and most first-time users will benefit from using it.  Additional SPSS resources can be found <strong><a href="http://www.spsstools.net/spss.htm">here</a></strong>.]]></description>
         <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006755</link>
         <guid>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006755</guid>
        
        
         <pubDate>Tue, 08 Sep 2009 15:49:10 -0500</pubDate>
      </item>
            <item>
         <title>R available on Tufts Linux Cluster</title>
         <description><![CDATA[Elsewhere on this Blog I mention various bits and pieces of <strong><a href="http://www.r-project.org">R</a></strong> software.  Now that the fall semester is upon us, we have added many new R bioinformatics packages to the baseline R installation on our research Linux <strong><a href="http://go.tufts.edu/cluster">cluster</a></strong>.  This option provides a scalable solution for those needing additional computing power.]]></description>
         <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006741</link>
         <guid>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006741</guid>
        
        
         <pubDate>Thu, 03 Sep 2009 15:00:54 -0500</pubDate>
      </item>
            <item>
         <title>Bayes Software...the next big effort</title>
         <description><![CDATA[Historically, Bayesian solutions were computed as needed in general-purpose programming languages (Fortran, C, Java, etc.) and later in higher-level environments like Matlab, Gauss, SAS/IML and others.  Then <a href="http://www.mrc-bsu.cam.ac.uk/bugs/"><strong>WinBUGS</strong></a> came along and offered a higher-level interface, similar to what Matlab did for linear algebra syntax and functionality, but closer in spirit to the notation statisticians use to depict multilevel probability models.  While all of these still have their pros and cons, we now find an explosion of Bayesian solutions implemented in <a href="http://www.r-project.org"><strong>R</strong></a>, with the benefit of object orientation.  If one takes a look at the "<strong>CRAN Task View: Bayesian Inference</strong>" page on the R site, maintained by Jong Hee Park, one will find 60+ packages with numerous solutions to many standard statistical modeling problems.  Of the many listed, note the package <strong>BAS</strong> for Bayesian Model Averaging in linear models, using stochastic or deterministic sampling without replacement from posterior distributions.  Prior distributions on coefficients are from Zellner's g-prior or mixtures of g-priors corresponding to the Zellner-Siow Cauchy priors or the Liang et al. hyper-g priors.  The stochastic search capability makes it easy to carry out model specification searches that would have been impractical a few years ago.]]></description>
         <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006504</link>
         <guid>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006504</guid>
        
        
         <pubDate>Tue, 14 Jul 2009 11:51:59 -0500</pubDate>
      </item>
            <item>
         <title>Inference for R and MS Excel</title>
         <description><![CDATA[The widespread availability of Microsoft Excel has created a less-than-desirable environment for statistical computing.  In my opinion, the Excel statistics add-in leaves much to be desired relative to real statistics packages.  One way to extend the usefulness of Excel is to abandon its statistics add-in in favor of <strong><a href="http://www.InferenceforR.com">InferenceforR</a></strong>.  This product allows <strong><a href="http://www.r-project.org">R</a></strong> to be used within Excel.  See the following <strong><a href="http://screencasts.bluereference.com/Special/TurboChargeStockAnalysis/">screencast</a></strong> for a slick presentation.]]></description>
         <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006301</link>
         <guid>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006301</guid>
        
        
         <pubDate>Thu, 21 May 2009 15:26:35 -0500</pubDate>
      </item>
            <item>
         <title>Numerical routines for Java development</title>
         <description><![CDATA[If you want to save time and improve the accuracy of your programs, don't reinvent the wheel; consider using <a href="http://math.nist.gov/javanumerics/">javanumerics</a>.  A large variety of statistical and mathematical classes are available.  Note that not all options are free.]]></description>
         <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006250</link>
         <guid>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006250</guid>
        
        
         <pubDate>Wed, 06 May 2009 12:04:56 -0500</pubDate>
      </item>
            <item>
         <title>Specialized Statistical Resources</title>
         <description><![CDATA[Often researchers will need functionality that isn't found in commercial statistics packages. This need varies quite a bit and is met with specialized solutions from the statistical community. These solutions are often cutting edge, reflecting new statistical research. Most stats packages allow some form of macro or code authorship. This works up to a point and often provides a just-in-time solution. Well-known examples include Matlab's scripting language, <strong>SAS IML</strong>, <strong>GAUSS</strong>, <strong>Stata</strong>, <strong>Splus</strong> and <strong>R</strong>. Others will seek stand-alone solutions in one form or another. These range from public domain C, C++, Fortran, and Java research subroutines to stand-alone programs with various user interfaces. The goal of this blog is to list references and short descriptions of various solutions that may offer additional insights into your research and the statistical methods, and maybe even save you some time. About a dozen topics come to mind, and I hope to address them shortly. These posts are not intended as statistical guidance or endorsement. Most problems are best addressed with the advice of an experienced practitioner in the relevant field.]]></description>
         <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006191</link>
         <guid>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006191</guid>
        
        
         <pubDate>Wed, 29 Apr 2009 10:55:08 -0500</pubDate>
      </item>
            <item>
         <title>Statistical Power calculations</title>
         <description><![CDATA[Statistical power calculations are often needed when planning a study, in order to establish sample sizes.
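
To make the idea concrete, here is a minimal sketch in Python (3.8 or later, standard library only) of the textbook normal-approximation sample size for comparing two group means with a two-sided test at significance level alpha. The function names and example values are illustrative assumptions and are not taken from any of the tools mentioned in this post, which handle far more general designs.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample z-test of means."""
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = z.inv_cdf(power)            # quantile matching the target power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)                  # round up to whole subjects

def approx_power(delta, sigma, n, alpha=0.05):
    """Approximate power of the same test with n subjects per group (ignores the far tail)."""
    z = NormalDist()
    se = sigma * math.sqrt(2 / n)        # standard error of the difference in means
    return z.cdf(delta / se - z.inv_cdf(1 - alpha / 2))

n = n_per_group(delta=0.5, sigma=1.0)
print(n, round(approx_power(0.5, 1.0, n), 3))
```

For a difference of half a standard deviation, a 5% two-sided level and 80% power, this gives 63 subjects per group; calculations based on the t distribution typically give a slightly larger number.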
Elsewhere on this Blog I mention <strong>PiFace</strong> as a power calculation tool.  However, <strong>SAS</strong> users may find the following three <strong>SAS</strong> macros of interest.
<a href="http://www.bio.ri.ccf.org/power.html">UnifyPow</a> is an extensive collection of power calculators implemented as a <strong>SAS</strong> macro.  A SAS proceedings <a href="http://www2.sas.com/proceedings/sugi22/STATS/PAPER287.PDF"><strong>paper</strong></a> about UnifyPow discusses its broad generality.  The second macro, <a href="http://www.math.yorku.ca/SCS/sasmac/rpower.html">rpower</a>, addresses the retrospective aspect of the issue.  The third macro, <a href="http://www.nesug.org/proceedings/nesug08/sa/sa12.pdf">glimmixsamplesize</a>, is designed to use the generality of <strong>SAS</strong> PROC GLIMMIX for generalized linear mixed models.  These macros substantially increase the number of settings that can be addressed by power calculations.]]></description>
         <link>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006020</link>
         <guid>http://blogs.uit.tufts.edu/statisticalcomputingmatters/#006020</guid>
        
        
         <pubDate>Mon, 13 Apr 2009 14:20:25 -0500</pubDate>
      </item>
      
   </channel>
</rss>
