<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jim Graham &#187; Scimatic</title>
	<atom:link href="http://jim-graham.net/archives/category/scimatic/feed" rel="self" type="application/rss+xml" />
	<link>http://jim-graham.net</link>
	<description>Graham on Graham</description>
	<lastBuildDate>Sat, 14 Apr 2012 21:14:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>What is &#8220;Reproducibility,&#8221; Anyway?</title>
		<link>http://jim-graham.net/archives/110</link>
		<comments>http://jim-graham.net/archives/110#comments</comments>
		<pubDate>Sat, 29 May 2010 14:43:36 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[Scimatic]]></category>

		<guid isPermaLink="false">http://jim-graham.net/?p=110</guid>
		<description><![CDATA[Crossposted from Scimatic Titus Brown has a very funny spoof about how scientists will probably react to the NSF&#8217;s moves towards data management plans. Go read it, I&#8217;ll wait. After detailing all manner of horrible data, licensing and source code management techniques, he closes with Meanwhile we will continue publishing exciting sounding (but irerproducible [sic]) [...]]]></description>
			<content:encoded><![CDATA[<p>Crossposted from <a href="http://www.scimatic.com/node/361">Scimatic</a></p>
<p><a href="http://ivory.idyll.org/blog">Titus Brown</a> has a very <a href="http://ivory.idyll.org/blog/may-10/data-management.html">funny spoof</a> about how scientists will probably react to the <a href="http://www.nsf.gov/news/news_summ.jsp?cntn_id=116928&amp;org=NSF">NSF&#8217;s moves towards data management plans</a>. Go read it, I&#8217;ll wait. After detailing all manner of horrible data, licensing and source code management techniques, he closes with</p>
<blockquote><p>Meanwhile we will continue publishing exciting sounding (but irerproducible [sic]) analyses</p></blockquote>
<p>I&#8217;m not sure how I feel about this. All the disturbing practices he details are, well, disturbing. And not very scientific. However, he implies in his last sentence that the end goal of having well-structured data and documentation and source code in a version control tool is &#8220;reproducibility.&#8221; But there&#8217;s no offered definition of what &#8220;reproducibility&#8221; is.</p>
<p>This topic seems to come up a lot in the Open Science and <a href="http://www.globalnerdy.com/2009/08/01/science-2-0-how-computational-science-is-changing-the-scientific-method/">Science 2.0</a> discussions. And the entry-level definition of &#8220;reproducibility&#8221; seems to be that another scientist or group will take your data and your tools and verify your result.</p>
<p>Sorry, but that&#8217;s not reproducibility. As the climate folks say, that&#8217;s <a href="http://www.realclimate.org/index.php/archives/2009/02/on-replication/langswitch_lang/in/">replication.</a> If you take all the same data and all the same tools, one of two things can happen:</p>
<ul>
<li>You get a different result. That only shows that one of us is sloppy and shouldn&#8217;t be doing science.</li>
<li>You get the same result. That&#8217;s like plotting the same data point in two different colours and saying you&#8217;ve learned something.</li>
</ul>
<p>There&#8217;s value in the first outcome, especially if you show I&#8217;m the sloppy one. I&#8217;m just not sure <em>how much</em> value there is. It&#8217;s going to be hard to convince some young researcher to take a year or five to figure out that maybe some other dude might be wrong. It&#8217;s just not that appealing. I&#8217;d rather work on my own ideas.</p>
<p>The real problem is that reproducibility or verification or whatever you want to call it is a <em>lot</em> harder than just running someone else&#8217;s code. It probably means designing a different experimental setup. Controlling for different biases. Getting a statistically independent data set. These things cost time and money, both of which are in short supply. But all these things are critical to say that a result has been reproduced.</p>
<p>A short example from my previous life. My <a href="http://ktev.fnal.gov/public/ktev_theses.html">thesis</a> was about <a href="http://en.wikipedia.org/wiki/Cp_Violation#Direct_CP_violation">CP violation in the neutral kaon sector</a>. We measured a parameter called <a href="http://ktev.fnal.gov/public/physics/epsilon_prime/epsilon_prime.html">Epsilon-prime</a>. It doesn&#8217;t really matter what it is or what it describes. What mattered at the time was whether or not it had a value of zero. The Fermilab results said yes, consistent with zero, and the CERN results said no, it&#8217;s non-zero. A real &#8220;irreproducible&#8221; disagreement. And dammit, both groups had pride on the line and needed to be right.</p>
<p>So, both groups built new experiments. Both groups looked at each other&#8217;s techniques, and cherry-picked the best ideas. We went back for more funding. The second round of experiments got different results: now the Fermilab result was farther from zero than the CERN result. But by now, the two results were consistent within their respective uncertainties, and also consistent with a non-zero interpretation. That&#8217;s a win and reproducibility.</p>
<p>A similar thing is being reported in the <a href="http://www.nytimes.com/2010/05/18/science/space/18cosmos.html"><em>New York Times</em> from the DZero collaboration</a>. No one is interested in looking at DZero&#8217;s data or their software. They are interested in whether the CDF collaboration has a similar, independent result that verifies what DZero is reporting.</p>
<p>This is going to be a bigger problem in the future, not just in physics, but also in bioinformatics. The scale of the data and the experiments is so large that no one will be able to mount a complementary experiment to confirm the results. Once the LHC produces peta- or yotta-bytes of data, that&#8217;s it. It&#8217;s all we got.</p>
<p>So in that respect, Brown&#8217;s points are good. You have to have decent data management plans. Scientists owe it to the people who will come later, and to the people (i.e. the taxpayer) who paid for the research. For some of these experiments will only be run once, and future scientists may have ideas to find stuff that we haven&#8217;t thought of yet. However, I&#8217;m not sure if he&#8217;s claiming it&#8217;s sufficient for reproducibility. I don&#8217;t think it is.</p>
<p>As for the source code that did the analysis: if it&#8217;s open and available and well architected and concise and documented, great. I&#8217;m not going to run it, but seeing it in that shape will give me confidence that you applied similar rigour to the rest of your experiment. It&#8217;s the reverse of the <a href="http://www.scimatic.com/node/313">Climategate-East Anglia problem</a>. I don&#8217;t believe those guys are doing good science because they sure aren&#8217;t writing good code. As <a href="http://www.easterbrook.ca/steve/?p=1679">Steve Easterbrook points out, there are other climatology groups writing really tight software with good development practices</a>. I&#8217;m probably going to trust their models more. So there certainly are benefits to all the things Brown indirectly suggests.</p>
<p>Now, none of this discussion is new; new to me maybe, but the climate folks have been <a href="http://moregrumbinescience.blogspot.com/2009/11/data-set-reproducibility.html">all</a> <a href="http://www.easterbrook.ca/steve/?p=1001">over</a> <a href="http://www.realclimate.org/index.php/archives/2009/02/on-replication/langswitch_lang/in/">this</a> for a while. And it&#8217;s a really tough problem. I don&#8217;t have any answers, but the first step the community is taking is at least trying to figure out what the terms mean.</p>
<p>So, if you got this far, here&#8217;s the summary:</p>
<ul>
<li>By all means, please make your data available after you publish your paper. In certain cases (LHC, bioinformatics) it&#8217;s all we got and we may need to look back at it for other stuff. Plus, since I paid for it with my taxes, I kinda feel I own a chunk of it anyway.</li>
<li>Make sure you write clean, tight code. Use version control and tests. Make your code available; not necessarily because I want to run it, but because it indicates to me that you&#8217;re confident in the code, and that I should be confident in your result.</li>
</ul>
<p>But let&#8217;s drop the idea that I&#8217;m going to take your data and your code and &#8220;reproduce&#8221; your result. I&#8217;m not. First, I&#8217;ve got my own work to do. More importantly, the odds are that nobody will be any wiser when I&#8217;m done.</p>
]]></content:encoded>
			<wfw:commentRss>http://jim-graham.net/archives/110/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Network Effects</title>
		<link>http://jim-graham.net/archives/34</link>
		<comments>http://jim-graham.net/archives/34#comments</comments>
		<pubDate>Fri, 05 Sep 2008 16:04:38 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Scimatic]]></category>

		<guid isPermaLink="false">http://jim-graham.net/archives/34</guid>
		<description><![CDATA[Cross-posted from Scimatic.com Jen Dodd wrote to the attendees of SciBarCamp 2008 to ask &#8220;&#8230; what has happened for you because of SciBarCamp. New business contacts or opportunities? New research projects? New artistic collaborations? New directions in your work or education?&#8221; Well, for me the impact of SciBarCamp was pretty huge. At the time of [...]]]></description>
			<content:encoded><![CDATA[<p>Cross-posted from <a href="http://www.scimatic.com/node/255" target="_blank">Scimatic.com</a></p>
<p><a href="http://www.jendodd.com/" target="_blank">Jen Dodd</a> wrote to the attendees of <a href="http://www.scibarcamp.org" target="_blank">SciBarCamp 2008</a> to ask &#8220;&#8230; what has happened for you because of SciBarCamp. New business contacts or opportunities? New research projects? New artistic collaborations? New directions in your work or education?&#8221; Well, for me the impact of SciBarCamp was pretty huge.</p>
<p>At the time of SciBarCamp I was telecommuting as a developer for a company based in Chicago, and I was feeling pretty cut off from technology people in Toronto and the GTA. I read on <a href="http://weblog.raganwald.com" target="_blank">Reg Braithwaite&#8217;s blog</a> that he was attending, and given that <a href="http://www.leesmolin.com" target="_blank">Lee Smolin</a> (whom I had seen speak at the University of Chicago) would also be there, I decided to attend.</p>
<p>I did get to talk to both Reg and Lee. Reg and I got a cool introduction to the University of Toronto <a href="http://www.blueskysolar.utoronto.ca/" target="_blank">Solar Car project</a> as they took it out for a spin. But the best part of the weekend was meeting <a href="http://www.scimatic.com" target="_blank">Jamie McQuay</a>. I met Jamie the first night of SciBarCamp in his capacity as &#8220;Wal-Mart Greeter&#8221; and then throughout the weekend. It was clear that we share similar opinions on science, software and the business of software, and after a few longer meetings over the course of this summer, the end of the story is that I&#8217;ve <a href="http://www.scimatic.com/node/254" target="_blank">accepted a role</a> here at Scimatic as a partner. I don&#8217;t think I would have anticipated that going to SciBarCamp would have had that type of effect on my career or life when I signed up.</p>
<p>So the moral is: get out there! Here in Toronto there are lots of opportunities:</p>
<ul>
<li><a href="http://to.pm.org" target="_blank">Toronto Perl Mongers</a></li>
<li>Ruby on Rails <a href="http://www.unspace.ca/innovation/pubnite/" target="_blank">Pub Nite</a></li>
<li><a href="http://barcamp.org/TorCamp" target="_blank">TorCamp</a> and <a href="http://www.democamp.ca/" target="_blank">DemoCamp</a></li>
</ul>
<p>to name but a few. Who knows what will come of it? Maybe a whole new direction to your life.</p>
]]></content:encoded>
			<wfw:commentRss>http://jim-graham.net/archives/34/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

