Last summer I had the privilege of interviewing arXiv founder Paul Ginsparg extensively during two lovely days at Cornell. We are currently in the last stages of using this material to write what will be a series of papers on the ecology of physics knowledge and publishing practices, but for now I’ll share an interesting bit from towards the end of the interview. Most of this material can be found elsewhere on the Web, but here it is straight from the horse’s mouth. I plan to eventually make the entire transcripts available online, as we covered many topics that may interest open access (OA) audiences.
Ginsparg: The story as I usually tell it is that there was a postdoc, Joanne Cohn, who had this mailing list and was sending things out. I remember a physicist, Spenta Wadia, who was from India, making a comment (we were all sitting around at lunch) about how he was afraid to travel, because the emails from Joanne’s list would overrun his disk allocation. Now, in order to understand this, you have to understand something about the Digital Equipment Corporation. After the PDP-8, -10 and -11, they got to VAXes, and the VAX software people were using gave you a limit on your amount of disk space, because disk was a valuable resource back then.
The limits were typically one thousand or two thousand blocks. Now, a block is some ancient unit of measure… let me translate it for you… two thousand blocks is equal to one megabyte, and that’s what I had, two thousand blocks. I knew blocks; I didn’t know about megabytes at the time. So what it meant was you would be TeXing your articles, and it very helpfully would not let you log off if you were over your disk allocation. So you would delete the PostScript, delete the DVI, delete all the log files and everything so you would get just under your disk allocation, and then log off. But incoming mail in your mail spool counted against your disk allocation, so the first mail message you received would put you over, and subsequent email messages would then be bounced back to the sender saying, “this user is over disk allocation.” That’s what he was complaining about, that those messages would come in. If you were travelling, networks were not as accessible. You can’t imagine now somebody not reading email every half hour, or every thirty seconds, but back then you could imagine a week or two of travel without access to the network, and then losing these email messages because of these articles that she was sending around.

I remember turning to somebody and saying, “that’s because we’re doing it wrong. What we should have (this is before the Web, of course) is the emails going to some automated system that stores them and then sends out, either immediately or once a day or once a week, a notification of the things that were received during the previous period. You don’t send the full article out, which is the way it was working with these mailing lists. You send out only the titles and abstracts and make the articles available on demand.” The interval wasn’t decided immediately; we settled on once a day, at this time in Aspen, before I wrote the software. That’s when I wrote this email transponder. You’d send the email to the system.
It would parse the subject line, assign the submission a number, and email you back the TeX file and any PostScript files as a compressed, uuencoded tar file, and then you would unpack them and process them. That was the original system. I wanted to reconstruct what happened in Aspen, if I could… what does one remember from twenty years before? I remember the broad outlines, but I had all my emails from that week at Aspen.
One of the many fun things I noticed about them was that all of the emails were concentrated within, say, a half-hour or one-hour period sometime during the day. The reason was that we didn’t have terminals or anything in our office. There was one satellite building that had three VT220s and modems, and that’s what I used to connect to the system at Los Alamos and read my email, and since I wasn’t at Aspen to do that, I would just go in late in the afternoon, once a day. From those emails I could pretty much piece things together. I had sent a query to the systems people at Los Alamos asking whether it would cost anything for me to get a top-level email address @lanl.gov. I didn’t even want an account; all I wanted was some mail forwarding, which became standard a few years later for obvious reasons. But at the time they told me no, they would have to charge me extra, and I didn’t want to pay anything. That’s when I instead asked them to create the xxx.lanl.gov machine, which was the original email address it ran under. So I have the day-to-day details of what happened on any given day. Then I know that was in late June, and then I know I went on vacation. I have one specific memory of bicycling in Italy, and I know I was still thinking about this because it was then that I came up with the numbering scheme 9108001: the year 91, the month 08, and then the sequence number within that month. I came up with that scheme because the original idea was that it was going to be a bulletin board, just a moving three-month window, by which point the paper distribution system would catch up, and I would have a cron process that, after the end of three months, would just do an rm 9108* and, you know, clean up past directories. The serendipitous event was that a friend of mine, Andy Strominger, came to visit me at Los Alamos and said, “look, you keep making this argument that they’re small and they’re easy to store.
Why don’t you store them forever?” And I said, “well, isn’t it a pain to have to retrieve them and TeX them and, you know, isn’t it just easier to Xerox something from your local preprint room?” He said, “no, it’s much easier to do that at the desk than either go down the hall or sometimes go into another building, or for that matter find one I’ve already gotten in a stack of papers over here.” The reason it mattered that it was him is that, at the time, I regarded physicists as either the computer-savvy ones or the reluctant ones, and he was definitely among the latter. So coming from somebody who didn’t love computers and live and breathe them, but just used them because he had to in order to write articles, that argument carried a lot of weight. So luckily I was spared writing that cron process, and we had everything going back to the start. And that numbering scheme turned out to be useful, because sometimes it’s a useful heuristic: you can look at a number at a glance and know… we could look at that superluminal article, see that it was 1109.2500, and know instantly that it was from September 2011, and so that makes sense. […]
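The numbering scheme Ginsparg describes packs the year and month into the identifier, which is why a glance at one tells you when it was submitted. The following Python snippet is only an illustrative sketch of decoding such identifiers, not arXiv’s own code: the names `ArxivId`, `parse_id`, `months_old` and `expired` are hypothetical, it assumes two-digit years 91 to 99 mean the 1990s and all others the 2000s, and it assumes the later dotted form may carry a four- or five-digit sequence number. The `expired` check imitates the three-month cleanup that was planned but never written.

```python
import re
from typing import NamedTuple


class ArxivId(NamedTuple):
    """Hypothetical container for the fields packed into an identifier."""
    year: int   # four-digit year
    month: int  # 1-12
    seq: int    # sequence number within that month


# Original scheme described in the interview: YYMMNNN, e.g. "9108001".
OLD_STYLE = re.compile(r"(\d{2})(\d{2})(\d{3})")
# Later dotted scheme, e.g. "1109.2500"; allowing five digits is an assumption.
NEW_STYLE = re.compile(r"(\d{2})(\d{2})\.(\d{4,5})")


def parse_id(ident: str) -> ArxivId:
    """Decode year, month and sequence number from an identifier string."""
    # Drop an archive prefix such as "hep-th/" if one is present.
    ident = ident.rsplit("/", 1)[-1]
    match = NEW_STYLE.fullmatch(ident) or OLD_STYLE.fullmatch(ident)
    if match is None:
        raise ValueError(f"unrecognized identifier: {ident!r}")
    yy, mm, seq = (int(g) for g in match.groups())
    if not 1 <= mm <= 12:
        raise ValueError(f"month out of range in {ident!r}")
    # Assumption: years 91-99 are the 1990s, anything lower the 2000s.
    year = 1900 + yy if yy >= 91 else 2000 + yy
    return ArxivId(year, mm, seq)


def months_old(ident: str, year: int, month: int) -> int:
    """Whole months between the identifier's month and (year, month)."""
    parsed = parse_id(ident)
    return (year - parsed.year) * 12 + (month - parsed.month)


def expired(ident: str, year: int, month: int, window: int = 3) -> bool:
    """Hypothetical stand-in for the planned cron cleanup: True once the
    identifier's month has dropped out of a moving `window`-month span."""
    return months_old(ident, year, month) >= window


if __name__ == "__main__":
    print(parse_id("9108001"))           # ArxivId(year=1991, month=8, seq=1)
    print(parse_id("1109.2500"))         # ArxivId(year=2011, month=9, seq=2500)
    print(expired("9108001", 1991, 11))  # True
```

Because the month sits in the identifier’s prefix, a glob like `rm 9108*` would have removed a whole month of submissions at once; the month-level arithmetic in `months_old` mirrors that granularity rather than tracking individual submission dates.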
Reyes: What fascinates me is that the world’s biggest scientific repository was actually not intended to be a repository!
Ginsparg: That’s right. It was a bulletin board. It was not visionary, but I was good at writing C shell scripts, which is what it was written in [laughs].