Ticket #1845 (closed defect: fixed)

Opened 6 years ago

Last modified 6 years ago

Fix ImageJ 1.x mirror

Reported by: dscho Owned by: dscho
Priority: major Milestone: imagej2-b7-ndim-data
Component: Server Admin Version:
Severity: serious Keywords:
Cc: curtis, justin.senseney@… Blocked By:
Blocking: #1705

Description (last modified by dscho)

Since we cannot use rsync, we set up a mirror script. To be nice, we tried to use HEAD requests whenever possible (but quite a few directories do not have index.html files, making HEAD requests impossible). The Jenkins job ran twice a day:

 http://jenkins.imagej.net/job/ImageJ-1.x-website-mirror/
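For illustration only, here is a minimal sketch of how a HEAD-based freshness check might look. It is not the actual mirror script: the function names are hypothetical, and it assumes the server sends a Last-Modified header that can be compared against the timestamp of the local copy.

```python
import urllib.request
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def head_last_modified(url: str) -> Optional[str]:
    """Issue a HEAD request and return the raw Last-Modified header, if any."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.headers.get("Last-Modified")


def needs_download(last_modified: Optional[str],
                   local_mtime: Optional[datetime]) -> bool:
    """Decide whether to fetch the file, given the HEAD result.

    local_mtime must be timezone-aware. When in doubt (no header, no local
    copy, unparsable date), download rather than risk a stale mirror.
    """
    if last_modified is None or local_mtime is None:
        return True
    try:
        remote = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return True
    if remote.tzinfo is None:
        # A naive date cannot be compared safely; be conservative.
        return True
    return remote > local_mtime
```

The decision logic is kept separate from the network call so that it can be exercised without touching the remote server at all.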

Unfortunately, this was still too much load on the server, and we were asked to download a large .tgz file containing the complete site once every night instead.

So change the mirror yet again (the fifth iteration now).

Change History

comment:1 Changed 6 years ago by dscho

  • Status changed from new to closed
  • Resolution set to fixed

The advantage now, of course, is that we get all the files that are there, not just the ones we can reach directly or indirectly via  http://imagej.nih.gov/ij/index.html.

To make things a bit nicer for ourselves (while I pour hours and hours into this ticket, I might just as well improve things for our own benefit), let's put things into a Git repository.

So this is what I have done so far:

  • Since I trust things on the other side to run as smoothly as experience has taught us, I explicitly test whether the file is older than 26 hours and fail if it is.
  • I then import the .tgz file into a Git directory for easier handling
  • Then, I check out the files. This will touch only those files that really changed (removing those that have been deleted), helping the subsequent steps because of maintained mtimes.
  • I had to adjust the MirrorWebsite class quite a bit to accommodate the fact that we are now rewriting links from a mirror of the website.
  • Then I update the Git repository for the complete update site, adding a merge between the previous state and the imported .tgz file.
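The staleness test in the first step could be sketched roughly as follows. This is only an illustration, not the code the Jenkins job runs: it assumes the check is made against the modification time of a locally downloaded archive, and the names `MAX_AGE_SECONDS`, `is_stale` and `check_archive` are hypothetical.

```python
import os
import time
from typing import Optional

# 26 hours: one nightly cycle plus two hours of slack.
MAX_AGE_SECONDS = 26 * 60 * 60


def is_stale(mtime: float, now: float,
             max_age: float = MAX_AGE_SECONDS) -> bool:
    """Return True if the timestamp is more than max_age seconds old."""
    return now - mtime > max_age


def check_archive(path: str, now: Optional[float] = None) -> None:
    """Fail loudly if the archive at path was not refreshed recently."""
    current = time.time() if now is None else now
    mtime = os.path.getmtime(path)
    if is_stale(mtime, current):
        age_hours = (current - mtime) / 3600
        raise RuntimeError(
            f"{path} is {age_hours:.1f} hours old; refusing to import it"
        )
```

Raising an exception makes the job fail visibly instead of silently importing yesterday's archive a second time.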

To determine the best time for this to run, I checked the timestamp of the ij.tgz file. (I was told to use "the off hours", with a hint that I should heed both US and EU, which is funny because I am European, so I rarely forget that there are more than four timezones.) From an awfully small n, I deduce that the job is run by cron at half past midnight and that it runs for a little less than four minutes. Of course, the server is misconfigured to show the time in UTC rather than local time (and it still calls it GMT, which is funny given that we have been living in the 21st century for twelve years now). My best bet was to leave things as they were: the job runs at five past one in the morning (local time, which is still one and a half hours after the cron job starts). I also removed the noon mirroring, which means that whenever there are changes on the website, the mirror is now out of date for most of the day.
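Converting a timestamp that the server reports in GMT into a local wall-clock time can be sketched like this. The UTC offset and the example date used here are made up for illustration; neither is taken from the actual server.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical offset of the remote server's local time from UTC.
ASSUMED_SERVER_OFFSET = timezone(timedelta(hours=-5))


def to_local(http_date: str, local_zone: timezone) -> datetime:
    """Parse an HTTP-style date (reported in GMT) and shift it to local_zone."""
    return parsedate_to_datetime(http_date).astimezone(local_zone)


# Made-up timestamp, only to show the conversion.
example = to_local("Tue, 10 Dec 2013 05:30:00 GMT", ASSUMED_SERVER_OFFSET)
print(example.strftime("%H:%M"))
```

A fixed offset ignores daylight saving time, so this only helps for a rough estimate of when the remote cron job fires.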

All of this can be found here:

 http://jenkins.imagej.net/job/ImageJ-1.x-website-mirror/

I Cc:ed the only helpful guy from the other side of this mirror business.

To make all of this robust, I spent all morning on this, in total 3.5 hours now. I'm glad nobody on the other side had to spend this much time on the issue, though, because it gives me that warm and cozy feeling of being a good administrator.

Whether it will really work, of course, we will only see tomorrow. Cross your fingers. Oh well, I'll cross them myself.


comment:2 Changed 6 years ago by dscho

  • Description modified