Ticket #1845 (closed defect: fixed)

Opened 6 years ago

Last modified 6 years ago

Fix ImageJ 1.x mirror

Reported by: dscho
Owned by: dscho
Priority: major
Milestone: imagej2-b7-ndim-data
Component: Server Admin
Version:
Severity: serious
Keywords:
Cc: curtis, justin.senseney@…
Blocked By:
Blocking: #1705

Description (last modified by dscho)

Since we cannot use rsync, we set up a mirror script. To be nice, we tried to use HEAD requests whenever possible (but quite a few directories do not have index.html files, making HEAD requests impossible). The Jenkins job ran twice a day:

 http://jenkins.imagej.net/job/ImageJ-1.x-website-mirror/
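
For illustration, the "use HEAD requests whenever possible" idea could look roughly like the sketch below. It is not the actual mirror script: the function names are hypothetical, and it assumes the server sends a Last-Modified header that can be compared against the local file's mtime.

```python
import urllib.request
from email.utils import parsedate_to_datetime
from typing import Optional


def remote_is_newer(last_modified: Optional[str], local_mtime: float) -> bool:
    """Decide whether to download, given a Last-Modified header value.

    `last_modified` is the raw header string (or None if the server sent
    none); `local_mtime` is the local copy's mtime in seconds since the epoch.
    """
    if last_modified is None:
        # Without the header we cannot tell, so fall back to downloading.
        return True
    remote = parsedate_to_datetime(last_modified).timestamp()
    return remote > local_mtime


def head_last_modified(url: str) -> Optional[str]:
    """Send a HEAD request and return the Last-Modified header, if any."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.headers.get("Last-Modified")
```

A caller would issue a full GET only when `remote_is_newer(head_last_modified(url), local_mtime)` is true, so unchanged files cost the server one header-only response.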

Unfortunately, this was still too much load, and we were asked to download a large .tgz file containing all the files once a night instead.

So change the mirror yet again (the fifth iteration now).

Change History

comment:1 Changed 6 years ago by dscho

  • Status changed from new to closed
  • Resolution set to fixed

The advantage now, of course, is that we get all the files that are there, not just the ones we can reach directly or indirectly via  http://imagej.nih.gov/ij/index.html.

To make things a bit nicer for ourselves, let's put things into a Git repository.

So this is what I have done so far:

  • Since I trust things on the other side to run as smoothly as experience taught us, I test explicitly whether the file is older than 26 hours and fail if it is.
  • I then import the .tgz file into a Git directory for easier handling
  • Then, I check out the files. This will touch only those files that really changed (removing those that have been deleted), helping the subsequent steps because of maintained mtimes.
  • I had to adjust the MirrorWebsite class quite a bit to accommodate the fact that we are now rewriting links from a mirror of the website.
  • Then I update the Git repository for the complete update site, adding a merge between the previous state and the imported .tgz file.
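
The freshness check and the import step from the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Jenkins job itself: `is_stale` and `import_tarball` are hypothetical names, the repository is assumed to exist already with a Git identity configured, and the archive is unpacked straight into the work tree rather than imported and then checked out as described above.

```python
import subprocess
import tarfile
import time
from pathlib import Path

MAX_AGE_SECONDS = 26 * 60 * 60  # the 26-hour limit from the ticket


def is_stale(mtime: float, now: float, max_age: float = MAX_AGE_SECONDS) -> bool:
    """Return True if a file last modified at `mtime` is older than `max_age`."""
    return now - mtime > max_age


def import_tarball(tgz: Path, repo: Path) -> None:
    """Unpack `tgz` into the work tree of `repo` and commit the snapshot."""
    if is_stale(tgz.stat().st_mtime, time.time()):
        raise RuntimeError(f"{tgz} is older than 26 hours; refusing to import")
    # Drop all tracked files first so that upstream deletions are recorded.
    subprocess.run(
        ["git", "rm", "-r", "-q", "--ignore-unmatch", "."], cwd=repo, check=True
    )
    # The archive is assumed to come from a trusted source.
    with tarfile.open(tgz, "r:gz") as archive:
        archive.extractall(repo)
    subprocess.run(["git", "add", "-A"], cwd=repo, check=True)
    subprocess.run(
        ["git", "commit", "-q", "--allow-empty", "-m", f"Import {tgz.name}"],
        cwd=repo,
        check=True,
    )
```

Failing loudly on a stale archive means a broken upstream cron job shows up as a failed build instead of a silently outdated mirror.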

To determine the best time for this to run (I was told to use "the off hours" with a hint that I should heed both US and EU), I checked the timestamp of the ij.tgz file. From an awfully small n I deduce that the job is run at half past midnight by cron and that it runs for a little less than four minutes. My best bet was to leave things at when they used to run: five past one in the morning (local time, which is still one and a half hours after the cronjob starts). I also removed the noon mirroring, which means that whenever there are changes on the website, the mirror is now out of date for most of the day.

All of this can be found here:

 http://jenkins.imagej.net/job/ImageJ-1.x-website-mirror/

Last edited 6 years ago by dscho

comment:2 Changed 6 years ago by dscho

  • Description modified