Difference between revisions of "Proposed changes to backup and archiving"
import>Cen1001 |
Latest revision as of 11:03, 29 June 2006
Current systems
We need more backup space as workstation disks are getting larger and we have acquired several more clusters. We currently have about 3 TB split over three servers. The current system only lets computer officers do restores; it would be an improvement if people could access their own backups. It would also be very useful to be able to keep some backups for longer than the current two weeks.
Our existing archive server is not user-accessible. It is unreliable and has no free space.
New backup system
I want to buy a new backup server with about 6 TB of space, move the backups onto that, and make them user-accessible. We would have to keep one of the old backup servers running for technical reasons: some of the older machines are configured in such a way that user-accessible backups of those machines would be insecure. Eventually this problem will go away as machines are reinstalled.
There is one potential disadvantage to this proposal: having user-restorable backups means that files which are set to be 'world-readable' become readable by anyone in the entire sector, even if they don't have a user account on the machine the original file was on. Anyone who doesn't want this can protect themselves by changing their file permissions so that the files are no longer world-readable. However, the default is to have world-readable files, and people forget to change them. If we think this will cause problems then we would have to change the default.
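For anyone who wants to check their own files before backups become user-accessible, the following is a minimal sketch of how world-readable files could be found and locked down. It assumes a Unix-style permission model; the function names are made up for illustration and are not part of any existing tool here.

```python
from __future__ import annotations

import os
import stat
from pathlib import Path


def is_world_readable(path: Path) -> bool:
    """Return True if the 'other' read bit is set on the file itself."""
    return bool(path.lstat().st_mode & stat.S_IROTH)


def find_world_readable(root: Path) -> list[Path]:
    """List regular files under root that anyone on the system can read."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            candidate = Path(dirpath) / name
            if candidate.is_symlink():
                continue  # a symlink's own mode says nothing about its target
            if is_world_readable(candidate):
                found.append(candidate)
    return sorted(found)


def remove_world_access(path: Path) -> None:
    """Clear the read, write and execute bits for 'other'; keep everything else."""
    mode = stat.S_IMODE(path.stat().st_mode)
    path.chmod(mode & ~stat.S_IRWXO)
```

Calling `find_world_readable(Path.home())` would list the candidates, and `remove_world_access` could then be applied to any that should stay private. Changing the default for new files is a separate matter (on Unix it is normally controlled by the umask) and is not covered by this sketch.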
The cost of such a new server would be a maximum of £7,500 (inc. VAT), which is the cost of an Apple Xserve RAID with 6 TB. In practice we'd buy a PC-based whitebox machine, which should be cheaper. I would guess a maximum of £6,000 but have not got firm figures, as these can't be bought off the shelf. If that is too high we could buy a smaller system with expansion room, but expansion is likely to be very disruptive.
New archive system
The other two old backup servers would immediately be free for reuse. I would make those into a new archive system. This would let us clear old homespaces off clusters and workstations as soon as people leave while keeping a read-only copy available. Assuming people set their file permissions appropriately on the original data, the rest of their research group will be able to access it on the archive server. The same issue with world-readable files arises as with the backup server.
The data would be read-only, so we don't need regular backups of it: the two servers would each contain a copy and be sited a long way apart. Software would regularly check that the copies were identical in order to pick up any problems.
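One way the regular consistency check could work is to compare checksums of every file in the two copies. The sketch below is illustrative only: it assumes both copies can be reached as local directory paths, whereas in practice each server would probably compute its own checksums and only the resulting lists would be compared. The function names and the choice of SHA-256 are assumptions, not a decision about the real system.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(copy_a: Path, copy_b: Path) -> dict[str, list[str]]:
    """Report files missing from either copy and files whose contents differ."""
    a, b = tree_digests(copy_a), tree_digests(copy_b)
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "differ": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

If all three lists come back empty the copies match; anything else would need investigating, since a mismatch alone does not say which copy is the damaged one.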
There will have to be some sort of time limit on how long we keep the archive data, otherwise we will just run out of space again. This might vary from account to account. When the date comes round the leader of the group would be told, and would have a chance to extend the date. I would rather leave it to the leader of the group than the owner of the data because, by the time the date comes round (I would imagine at least a year after the owner leaves), I will probably not have reliable contact details for the owner.
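The bookkeeping for these dates could be quite simple. The sketch below shows one possible shape for it, with a per-account date, a way of working out which group leaders need telling, and a way of extending the date. The record layout and all names are hypothetical, and how the leader is actually contacted is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import date, timedelta


@dataclass(frozen=True)
class ArchivedAccount:
    username: str        # owner of the archived data, who has left
    group_leader: str    # person to tell when the date comes round
    delete_after: date   # date after which the archive may be removed


def leaders_to_notify(accounts: list[ArchivedAccount], today: date) -> dict[str, list[str]]:
    """Group accounts whose date has come round by the leader who must be told."""
    due: dict[str, list[str]] = {}
    for account in accounts:
        if account.delete_after <= today:
            due.setdefault(account.group_leader, []).append(account.username)
    return due


def extend(account: ArchivedAccount, extra: timedelta) -> ArchivedAccount:
    """Return a copy of the record with its date pushed back by 'extra'."""
    return replace(account, delete_after=account.delete_after + extra)
```

Nothing here deletes anything: the intention is that data is only removed after the leader has been told and has chosen not to extend the date.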