Opened 3 years ago

Last modified 16 months ago

#6 new change

Set up backups

Reported by: trev
Assignee: matze
Priority: P1
Milestone:
Module: Infrastructure
Keywords:
Cc: philll, christian, mathias@…
Blocked By: #3638, #3770
Blocking:
Platform: Unknown
Ready: yes
Confidential: no
Tester:
Verified working: no
Review URL(s):

Description

Currently, we don't back up any servers; even the original backup process doesn't run. Most servers don't hold any valuable data anyway, but we need to back up at the very least the hg repositories, databases and stats. We need a generic backup script in sitescripts that uploads data daily to Hetzner's backup space and removes outdated backups (the number of backups kept should be configurable; we probably want at least 30). Backups need to be encrypted because some of the data is sensitive.

Once that is done we can configure individual servers to actually use that feature.

Change History (27)

comment:1 Changed 3 years ago by philll

  • Cc trev added
  • Priority changed from P2 to P1

I consider this issue to be of the highest priority, especially when it comes to backing up Trac, which already contains a great deal of work that would be lost completely in case of a failure.

comment:2 Changed 3 years ago by trev

  • in_progress set to 0
  • Ready set

Hetzner allows FTP, SFTP and SCP to access their backup server. IMHO we should use SFTP with public key authentication. There is no built-in Python module for that, so using the external scp command should be the easiest way.

We should write up a generic backup script, with the backup destination and the directories/files/exceptions to be packed up listed in the configuration. We'll need to add support for databases as well (MySQL and PostgreSQL are the ones we are using), though that can be done in a second step. The backup script needs to pack up the files or SQL dumps listed and upload them to the backup server; the file name should contain the date. It should then make sure that outdated backups are removed (we probably want to have at least 30 days of backups, but this can be configurable as well).
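To illustrate the packing and upload steps described above, here is a minimal sketch. It is not the sitescripts implementation: the function names and the idea of matching exceptions as glob patterns on file names are assumptions made for this example. Only the dated file name and the use of the external scp command come from the discussion.

```python
import fnmatch
import os
import tarfile


def archive_name(prefix, day):
    """Build a file name that contains the backup date (hypothetical scheme)."""
    return "%s-%s.tar.gz" % (prefix, day.strftime("%Y-%m-%d"))


def create_archive(target, paths, exclude_patterns=()):
    """Pack the given files and directories into a gzipped tar file at target."""
    def skip_excluded(tarinfo):
        # A tarfile filter that returns None leaves the member out.
        base = os.path.basename(tarinfo.name)
        for pattern in exclude_patterns:
            if fnmatch.fnmatch(base, pattern):
                return None
        return tarinfo

    with tarfile.open(target, "w:gz") as archive:
        for path in paths:
            arcname = os.path.basename(os.path.normpath(path))
            archive.add(path, arcname=arcname, filter=skip_excluded)
    return target


def scp_command(local_file, destination, identity_file):
    """Argument list for the external scp command with public key auth."""
    # -B is batch mode (never prompt), -i selects the private key file.
    return ["scp", "-B", "-i", identity_file, local_file, destination]
```

The argument list could then be passed to `subprocess.check_call`, which raises an exception if scp exits with a non-zero status. The destination (something like `user@host:directory/`) would come from the configuration.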

comment:3 Changed 3 years ago by philll

  • Cc philll added

comment:4 Changed 3 years ago by philll

.

comment:5 Changed 3 years ago by trev

  • Cc trev removed

comment:6 Changed 3 years ago by trev

  • Owner set to jknisely
  • Status changed from new to assigned

comment:7 Changed 3 years ago by jknisely

Question about the host you're going to back up to, do you have shell access on there and the ability to run cron jobs?

comment:8 Changed 3 years ago by trev

No. See http://wiki.hetzner.de/index.php/Backup/en for the details of the backup host - it's storage only. Removal of old backups needs to happen when we upload a new one.
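Since the backup host is storage only, the uploading script has to decide which old files to delete. A minimal sketch of that decision follows, assuming backups carry a `YYYY-MM-DD` date in their file name; the function name and the pattern are hypothetical. Listing and deleting the remote files is left out here.

```python
import datetime
import re

# Assumed naming convention: the backup date appears as YYYY-MM-DD in the name.
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def outdated_backups(file_names, keep=30):
    """Return the names to delete so that only the newest `keep` backups remain."""
    dated = []
    for name in file_names:
        match = DATE_PATTERN.search(name)
        if match is None:
            continue  # not recognizable as a backup, so never touch it
        try:
            day = datetime.date(*map(int, match.groups()))
        except ValueError:
            continue  # digits that do not form a valid date
        dated.append((day, name))
    dated.sort(reverse=True)  # newest first
    return [name for _, name in dated[keep:]]
```

Running this only after a successful upload means a failed run never reduces the number of usable backups.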

comment:9 Changed 3 years ago by trev

Are you making any progress? This is a P1 issue, it needs to be resolved in a timely manner.

comment:10 Changed 3 years ago by jknisely

Yep, I've been working on it a bit in my spare time. I should have a pull request up for you in 2 or 3 days depending on my other work, but if you don't want to wait and would rather knock it out yourself, that's fine too.

comment:11 Changed 3 years ago by trev

Ok, thank you. Just making sure this doesn't get forgotten.

comment:12 Changed 3 years ago by jknisely

For tracking, my sitescripts fork is up at: https://github.com/jknisely/sitescripts
See:

backup.py
backup.cfg.sample

The first draft works on my system, but this is my first script using Python, so feel free to correct me where my code sucks and/or is not following best practices.

This copy backs up files as well as MySQL and PostgreSQL databases, and encrypts them with gpg.

TODO

  • aging - delete old backups
  • lots more error checking

comment:13 Changed 3 years ago by philll

  • Status changed from assigned to new

The assigned state will be dropped by #403

comment:14 Changed 3 years ago by trev

  • Cc christian added

comment:15 Changed 3 years ago by christian

I looked into the backup script and for me some points need some attention:

  • Error handling - At the moment the script doesn't really handle errors well. What happens if one step fails? I think just exiting the script isn't enough. It should shut down cleanly and remove leftover files and half-finished backups, so that the next run of the script starts in a clean environment.
  • Log file - The script needs to keep a log file so that we are able to see when things go wrong. We may also want to parse the log file for Nagios so we get warnings and errors directly via Nagios.

comment:16 Changed 3 years ago by jknisely

Still a work in progress, but I did a bit of overhauling (updated on github). Added more error checking and better error messages, tried to format it according to PEP 8, switched to subprocess.Popen rather than os.system, and broke out a few pieces into functions.

  • No logging yet -- Do you really want to do a logfile for this? I would generally just make sure the backup job is successful by having a monitoring point to verify recent backups on the backup host, and have the cron job it runs from fire off an email to interested parties should the script have any output (normal successful run shouldn't print any messages).

comment:17 Changed 3 years ago by trev

No, I don't think we want logging here. The usual approach is making sure that the script prints a meaningful error message and doesn't leave things in a horribly broken state. Then we will get the error output via cron mail and can fix/rerun backups. Parsing log files with Nagios would be a mess.
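A minimal sketch of that approach, assuming each backup step is a callable that receives a scratch directory (a hypothetical structure, not how backup.py is organized): the scratch directory is always removed, a successful run prints nothing, and a failure prints one message to stderr and yields a non-zero exit status.

```python
import shutil
import sys
import tempfile


def run_backup(steps):
    """Run each step with a scratch directory and always remove it afterwards."""
    workdir = tempfile.mkdtemp(prefix="backup-")
    try:
        for step in steps:
            step(workdir)
    finally:
        # Executed on success and on failure, so the next run starts clean.
        shutil.rmtree(workdir, ignore_errors=True)


def main(steps):
    """Return an exit status; write to stderr only when something fails."""
    try:
        run_backup(steps)
    except Exception as error:
        sys.stderr.write("Backup failed: %s\n" % error)
        return 1
    return 0
```

Called as `sys.exit(main(steps))` from a cron job, this stays silent on success, so cron only sends mail when a backup fails.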

comment:18 Changed 3 years ago by matze

  • Platform set to Unknown

Hi there,

are you still working on this ticket?

It's the topmost priority for our infrastructure - in fact, it's the only infrastructure ticket labeled P1 right now. Do you need any help? I'd be glad to support you in any fashion necessary ;-)

Cheers!
Matze

comment:19 Changed 3 years ago by matze

  • Cc mathias@… added

comment:20 Changed 3 years ago by christian

Hey,

I am working on this at the moment, sorry for not posting an update.
I tested the script and found some things that bug me:

  • It always wants to back up MySQL and PostgreSQL, so if they are missing I get errors. The script also doesn't exit with an error, although an unreachable database should be an error. Also, not all servers have a running database, so we should not enforce a database backup there.
  • I tried to build up a testing environment to run the script, but it doesn't really build me a backup, and it also didn't give me a proper error message. I think we need some proper tests to make sure the script runs like we want it to.
  • And I am working on the change from gpg to openssl.

So that's it from my side.
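One way to make database backups optional is to dump only what the configuration lists. The sketch below assumes a hypothetical `[backup]` section with `mysql_databases` and `postgresql_databases` options (these names are not taken from backup.cfg.sample) and only builds the command lines.

```python
import configparser


def dump_commands(config):
    """Build dump commands only for the databases listed in the configuration."""
    commands = []
    # has_option() is False when the section or the option is missing,
    # so servers without databases simply get an empty list.
    if config.has_option("backup", "mysql_databases"):
        for name in config.get("backup", "mysql_databases").split():
            commands.append(["mysqldump", "--single-transaction", name])
    if config.has_option("backup", "postgresql_databases"):
        for name in config.get("backup", "postgresql_databases").split():
            commands.append(["pg_dump", name])
    return commands
```

Running each command with `subprocess.check_call` would raise an exception when a dump tool exits with a non-zero status, which should turn an unreachable database that is listed in the configuration into a real error rather than a silent skip.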

comment:21 Changed 3 years ago by matze

Although I dislike suggesting a change of direction when we are already in the implementation phase, the current state of development may still allow us to think about the following: Why do we use yet another self-written solution when there's plenty of well-tested and well-known software out there?

Like e.g. Bacula (http://www.bacula.org/), which is free and open software, highly configurable, adjustable and suits the needs of thousands of projects worldwide - at enterprise level. (In fact I believe one could consider it a de-facto standard for UNIX/Linux environments, which is why I prefer this one.) And yes, one can use it with the Hetzner backup storage as well.

However, as I've said, there is also plenty of other software. See http://www.techrepublic.com/blog/10-things/10-outstanding-linux-backup-utilities/ and http://wertarbyte.de/tartarus.shtml, for example. The latter is even suggested by Hetzner (http://wiki.hetzner.de/index.php/Backup/en).

As I see it, the effort of setting up any of the aforementioned software could be a bit more than "just" fixing the issues with the current approach. Still, the result is probably better and future-ready if we avoid reinventing the wheel. Yet I understand that this can be quite interesting, but still --

Oh, and I'd love to code-review any of these tools.. ;-)

comment:22 Changed 3 years ago by christian

I also prefer using an existing software solution. Bacula looks promising.

comment:23 Changed 3 years ago by matze

  • Owner changed from jknisely to matze

FYI: As discussed with trev and fhd, I'll take over this ticket in order to evaluate the use of Bacula and to provide a configuration example via patch-set. We'll decide which way to go afterwards.

comment:24 Changed 3 years ago by trev

It seems that Bacula also relies on scripts that will dump the databases in order to back them up. Then I don't really see the advantages over other backup solutions.

The problem with Tartarus is that it is a shell script monster. The more promising solution is actually duplicity, which is written in Python, still maintained, seems to have a clean codebase and is available for Ubuntu. It also happens to be a command line script that should be trivial to install and "set up". Finally, with that solution we don't need to worry about security issues of yet another PHP-based web UI.

Last edited 3 years ago by trev

comment:25 Changed 3 years ago by matze

In Bacula, one would probably try the official plugins first, e.g. the MySQL (http://www.baculasystems.com/mysql-plugin) and PostgreSQL (http://www.baculasystems.com/products/bacula-enterprise-plugins/postgresql-plugin) plugins. But yes, one can always use scripts where necessary. The important benefit of Bacula is probably its very flexible and well-thought-out structure. So even if there were a requirement to script something, one can still integrate it cleanly with other modules (such as the built-in monitoring), regardless of the scripting language.

Regarding Tartarus, however, I agree with your points.

Duplicity is still in beta and has a relatively limited feature set. Although the latter probably results in an easy setup, I strongly doubt that the tool could become our best choice at this point. There would still be a lot of custom scripting around, especially when considering possible future requirements and development.

As I've said already, I'd rather go with a more established solution - in order to avoid subsequent switches to other software (due to lack of fulfilled requirements at some point), or even increasing the pile of non-integrated, custom scripts used around the entire backup management.

Last edited 3 years ago by matze

comment:26 Changed 17 months ago by matze

  • Blocked By 3638 added

comment:27 Changed 16 months ago by matze

  • Blocked By 3770 added