Alfresco BART

The project has moved to GitHub!

Please go to https://github.com/toniblyx/alfresco-backup-and-recovery-tool for downloads, questions, issues, suggestions or feedback. Thanks!

READ THIS WHITE PAPER TO UNDERSTAND THE BACKUP PROCESS: Alfresco Backup and Disaster Recovery – White Paper.pdf

 

CHANGELOG
[Aug/6/13] v0.1
– first release

[Nov/5/13] v0.2
– fixed LOCAL_DB_DIR comment
– added PGPASSWORD to the dump command for PostgreSQL
– added date and time to every DB dump
– added logging to DB dumps
– added backup of the full Solr directory except live indexes (as in a default installation done with the installer)
– added a command-line option to the backup task: you can now invoke a single backup set directly (index, db, cs or files); if nothing is specified, the backup runs as defined in the configuration file.
– improved command options for restoration
– added “--single-transaction” to the mysqldump command
– added single file recovery from the contentstore (only MySQL installations supported)
– added single file or directory recovery from the installation files.
– added “--allow-source-mismatch” as a force option in case the source hostname changes

Full list of features: http://blyx.com/2013/08/07/alfresco-backup-and-recovery-tool-release-v0-1/

REQUIREMENTS

For description and changelog see README file.

Alfresco BART needs Duplicity (a Python backup tool). Install it or Alfresco BART won’t work. Most Linux distributions have ready-made Duplicity packages available. If you encounter errors with your distribution’s Duplicity package, check at http://duplicity.nongnu.org whether that version is outdated.

Please try the latest stable Duplicity version (0.6.21) from the Duplicity website before filing bug reports. If you install Duplicity from the website’s tarball, check these requirements first:

  • Python v2.4 or later
  • librsync v0.9.6 or later
  • GnuPG for encryption
  • NcFTP version 3.1.9 or later
  • Boto 1.6a or later

Other Alfresco BART dependencies are:

  • mysqldump for MySQL backup
  • pg_dump for PostgreSQL backup
  • imp for Oracle backup

INSTALLATION

  1. Create a GPG key for encryption support with “gpg --gen-key”; encryption is recommended.
  2. Copy the “alfresco-bart.*” files to the “scripts” directory inside your Alfresco installation directory, or run them from anywhere else in your file system (installing them inside the Alfresco installation is recommended).
  3. Check the permissions on “alfresco-bart.sh”; it must be executable (“chmod +x alfresco-bart.sh”).
  4. Restrict the permissions on alfresco-bart.properties: it must be read-only for the user who runs the backup. To give read permission to the owner only, run “chmod 400 alfresco-bart.properties”.
  5. Edit the ALFBRT_PATH variable in “alfresco-bart.sh”, then read, understand and configure all required options in “alfresco-bart.properties”.
  6. Run “./alfresco-bart.sh” to get usage help.
  7. Add “0 5 * * * /path/to/alfresco-bart.sh backup” to the appropriate crontab if you want to run your backup daily at 5 AM (after Alfresco’s nightly backups and maintenance jobs). The first time you run “alfresco-bart.sh backup” it does a full backup; after that it does incremental backups as configured in alfresco-bart.properties.
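
Steps 3, 4 and 7 can be sketched as a small shell helper. This is only an illustration: the function name prepare_bart is an assumption and is not shipped with Alfresco BART, and it prints the cron entry rather than installing it.

```shell
#!/bin/sh
# Sketch only: prepare_bart is a hypothetical helper, not shipped with Alfresco BART.
# It applies the permissions from steps 3 and 4 and prints the cron line from step 7.
# Usage: prepare_bart DIRECTORY_WITH_BART_FILES
prepare_bart() {
    bart_dir="$1"

    # Step 3: the script must be executable.
    chmod +x "$bart_dir/alfresco-bart.sh" || return 1

    # Step 4: the properties file is readable by its owner only.
    chmod 400 "$bart_dir/alfresco-bart.properties" || return 1

    # Step 7: print (do not install) the daily 5 AM cron entry.
    echo "0 5 * * * $bart_dir/alfresco-bart.sh backup"
}
```

Calling it as “prepare_bart /opt/alfresco/scripts” (an example path, adjust to your installation) prints the line to paste into the crontab with “crontab -e”.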

 

56 thoughts to “Alfresco BART”

  1. I understand this could be integrated with Data Protector, since it is a script? In any case, we will try it and I will tell you about my experience.

  2. Hi Roberto,

    From what I have seen, Data Protector is already a backup and recovery system in its own right; if it is properly configured, in principle it makes no sense to use it together with Alfresco BART.

    Regards.

  3. Roberto, I have seen this about Data Protector: “The SharePoint GRE Microsoft SharePoint administrators to recover a single document, collection or folder directly via the SharePoint GUI. (Granular Recovery for SharePoint 2010 was introduced in version 7.0.)” There may be a way to integrate it; once that (similar) functionality is finished I will look into it.

  4. Hi Toni, it looks great (even more so once you implement the TODOs).
    Just one detail: after testing it, I confirmed that the PostgreSQL part of the backup does not seem to run properly when following the “standard” instructions:
    WARNING 10 ‘/opt/alfresco-4.2.c/alf_data/postgresql’
    . Error accessing possibly locked file /opt/alfresco-4.2.c/alf_data/postgresql

    As you know, the owner of the postgresql folder (inside alf_data) is postgres, and both that user and the permissions on the folder (rwx------) are created automatically by the default full installation (Community version), so if we run the backup as the “alfresco” user it will not have enough privileges.
    I don’t know whether you have something planned for this. If I remember correctly, I once had to add an “export PGPASSWORD=pass_de_user_postgres” before running pg_dump (with an “unset PGPASSWORD” after the backup finished) to make it work, but I am not entirely sure and I don’t have at hand the script I have been using until now on PostgreSQL installations.

    Anyway, I’m sure we can adapt it for this specific case one way or another, and if so, rest assured I will contribute the change 😛 (are you planning to manage the project’s evolution on Google Code or similar?).

    A big hug and see you soon!

  5. Hi Javi, sorry for the delay in replying to your comment, I’m running flat out 😛

    I had not noticed what you describe about the default permissions of the alf_data/postgresql folder in the installations I tested, but I have just run a test and you are right. In any case, the ideal is to run the backup as root (depending on which components are enabled in the backup), which is why I had not hit this problem before. I have added your fix and it will be in the next version. Thanks a lot!

    Line 151 of alfresco-bart.sh:
    export PGPASSWORD=${DBPASS}

    And that’s it 😉
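
The fix makes the password available to pg_dump through the PGPASSWORD environment variable. A minimal sketch of the same idea, assuming a hypothetical helper with_pgpassword (not code from alfresco-bart.sh) that scopes the variable to a single command, so no separate unset is needed afterwards:

```shell
#!/bin/sh
# Sketch only: with_pgpassword is a hypothetical helper, not code from alfresco-bart.sh.
# Usage: with_pgpassword PASSWORD COMMAND [ARGS...]
# COMMAND is expected to be an external program such as pg_dump.
with_pgpassword() {
    pgpass="$1"
    shift
    # The assignment prefix exports PGPASSWORD to this one command only;
    # the calling shell's environment is left untouched.
    PGPASSWORD="$pgpass" "$@"
    status=$?
    # Do not keep the password around in a shell variable.
    unset pgpass
    return "$status"
}

# Example (commented out; needs a running PostgreSQL, and the user name,
# output path and database name here are assumptions):
# with_pgpassword "$DBPASS" pg_dump -U alfresco -f /tmp/alfresco.sql alfresco
```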

  6. The local destination runs fine, but I can’t get S3 to work. I get: BackendException: Error uploading s3+http://.s3-website-us-east-1.amazonaws.com/solr/duplicity-full.20130926T175722Z.vol1.difftar.gpg

    What do I have to do on (Do not enable website hosting)?

  7. Hi Rainer, it looks like the option S3FILESYSLOCATION is not configured correctly. It should be something like S3FILESYSLOCATION="s3+http://your-bucket-name".

    Could you send your BART logs and configuration file to my email address (toni A.T blyx D.ot com)? I will take care of this.

    Thanks.
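
The upload URL in the error has nothing between “s3+http://” and the first dot, which is consistent with a missing bucket name. A rough sanity check for the setting could look like this; check_s3_location is a hypothetical helper, and the pattern only covers the s3+http://bucket-name form suggested in the reply:

```shell
#!/bin/sh
# Sketch only: check_s3_location is a hypothetical helper, not part of Alfresco BART.
# It accepts values of the form s3+http://<bucket-name>[/prefix] and rejects an
# empty bucket name, as in "s3+http://.s3-website-us-east-1.amazonaws.com/solr".
check_s3_location() {
    case "$1" in
        s3+http://[A-Za-z0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, “check_s3_location "$S3FILESYSLOCATION" || echo "check S3FILESYSLOCATION"” could run before the first backup.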

  8. Me again. I really do use this tool, which is why I come up with so many questions.

    I installed it on one machine and it runs as expected against S3.

    Then I created another machine to test restoring the backup, and rsynced /opt (the Alfresco core) and /root/.gnupg to it.

    Now when I run ./alfresco-bart.sh collection all on the new machine I get:
    ./alfresco-bart.sh: line 555: collection-status: command not found

    The same command on the old machine works as expected.

    /opt/alfresco/script was rsynced, so the two copies are identical.

    What did I miss?

  9. Hi Rainer, it’s great that you are using the tool, thanks!

    Did you install duplicity and its dependencies on the target server? I know it’s obvious, but just in case…

  10. Got it: I had not installed the dependencies. By the way, this works fine on Ubuntu:

    apt-get install duplicity
    apt-get install postgresql-client-common
    apt-get install postgres-xc-client
    apt-get install python-pip
    pip install -U boto

  11. Toni, one idea: maybe offer a basic restore (restore to now, one button, no questions) and an advanced one like the restore wizard.

  12. OK, I get it: something like “./alfresco-bart.sh restore now” that restores the last full backup into a temp dir?

  13. Me again.

    I have all the parts, but there is no README and I can’t find on Google how to restore the PostgreSQL database.

  14. Nearly finished, but!

    The file copy during the restore ran OK, but when I try to restore the dump I get a bunch of errors like:
    ERROR: constraint “fk_avm_ml_to” for relation “avm_merge_links” already exists

    If I try a new database such as alfresco_1, I get FATAL: database “alfresco_1” does not exist

    which is absolutely right. I’m stuck on how to restore the database.

  15. Hi Rainer,

    You need to restore into the original database: make a backup of the existing one, delete it, create an empty database with the same name and import the backup.

    I will add this to the documentation, sorry about that.
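
The sequence described in this reply can be sketched for PostgreSQL as follows. This is an illustration under assumptions, not the tool’s own procedure: restore_alfresco_db and run are hypothetical helpers, the dump is assumed to be a plain SQL file (a custom-format dump would need pg_restore instead of psql), and Alfresco should be stopped first. By default the sketch only prints the commands.

```shell
#!/bin/sh
# Sketch only: restore_alfresco_db and run are hypothetical helpers, not part of Alfresco BART.
# With DRY_RUN=1 (the default) the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: restore_alfresco_db DB_NAME DB_USER DUMP_FILE
restore_alfresco_db() {
    db="$1"; user="$2"; dump="$3"

    # 1. Keep a safety copy of the database that is about to be replaced.
    run pg_dump -U "$user" -f "$db.before-restore.sql" "$db" || return 1
    # 2. Delete it and create an empty database with the same name.
    run dropdb -U "$user" "$db" || return 1
    run createdb -U "$user" -O "$user" "$db" || return 1
    # 3. Import the dump recovered by Alfresco BART.
    run psql -U "$user" -d "$db" -f "$dump"
}
```

Running “restore_alfresco_db alfresco alfresco /path/to/dump.sql” lists the four commands; setting DRY_RUN=0 executes them.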

  16. Toni, I wanted to thank you for the tool; it works wonderfully on CentOS 6.4.

    The only trouble I had was that I needed to add the following to the crontab:

    HOME=/root
    export PATH HOME

    Without that I got the following error:

    BackendException: ssh connection to [email protected]:22 failed: EOF when reading a line

    Regards,
    4lfr3d7115
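
cron starts jobs with a minimal environment, which is presumably why HOME had to be set here. One way to keep this out of the crontab is a small wrapper script that sets the environment before calling the backup. A minimal sketch, assuming a hypothetical function setup_cron_env and a typical Linux PATH (neither comes from Alfresco BART):

```shell
#!/bin/sh
# Sketch only: setup_cron_env is a hypothetical helper, not part of Alfresco BART.
# HOME is what tools such as ssh and gpg use to locate ~/.ssh and ~/.gnupg.
# Usage: setup_cron_env [HOME_DIRECTORY]   (defaults to /root)
setup_cron_env() {
    HOME="${1:-/root}"
    PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    export HOME PATH
}

# Example wrapper body (commented out; the script path is an assumption):
# setup_cron_env /root
# /opt/alfresco/scripts/alfresco-bart.sh backup
```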

  17. Yes, I am using backup over SSH.

    Another question, Toni: I want to bring Alfresco up again from the full backup files taken with BART. The instructions were supposed to be in the README, but there is nothing there on the subject.

  18. But basically you can see the index, DB, contentstore and configuration files in the recovery directory, and you have to put them back in place. Restore the database and little else.

  19. Toni, I tried to restore the server from the BART files and I get the following error:

    SEVERE: Exception fixing docBase for context [/solr]
    java.io.FileNotFoundException: /opt/alfresco-4.2.e/alf_data/solr/apache-solr-1.4.1.war (No such file or directory)

    SEVERE: Error starting static Resources
    java.lang.IllegalArgumentException: Invalid or unreadable WAR file : /opt/alfresco-4.2.e/alf_data/solr/apache-solr-1.4.1.war

    I checked the Solr backup and it does not contain the “apache-solr-1.4.1.war” file that the restore asks for.

  20. Right, Toni. In the end I copied the whole solr folder from a previous backup I had, and everything came up without errors.

    It would be good if this could be fixed in the new version.
    Thanks.

  21. Hello Toni,

    I would like to thank you for having created Alfresco BART, it’s a fantastic tool and definitely a must have. Being able to recover data is very important for me. I’m therefore testing all the disaster scenarios to ensure the data will be restorable quickly regardless of the disaster type.

    I have a few questions. I installed Alfresco Community 4.2.e with the Linux installer (alfresco-community-4.2.e-installer-linux-x64.bin) on a fresh Ubuntu 12.04 server installation. Can Alfresco BART version 0.2 be used for cold backups? I’m asking because for a cold backup the Alfresco services and the PostgreSQL database need to be stopped. When doing a backup with everything stopped I get the following error for the database backup.

    sudo ./alfresco-bart.sh backup db
    pg_dump.bin: [archiver (db)] connection to database “alfresco” failed: could not connect to server: Connection refused
    Is the server running on host “localhost” (127.0.0.1) and accepting
    TCP/IP connections on port 5432?

    Despite this error message, duplicity creates files but they are empty, they do not contain the database.

    I didn’t read anywhere in the BART documentation that it’s only for hot backup. I didn’t find a way to initiate BART to do a cold backup.
    Anyway, by reading your Alfresco Backup and Disaster Recovery – White Paper I successfully restored a cold backup by just copying the alf_data directory. It would be great if BART supported cold backups.

    Is there a typo at line 193 of alfresco-bart.properties? I believe it should be psql instead of pgsql. Also, what is this line used for?

    I also tested the hot backup and restore. I had to read your replies and all the comments on this page to realize that I have to delete the alfresco database, create a new one and then import. Not doing this gave me lots of errors during the db dump import. Now I’m puzzled by the Solr backup. When I list the files under the solr restore directory I see two directories (backup and config). Do I completely erase the content of /opt/alfresco-4.2.e/alf_data/solr and then cp the content of restore_dir/solr/config back to /opt/alfresco-4.2.e/alf_data/solr? Also, what do I have to do with the restore_dir/solr/backup directory, which contains other directories that contain snapshot files?

    It would be good to write a more detailed procedure for the hot backup restore, or at least to state that we have to delete the db, create a new empty one and then restore the dump.

    Thank you again Toni for this great tool.

    Best regards
    Stéphane

  22. Hi Stephane,

    Thanks for your comments, really helpful for improving the documentation and the tool.

    Let me answer your questions below:

    -Alfresco BART was designed for hot backups, or at least for backups with the database up and running. I can consider supporting cold backups for cases like yours, but I’m not sure whether that can be included in the next version. I will add a database connection validator; that may help.

    -Yes, it is a bug in line 193 (fixed), but it doesn’t matter right now because this variable is not yet used by the script. Thanks for letting me know!

    -I will add the advice about dropping the database to the documentation and to the recovery step in the script.

    -I definitely have to improve the documentation of the restore procedure for all sets. For Solr indexes, you have to copy the contents of solrBackup/alfresco/snapshot.20131110020001 to solr/workspace/SpacesStore/index (with Alfresco stopped) and then start Alfresco. Remember to do this for the archive store as well.

    Thanks a lot for your feedback, it is really valuable to me.

    Cheers.
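
The Solr index step in this reply can be sketched as a small helper. restore_solr_index is a hypothetical function, not part of Alfresco BART, and moving the current index aside instead of overwriting it is a precaution added here rather than something the reply asks for. Run it only with Alfresco stopped.

```shell
#!/bin/sh
# Sketch only: restore_solr_index is a hypothetical helper, not part of Alfresco BART.
# Usage: restore_solr_index SNAPSHOT_DIR INDEX_DIR
restore_solr_index() {
    snapshot="$1"
    index="$2"

    [ -d "$snapshot" ] || { echo "snapshot not found: $snapshot" >&2; return 1; }

    # Move the current index aside instead of deleting it, then recreate the directory.
    if [ -d "$index" ]; then
        mv "$index" "$index.old.$$" || return 1
    fi
    mkdir -p "$index" || return 1

    # Copy the snapshot contents (not the snapshot directory itself) into the index directory.
    cp -Rp "$snapshot"/. "$index"/
}
```

With the paths from the reply this would be “restore_solr_index solrBackup/alfresco/snapshot.20131110020001 solr/workspace/SpacesStore/index”, followed by a second call for the archive store, whose paths depend on your installation.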

  23. Hi Tony,

    Thank you for your feedback and for having explained how to restore the solr index.

    Stéphane

  24. Hi Tony,

    great job! At the moment I’m testing BART before using it on a production system.

    Here are two little things I’ve noticed:

    – It would be great to be able to define duplicity’s archive-dir and tempdir in alfresco-bart.properties. Not defining them causes “No space left on device” if the root partition is too small for the amount of data in Alfresco 🙂

    – I had to comment out the following lines because duplicity could not exclude these directories, as they are not located in ALF_INSTALLATION_DIR

    function filesBackup {
    # Getting a variable to know all includes and excludes
    FILES_DIR_INCLUDES="$ALF_INSTALLATION_DIR"

    # Commented out because of errors during backup - JZO 25.02.2014
    #
    # if [ -d "$INDEXES_BACKUP_DIR" ]; then
    # OPT_INDEXES_BACKUP_DIR=" --exclude $INDEXES_BACKUP_DIR"
    # fi
    # if [ -d "$INDEXES_DIR" ]; then
    # OPT_INDEXES_DIR=" --exclude $INDEXES_DIR"
    # fi
    # if [ -d "$ALF_CONTENTSTORE" ]; then
    # OPT_ALF_CONTENTSTORE=" --exclude $ALF_CONTENTSTORE"
    # fi
    # if [ -d ${ALF_DIRROOT}/contentstore.deleted ]; then
    # OPT_ALF_CONTENSTORE_DELETED=" --exclude ${ALF_DIRROOT}/contentstore.deleted"
    # fi
    # if [ -d "$ALF_CACHED_CONTENTSTORE" ]; then
    # OPT_CACHED_CONTENTSTORE=" --exclude $CACHED_CONTENTSTORE"
    # fi
    # if [ -d "$ALF_CONTENTSTORE2" ]; then
    # OPT_ALF_CONTENTSTORE2=" --exclude $ALF_CONTENTSTORE2"
    # fi
    # if [ -d "$ALF_CONTENTSTORE3" ]; then
    # OPT_ALF_CONTENTSTORE3=" --exclude $ALF_CONTENTSTORE3"
    # fi
    # if [ -d "$ALF_CONTENTSTORE4" ]; then
    # OPT_ALF_CONTENTSTORE4=" --exclude $ALF_CONTENTSTORE4"
    # fi
    # if [ -d "$ALF_CONTENTSTORE5" ]; then
    # OPT_ALF_CONTENTSTORE5=" --exclude $ALF_CONTENTSTORE5"
    # fi
    # if [ -d "$LOCAL_BACKUP_DB_DIR" ]; then
    # OPT_LOCAL_BACKUP_DB_DIR=" --exclude $LOCAL_BACKUP_DB_DIR"
    # fi

    It would be great to have the 1-File-Restore for PostgreSQL databases too 🙂

    Best regards,
    Juliane

  25. Hi Juliane, thank you very much for your comments; I will keep them in mind for the next release. Of course I would like to add single file restore for PostgreSQL and Oracle as soon as possible.

  26. Hi Toni, nice to greet you.

    I would like to ask how this utility can be used on Windows with a PostgreSQL database?

    Many thanks in advance

  27. Hi Emerson, I have no plans to port it to Windows in the short term. If you use Cygwin, though, it may work, provided you have that emulator on the Windows server. Regards.

  28. Many thanks, I will try Cygwin.
    And I take the opportunity to congratulate you on this great project you are carrying out.
    Greetings from Panama

  29. Hi Toni,

    Juliane mentioned this to you earlier:

    “It would be great to define duplicity’s archive-dir and tempdir in alfresco-bart.properties. Not defining it causes “No space left on device” if the root partition is too small for much data in alfresco :)”

    The same happens to me: /tmp on the server is 1 GB, of which roughly 800 MB is free. I set VOLUME_SIZE=400 and /tmp runs out of space, so the backup process fails for lack of space.

    Is there a way to send the temporary files to another partition where I do have space?

  30. Hi Alfred, you have two options to solve this:
    1- reduce the volume size to 10 MB, because it is configured for async upload, meaning that while one volume is uploading the next one is being generated, so it needs more space.
    2- remove the “--asynchronous-upload” option from GLOBAL_DUPLICITY_PARMS in alfresco-bart.properties. In that case it will generate one volume and upload it to the destination, then generate the next one and upload it, and so on, so it never uses more than the size of one volume as temporary space.

    It is configured with the async upload option by default because in theory it is faster, but I did not think of a case like yours.

    Regards, and thanks for reporting your problem.
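
Both suggestions from this thread come down to editing the duplicity option string. The sketch below shows option 2 and, as an alternative, pointing duplicity at a larger partition with its --tempdir and --archive-dir options. The helper names, the sample option string and the directories are assumptions for illustration; the actual default value of GLOBAL_DUPLICITY_PARMS is not shown on this page.

```shell
#!/bin/sh
# Sketch only: helper names and sample values are assumptions, not taken from Alfresco BART.

# Option 2 from the reply: drop --asynchronous-upload from an option string.
# Usage: without_async_upload "OPTION STRING"
without_async_upload() {
    printf '%s\n' "$1" |
        sed -e 's/--asynchronous-upload//' -e 's/  */ /g' -e 's/^ //' -e 's/ $//'
}

# Alternative: keep duplicity's temporary files and local cache on a roomier partition.
# Usage: with_scratch_dirs "OPTION STRING" TEMP_DIR ARCHIVE_DIR
with_scratch_dirs() {
    printf '%s --tempdir %s --archive-dir %s\n' "$1" "$2" "$3"
}
```

The resulting string would then be assigned to GLOBAL_DUPLICITY_PARMS in alfresco-bart.properties; both directories must already exist and be writable by the user running the backup.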

  31. Thanks Toni, I will go with option 2; with option 1 I already had it working properly with the default value of 25.

    One more thing, Toni: when FULLDAYS=30D is set and the first full backup runs on April 5, does that mean the next full backup runs on May 5? Is there no way to configure it so that subsequent full backups run on the first day of each month, regardless of the first full backup, which obviously has to run the first time we run BART?

  32. Hi Toni, I just started looking into Alfresco backup and have some questions. Does this code work for both Alfresco Community and Enterprise (Alfresco One)? Also, I am trying to understand what to back up: Solr, the metadata tables (the database), and the content store. Is the content store where the actual data is stored? What’s the relationship between the content store and the file system? If I just want a tape backup of the actual data, am I simply backing up a snapshot of all data in the file system? If these questions are too basic for this context, any hints on where I should look for answers? Thank you!
