Alfresco, NAS or SAN, that’s the question!

The main requirement on the shared storage is being able to cross-mount the storage between the Alfresco servers. Whether this is done via an NAS or SAN is partly a decision around which technology your organization’s IT department can best support. Faster storage will have positive implications on the performance of the system, with Alfresco recommending throughput at 200 MB/sec.

NAS allows us to mount the content store via NFS or CIFS on all Alfresco servers, and they are able to read/write the same file system at the same time. The only real requirement is that the OS on which Alfresco is installed supports NFS (which is any Linux box actually). NFS tends to be cheaper and easier, but is not the fastest option. It is typically sufficient, though.

SAN is typically faster and more reliable, but obviously more expensive and complex (dedicated hardware and configuration requirements). In order to read/write from all Alfresco servers from/to the SAN, special file system types are necessary. For Red Hat, we use GFS2, other Linux flavors use OCFS or many others.

You are maybe thinking what happen in case of having multiple Alfresco servers writing to the same LUN could result in corruption (especially in header files), so it sounds like NAS (NFS/CIFS) would take care of that issue, however, if using a SAN, the filesystem must be managed properly to allow for read/write from multiple servers. For the Alfresco stand point, you don’t have to take care of that in both SAN or NAS approaches because Alfresco manages the I/O such that no collisions or corruption occur.

Note: If using a SAN, ensure the file system is managed properly to allow for read/write from multiple servers.

I also wanted to share this presentation I did internally some time ago but I think it would be useful.

Leer Más

Screencast: Alfresco One AOS (Alfresco Office Services)

Alfresco Office Services is the new implementation made for Alfresco One (former Enterprise) which allows an user to “Edit Online” a document with MS Office straight from Alfresco Share, provides a fully-compatible SharePoint repository. This new implementation replaces the existing VTI (Microsoft Office SharePoint Protocol Support) already in Alfresco Community.

With Alfresco Office Services (AOS) you can access Alfresco directly from your Microsoft Office applications.  This means that you can browse, open, and save Microsoft Office files (Word, PowerPoint, and Excel) in Alfresco without the need to access Alfresco through Chrome, Firefox, or another web browser. (See oficial documentation here). With AOS you also can connect Alfresco as a network drive or shared folder.
The main differences between this new AOS and the existing VTI are:
  • Removed Jetty embedded server therefore not need to use port 7070.
  • Select document type when saving document form MS Office to Alfresco and fill the type properties within MS Office.
  • AOS is part of the core, not need to install an additional AMP.
  • A new ROOT.war and _vti_bin.war have to be deployed (included in Alfresco One), if you are upgrading from previous versions please check this information.
Here is a quick demo about how it works and how to use it:

Leer Más

Buenas Prácticas de Seguridad en Docker

DockerLogoNota: Este artículo lo escribí para SecurityByDefault.com el 18/05/2015, espero que lo disfrutéis.
Docker es una plataforma abierta que permite construir, portar y ejecutar aplicaciones distribuidas, se basa en contenedores que corren en Linux y funcionan tanto en máquinas físicas como virtuales simplemente usando un runtime. Está escrito en Go y usa librerías del sistema operativo así como funcionalidades del kernel de Linux en el que se ejecuta. Consta de un engine con API RESTful y un cliente que pueden ejecutarse en la misma máquina o en máquinas separadas. Es Open Source (Apache 2.0) y gratuito.
Los contenedores existen desde hace muchos años, Docker no ha inventado nada en ese sentido, o casi nada, pero no hay que quitarles mérito, están en el momento adecuado y aportan las características y herramientas concretas que se necesitan en la actualidad, donde la portabilidad, escalabilidad, alta disponibilidad y los microservicios en aplicaciones distribuidas son cada vez más utilizados, y no sólo eso, sino que también son mejor entendidos por la comunidad de desarrolladores y administradores de sistemas. Cada vez se desarrollan menos aplicaciones monolíticas y más basadas en módulos o en microservicios, que permiten un desarrollo más ágil, rápido y a la vez portable. Empresas de sobra conocidas como Netflix, Spotify o Google e infinidad de Start ups usan arquitecturas basadas en microservicios en muchos de los servicios que ofrecen.
Te estarás preguntando ¿Y no es más o menos lo mismo que hacer un chroot de una aplicación? Sería como comparar una rueda con un coche. El concepto de chroot es similar ya que se trata de aislar una aplicación, pero Docker va mucho más allá, sería un chroot con esteroides, muchos esteroides. Por ejemplo, puede limitar y controlar los recursos a los que accede la aplicación en el contenedor, generalmente usan su propio sistema de archivos como UnionFS o variantes como AUFS, btrfs, vfs, Overlayfs o Device Mapper que básicamente son sistemas de ficheros en capas. La forma de controlar los recursos y capacidades que hereda del host es mediante namespaces y cgroups de Linux. Esas opciones de Linux no son nuevas en absoluto, pero Docker lo hace fácil y el ecosistema que hay alrededor lo ha hecho tan utilizado.
Adicionalmente, la flexibilidad, comodidad y ahorro de recursos de un contenedor es mayor a la que aporta una máquina virtual o un servidor físico, esto es así en muchos casos de uso, no en todos. Por ejemplo, tres servidores web para un cluster con Nginx en una VM con una instalación de Linux CentOS mínima ocuparía unos 400MB, multiplicado por 3 máquinas sería total de uso en disco de 1,2 GB, con contenedores serían 400MB las mismas 3 máquinas corriendo ya que usa la misma imagen para múltiples contenedores. Eso es sólo por destacar una característica interesante a nivel de recursos. Otro uso muy común de Docker es la portabilidad de aplicaciones, imagina una aplicación que solo funciona con Python 3.4 y hacerla funcionar en un sistema Linux con Python 2.x es complicado, piensa en lo que puede suponer en un sistema en producción actualizar Python, con contenedores sería casi automático, descargar la imagen del contenedor y ejecutar la aplicación de turno.
Solo por ponernos en situación de la envergadura Docker, unos números alrededor del producto y la compañía (fuente aquí):
  • 95 millones de dólares de inversión.
  • Valorada en 1.000 millones de dólares.
  • Más de 300 millones de descargas en 96 releases desde marzo de 2013
Pero un contenedor no es para todo, ni hay que volverse loco “dockerizando” cualquier cosa, aunque no es este el sitio para esa reflexión. Al cambiar la forma de desarrollar, desplegar y mantener aplicaciones, también cambia en cierto modo la forma de securizar estos nuevos actores.
Docker aporta seguridad en capas, aísla aplicaciones entre ellas y del host sin usar grandes recursos, también se pueden desplegar contenedores en máquinas virtuales lo que aporta otra capa adicional de aislamiento (estaréis pensando en VENOM pero eso es otra película que no afecta directamente a Docker). Dada la arquitectura de Docker y usando buenas prácticas, aplicar parches de seguridad al anfitrión o a aplicaciones suele ser más rápido y menos doloroso.
Buenas Prácticas de Seguridad:
Aunque la seguridad es algo innato en un contenedor, desde Docker Inc. están haciendo esfuerzos por la seguridad, por ejemplo, contrataron hace unos meses a ingenieros de seguridad de Square, que no son precisamente nuevos en el tema. Ellos, junto a compañías como VMware entre otras, han publicado recientemente un extenso informe de sobre buenas prácticas de seguridad en Docker en el CIS. Gracias a este informe tenemos acceso a más de 90 recomendaciones de seguridad a tener siempre en cuenta cuando vamos a usar Docker en producción. En la siguiente tabla podemos ver las recomendaciones de seguridad sugeridas, algunas son muy obvias pero un check list así nunca viene mal:
1. Recomendaciones a nivel de host
1.1. Crear una partición separada para los contenedores 
1.2. Usar un Kernel de Linux actualizado 
1.3. No usar herramientas de desarrollo en producción
1.4. Securizar el sistema anfitrión 
1.5. Borrar todos los servicios no esenciales en el sistema anfitrión
1.6. Mantener Docker actualizado 
1.7. Permitir solo a los usuarios autorizados controlar el demonio Docker
1.8. Auditar el demonio Docker  (auditd)
1.9. Auditar el fichero o directorio de Docker – /var/lib/docker 
1.10. Auditar el fichero o directorio de Docker – /etc/docker 
1.11. Auditar el fichero o directorio de Docker – docker-registry.service 
1.12. Auditar el fichero o directorio de Docker – docker.service 
1.13. Auditar el fichero o directorio de Docker – /var/run/docker.sock 
1.14. Auditar el fichero o directorio de Docker – /etc/sysconfig/docker 
1.15. Auditar el fichero o directorio de Docker – /etc/sysconfig/docker-network 
1.16. Auditar el fichero o directorio de Docker – /etc/sysconfig/docker-registry 
1.17. Auditar el fichero o directorio de Docker – /etc/sysconfig/docker-storage 
1.18. Auditar el fichero o directorio de Docker – /etc/default/docker 
 
2. Recomendaciones a nivel de Docker Engine (daemon)
2.1 No usar el driver obsoleto de ejecución de lxc 
2.2 Restringir el tráfico de red entre contenedores 
2.3 Configurar el nivel de logging deseado 
2.4 Permitir a Docker hacer cambios en iptables 
2.5 No usar registros inseguros (sin TLS)
2.6 Configurar un registro espejo local
2.7 No usar aufs como driver de almacenamiento
2.8 No arrancar Docker para escuchar a  una IP/Port o Unix socket diferente
2.9 Configurar autenticación TLS para el daemon de Docker
2.10 Configurar el ulimit por defecto de forma apropiada

3. Recomendaciones a nivel de configuración de Docker
3.1 Verificar que los permisos del archivo docker.service están como root:root 
3.2 Verificar que los permisos del archivo docker.service están en 644 o más restringidos 
3.3 Verificar que los permisos del archivo docker-registry.service están como root:root 
3.4 Verificar que los permisos del archivo docker-registry.service están en 644 o más restringidos
3.5 Verificar que los permisos del archivo docker.socket están como root:root 
3.6 Verificar que los permisos del archivo docker.socket están en 644 o más restringidos
3.7  Verificar que los permisos del archivo de entorno Docker (/etc/sysconfig/docker o /etc/default/docker) están como root:root 
3.8 Verificar que los permisos del archivo de entorno Docker (/etc/sysconfig/docker o /etc/default/docker) están en 644 o más restringidos
3.9 Verificar que los permisos del archivo /etc/sysconfig/docker-network (si se usa systemd) están como root:root 
3.10 Verificar que los permisos del archivo /etc/sysconfig/docker-network están en 644 o más restringidos
3.11  Verificar que los permisos del archivo /etc/sysconfig/docker-registry (si se usa systemd) están como root:root
3.12 Verificar que los permisos del archivo /etc/sysconfig/docker-registry (si se usa systemd) están en 644 o más restringidos
3.13 Verificar que los permisos del archivo /etc/sysconfig/docker-storage (si se usa systemd) están como root:root 
3.14 Verificar que los permisos del archivo /etc/sysconfig/docker-storage (si se usa systemd) están en 644 o más restringidos 
3.15 Verificar que los permisos del directorio /etc/docker están como root:root 
3.16 Verificar que los permisos del directorio /etc/docker están en 755 o más restrictivos 
3.17 Verificar que los permisos del certificado del registry están como root:root 
3.18 Verificar que los permisos del certificado del registry están en 444 o más restringidos 
3.19 Verificar que los permisos del certificado TLS CA están como root:root 
3.20 Verificar que los permisos del certificado TLS CA están en 444 o más restringidos 
3.21 Verificar que los permisos del certificado del servidor Docker están como root:root 
3.22 Verificar que los permisos del certificado del servidor Docker están en 444 o más restringidos 
3.23 Verificar que los permisos del archivo de clave del certificado del servidor Docker están como root:root 
3.24 Verificar que los permisos del archivo de clave del certificado del servidor Docker están en 400 
3.25 Verificar que los permisos del archivo de socket de Docker están como root:docker 
3.26 Verificar que los permisos del archivo de socket de Docker están en 660 o más restringidos 
 
4 Imágenes de Contenedores y Dockerfiles
4.1 Crean un usuario para el contenedor
4.2 Usar imágenes de confianza para los contenedores 
4.3 No instalar paquetes innecesarios en el contenedor
4.4 Regenerar las imágenes si es necesario con parches de seguridad
 
5 Runtime del contenedor
5.1 Verificar el perfil de AppArmor (Debian o Ubuntu) 
5.2 Verificar las opciones de seguridad de SELinux (RedHat, CentOS o Fedora) 
5.3 Verificar que los contenedores esten ejecutando un solo proceso principal
5.4 Restringir las Linux Kernel Capabilities dentro de los contenedores 
5.5 No usar contenedores con privilegios   
5.6 No montar directorios sensibles del anfitrión en los contenedores
5.7 No ejecutar ssh dentro de los contenedores
5.8 No mapear puertos privilegiados dentro de los contenedores
5.9 Abrir solo los puertos necesarios en un contenedor
5.10 No usar el modo “host network” en un contenedor 
5.11 Limitar el uso de memoria por contenedor 
5.12 Configurar la prioridad de uso de CPU apropiadamente 
5.13 Montar el sistema de ficheros raíz de un contenedor como solo lectura
5.14 Limitar el tráfico entrante al contenedor mediante una interfaz específica del anfitrión
5.15 Configurar la política de reinicio ‘on-failure’ de un contenedor a 5 
5.16 No compartir PID de procesos del anfitrión con contenedores
5.17 No compartir IPC del anfitrión con contenedores 
5.18 No exponer directamente dispositivos del anfitrión en contenedores
5.19 Sobre-escribir el ulimit por defecto en tiempo de ejecución solo si es necesario
 
6 Operaciones de Seguridad en Docker
6.1 Realizar auditorías de seguridad tanto en el anfitrión como en los contenedores de forma regular
6.2 Monitorizar el uso, rendimiento y métricas de los contenedores
6.3 Endpoint protection platform (EPP) para contenedores (si las hubiese) 
6.4 Hacer Backup de los datos del contenedor 
6.5 Usar un servicio centralizado y remoto para recolección de logs
6.6 Evita almacenar imágenes obsoletas, sin etiquetar correctamente o de forma masiva.   
6.7 Evita almacenar contenedores obsoletos, sin etiquetar correctamente o de forma masiva.
En algunos casos, hay recomendaciones que merecen un artículo por si solas. Si quieres profundizar más en este tema recuerda que los pormenores de estos aspectos de seguridad y auditoría los ampliaremos durante el curso online Hardening de Windows, Linux e Infraestructuras” en el que colaboraré junto a Lorenzo Martínez, Yago Jesús, Juan Garrido y Pedro Sanchez, todo un lujo de curso en el que aportaré mi granito de arena con seguridad en Docker completando el módulo de Hardening Linux. Más información aquí: https://www.securizame.com/curso-online-de-hardening-de-sistemas-windows-y-linux-e-infraestructuras_yj/
Para otros posibles artículos en el futuro me parece interesante ver algunas consideraciones de seguridad en Docker Hub y otros componentes relacionados, así como auditorías de contenedores con Lynis.
Recursos y referencias:

 

Leer Más

Book review: “Learning Alfresco Web Scripts”

Recently PackPublishing has released  the book “Learning Alfresco Web Scripts” written by Ramesh Chauhan.https://www.packtpub.com/web-development/learning-alfresco-web-scripts In a nutshell it is an starting point to learn how to develop web scripts from scratch to success.

If you often read this blog, you may already know what Alfresco is and how it works. As per the Alfresco Wiki: A Web Script is simply a service bound to a URI which responds to HTTP methods such as GET, POST, PUT and DELETE. While using the same underlying code, there are broadly two kinds of Web Scripts: data and presentation Web Scripts.

The book shows the reader what to know to be a web script developer: understand the Alfresco web script framework and how it works, components and architecture, writing a web script from scratch, types and options of web scripts with its components, how to use them from third party applications (which is very interesting in order to integrate Alfresco with others), embed Java in Web Scripts also knows as Java-backed web scripts, using Web Scripts with Java-script as well. Get to know all deployment options, debugging and troubleshooting, and also the very important maven options available with web scripts deployments.

I liked this book because it goes from very foundational information to really deep level concepts, so if you are looking to start learning web scripts from scratch and go beyond, it is a good option to have a single point of consultation. This is a pure web scripts book, if you are looking for a 5.0 updated book this is not your book, because it doesn’t cover Aikau, but remember that it covers most importan topics to start working with different flavours of web scripts. And after all, it is oriented for both beginners and advanced developers.

 

Leer Más

Alfresco 5.0 and Liferay 6.2 CMIS integration

It is as easy as it sounds:
  1. Use same user and password in both servers, this can be done by having SSO or same LDAP on both applications. Also just use same user and password for both even with their internal DB.
  2. Add the two properties below to your Liferay configuration file and restart Liferay:
    $ vi /opt/liferay-6.2-6/apache-tomcat/webapps/liferay/WEB-INF/classes/portal-ext.properties
    session.store.password=true
    company.security.auth.type=screenName
    
  3. 3rd: watch this 5 minutes screencast:

Thanks to my friends of Gobal Quark for the tips.

Leer Más

Alfresco Tuning Shortlist

During last few years, I have seen dozens of Alfresco installations in production without any kind of tuning. That makes me thing that 1) nobody cares about performance or 2) nobody cares  about documentation or 3) both of them!
I know people prefer to read a blog post instead the product official documentation. Since Alfresco have improved A LOT our official documentation and most of the information provided below can be found there, I want to point out some tips that EVERYONE has to take into account before going live with your Alfresco environment. Remember, it’s easy Tuning = Live, No Tuning = Dead.
Tuning the Alfresco side:
  • Increase number of concurrent connections to the DB in alfresco-global.properties
# Number below has to be the maxThreads value + 75
db.pool.max=275
  • Increase number of threads that Tomcat will use in server.xml – section 8080, 8443 and 8009 in case you use AJP
maxThreads=“200”
  • Adjust the amount of memory you want to assign to Alfresco in setenv.sh or ctl.sh (which is the default one):
export CATALINA_OPTS=" -Xmx=16G -Xms=16G"
in JAVA_OPTS make sure you have the flag “-server” that gives 1/3 of memory for new objects, do not use “XX:NewSize=” unless you know what you are doing, Solr takes many new objects and it will need more than 1G in production.
ooo.enabled=false
jodconverter.enabled=true
Tuning the Solr side:
In solrcore.properties for both workspace and archive Spaces Store
alfresco.batch.count=2000
solr.filterCache.size=64
solr.filterCache.initialSize=64
solr.queryResultCache.size=1024
solr.queryResultCache.initialSize=1024
solr.documentCache.size=64
solr.documentCache.initialSize=64
solr.queryResultMaxDocsCached=2000
solr.authorityCache.size=64
solr.authorityCache.initialSize=64
solr.pathCache.size=64
solr.pathCache.initialSize=64
In solrconfig.xml for both workspace and archive Spaces Store 
mergeFactor change it to 25
ramBufferSizeMB change it to 64

April/9/2015 Update! For Solr4 (Alfresco 5.x) add next options to its JVM startup options:

-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Tuning the DB side:
Max allowed connections, adjust that value to the total amount of your Alfresco or Alfrescos plus 200, consider increase it in case you use that DB for other than only Alfresco.
  • For MySQL in my.cnf configuration file:
innodb_buffer_pool_size = 4GB
max_connections=600
innodb_log_buffer_size=50331648
innodb_log_file_size=31457280
innodb_flush_neighbors=0
  • For Postgres in postgresql.conf configuration file
max_connections = 600
Do maintenance on your DB often. Run ANALYZE or VACCUM (MySQL or Postgres), a DB also needs love!
Tuning the OS side:
I’m not very good on Windows so I will cover only a few tips for Linux:
  • Change limits in /etc/security/limits.conf to the user who is running your app server, for example “tomcat”:
tomcat soft nofile 4096
tomcat hard nofile 65535

If you start Alfresco with a su -c option in /etc/init.d/, for Ubuntu you have to uncomment the pam_limits.so line here /etc/pam.d/su, if this is using login (by ssh) it is uncommented by default. For RedHat/Centos this line has to be uncommented here /etc/pam.d/system-auth.

  • Your storage throughput should be greater than 200 MB/sec and this can be checked by:
# hdparm -t /dev/sda
/dev/sda:
Timing buffered disk reads: 390 MB in  3.00 seconds = 129.85 MB/sec
  • Allow more concurrent requests by editing /etc/sysctl.conf
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.ip_local_port_range = 2048 64512
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 10
Run “sysctl -p” in order to reload changes.
  • A server full reboot is a good preventive measure before going live, it should start all needed services in case of contingency and we will find if we left something back on the configuration.
Remember, this is ONLY A SHORTLIST, you can do much more depending on your use case. Reading the documentation and taking our official training will be helpful and take advantege that we were polishing our training materials lately.

Leer Más

Understanding Alfresco Content Deletion

As part of the work I’m doing for the upcoming Alfresco Summit, where I will be talking about my favorite topic: “Security and Alfresco”, I have written a few lines about Alfresco node deletion, how it works and why is important to take it into account in terms of security control.
I just wanted to clarify how Alfresco works when a content item is deleted and also how content deletion works in Records Management (RM). Basic content deletion is already very well explained in this Ixxus blog post but there are some differences in the database schema between Alfresco 4.1 and 4.2 worth noting, such as the alf_node table has a field named ‘node_deleted in versions 4.0 and earlier.
To develop a deep knowledge about Alfresco security and also how to configure Alfresco backup and disaster recovery, you should first need to understand how the Alfresco repository manages the lifecycle of a content item.
Node creation:
When a node is created,regardless how it is uploaded or created in Alfresco (via the API, web UI, FTP, CIFS, etc.)Alfresco will do the following:

  1. Metadata properties are stored into the Database in the logical store workspace://SpacesStore (alf_node, alf_content_url among others).
  2. The file itself is store and renamed as .bin under alf_data/contentstore/YYYY/MM/DD/hh/mm/url-id-of-the-file.bin
  3. Next, depending on your indexing you chose, its index entries are created within Lucene (alf_data/lucene-indexes/workspace/SpacesStore) or Solr (alf_data/solr/workspace/SpacesStore).
  4. Finally, in most cases, a content thumbnail is created as a child of the file created.

Node deletion:
There are two phases to node deletion:
Phase 1- A user or admin deletes a content item (sending it to the trashcan):

  1. When someone deletes a content item, the content and its children (eg. thumbnails) are moved (archived) within  the DB from workspace://SpacesStore to archive://SpacesStore. Nothing else happens in the DB.
  2. The actual content “.bin” file remains in the same location inside the contentstore directory.
  3. Finally,the indexes are moved from the existing location to the corresponding archive alf_data/lucene-indexes/archive/SpacesStore) or Solr (alf_data/solr/archive/SpacesStore) depending on your index engine selection.

NOTE: A deleted node stays in the trashcan FOREVER, unless the user or admin either empties the trashcan or recovers the file. This default” behavior can be changed by using third party modules that empty the trashcan automatically on a custom schedule. See below for more information on these modules.
The trashcan may be found at these locations:
 Alfresco Share: User -> My Profile -> Trashcan (admin user will see all users deleted files, since 4.2 all users can also see and restore their own deleted files).
Alfresco Explorer: User Profile -> Manage Deleted Items (for all users).
Phase 2- Any user or admin (or trashcan cleaner) empties the trashcan:
That means the content is marked as an “orphan” and after a pre-determined amount of time elapses, the orphaned content item ris moved from the alf_data/contentstore directory to alf_data/contentstore.deleted directory.
Internally at DB level a timestamp (unix format) is added to alf_content_url.orphan_time field where an internal process called contentStoreCleanerJobDetail will check how many long the content has been orphaned.,f it is more than 14 days old, (system.content.orphanProtectDays option) .bin file is moved to contentstore.deleted. Finally, another process will purge all of its references in the database by running nodeServiceCleanupJobDetail and once the index knows the node has bean removed, the indexes will be purged as well.
NOTE: Alfresco will never delete content in alf_data/contentstore.deleted folder. It has to be deleted manually or by a scheduled job configured by the system administrator.
By default, the contentStoreCleanerJobDetail runs every day at 4AM by checking how the age of an orphan node and if it exceeds system.content.orphanProtectDays (14 days) it is moved to contentstore.deleted.
Additionally, the nodeServiceCleanupJobDetail runs every day at 9PM and purges information related to deleted nodes from the database.
Now, that we understand how Alfresco works by default, let’s learn how to modify Alfresco’s behavior in order to clean the trashcan automatically:
There are several third party modules to achieve this, but I recommend the Alfresco Trashcan Cleaner by Alfresco’s very own Rui Fernandes. Tt can be found at https://code.google.com/p/alfresco-trashcan-cleaner/.
Once the amp is installed, you can use this sample configuration  by copying it to alfresco-global.properties:

trashcan.cron=0 30 * * * ?
trashcan.daysToKeep=7
trashcan.deleteBatchCount=1000

The options above configure the cleaner to run every hour at thethe half hour and it will remove content from the trashcan and mark them as orphan if a content has been in the trashcan for more than 7 days. It will do this in batches of 1000 deletions every time it runs. To delete from the trashcan without waiting any grace period set the trashcan.daysToKeep property value to -1.
Can I configure Alfresco to avoid using contentstore.deleted and ensure it really deletes a file after the trashcan is cleaned?
Yes, this is possible by setting system.content.eagerOrphanCleanup=true in alfresco-global.properties and once the trashcan is emptied, the file will not be moved to contentstore.deleted but it will be deleted from the file system (contentstore). After that, nodeServiceCleanupJobDetail will purge any related information from the database. Using sys:temporary aspect it also perform same behavior.
So, what is the recommended configuration for a production server?
This is something you have to figure out based on your backup and disaster recovery strategy. See my  Alfresco Summit presentation and white paper here: http://blyx.com/2013/12/04/my-talk-about-alfresco-backup-and-recovery-tool-in-the-alfresco-summit/.
If you have a proper l backup strategy, you can offer your users a grace period of 30 days to recover their own deleted documents from the trashcan and after the grace period delete them simultaneously from the trashcan and the filesystem. This can be achieved by installing the previously mentioned trashcan-cleaner and with this configuration in alfresco-global.properties:

system.content.eagerOrphanCleanup=true
trashcan.cron=0 30 * * * ?
trashcan.daysToKeep=30
trashcan.deleteBatchCount=1000

And what about Alfresco Records Management, does it work in the same way? How a record destruction works?
In the Records Management world you don’t tend to delete documents as often it is done in Document Management. When a content item is deleted from the RM file plan, it is considered to be a regular delete operation. This is rarely used and only done by RM admins when there is some justifiable reason such as correcting  a mistake that requires a record to be removed.
The only difference is that the deleted record by-passes the archive store, hence it never goes to the trashcan, it is marked as orphan once it is deleted. Then it will be moved to contentstore.deleted after orphanProtectDays or it is truly deleted if eagerOrphanCleanup is set as true.
Destruction of a record works in the same way that a record is removed, this will by-pass the archive and immediately trigger the clean-up (eagerOrphanCleanup) process so the content does not stay in the file system contentstore or contentstore.deleted.
As far as the meta-data goes, there are two options; the first is that all the meta-data (and hence the node itself) are completely deleted, the alternative method cleans out all the content but the node remains with only the meta-data (called ghosting). In Alfresco RM versions before 2.2 this was a global configuration value (rm.ghosting.enabled=true), in 2.2 it can be defined on the destroy step of the disposition schedule: “Maintain record metadata after destroy”.

Alfresco content deletion graph
Alfresco content deletion

Some final words on content deletion:
As we have seen, Alfresco offers different ways to delete content. It is important to remember, even if Alfresco completely deletes content such as when using the destroy option in RM or by using eagerOrphanCleanup, Alfresco will not wipe the removed content from the physical storage, it therefore can be recovered by file system recovery tools. Wiping a deleted content item may vary depending on multiple factors, since filesystem type to hardware configuration, etc. If you want to guarranty a real physical wipe of a file in your file system, a third party software must be used to “zero out” the corresponding disk sectors. The specific tools depend on the operating system type, hardware, etc.
Thanks to my colleagues at Alfresco Kevin Dorr, Roy Wetherall for the Records Management section and Luis Sala for the document syntax review.

Leer Más

Integration of IFTTT with Alfresco

If you are not aware about what IFTTT is, I recommend you to take a look in to this https://ifttt.com/wtf and then come back here to continue reading this blog post.

Here a brief demo about this integration, more details and configuration steps below.

Once you know what “if THIS then THAT” is, I want to explain how I have made a seamless integration with Alfresco using some very straightforward receipts and sending information to Alfresco in the THAT (action) part of its receipt.

Since there is not an Alfresco channel in IFTTT (yet), the data flow is from almost any channel to Alfresco using “Send an email from GMAIL” to Alfresco inbound email service (to a folder). I mean, this article is about how to send multiple kind of data from several IFTTT channels to Alfresco through the inbound email feature built in Alfresco.

In this screenshot you can see a self explained example:

Screen Shot 2014-06-09 at 11.50.25 AM

When I liked a picture in Instagram, it will be sent to Alfresco, once in Alfresco, we have a world of possibilities like transformations, workflows, publication, alerts, etc.

What do we need for having this working? Here you go a list of steps to get this ready to go:

1- Enable your Inbound Email service in Alfresco:
For Alfresco One 4.2 this is very easy by using the new Admin Console http://localhost:8080/alfresco/service/enterprise/admin/admin-inboundemail. Explanation below.
For Alfresco Community refer to here http://docs.alfresco.com/community/concepts/community-videos-12.html and here http://docs.alfresco.com/community/concepts/email-inboundsmtp-props.html

Screen Shot 2014-06-09 at 12.22.54 PM
As you can see in the screenshot above, I have made some changes to allow only emails from @blyx.com and from @alfresco.com, any one inside Alfresco and member of the EVERYONE group can send emails to a folder with an email alias aspect. My server is running in Linux and with a non-root user this is the reason I set port 1025, I have a port redirect to listen on port 25 from the internet. Examples of port redirect here http://docs.alfresco.com/community/tasks/fileserv-CIFS-useracc.html.

In the example I have created a folder called “Drafts” with the aspect Aliasable (Email):

Screen Shot 2014-06-09 at 12.16.49 PM

Edit this folder properties and add a new value for Alias property, in my case drafts which will be the email address alias of this folder, like drafts@alfresco.blyx.com (alias + @ + server FQDN). I don’t have to create a MX DNS record because I’m using the FQDN.

Screen Shot 2014-06-09 at 12.18.45 PM

Now, I’m ready to send an email from an existing Alfresco user  (and with permissions to create content) to Alfresco, in my case toni@blyx.com is the user toni in Alfresco.

2- Create an IFTTT receipt like showed in the video above.

3- Enjoy thousands of ways to add contents to your Alfresco!

Leer Más