Importing mailman archives into Drupal

My client wanted to search their mailing list archives (managed by mailman) with Solr. We already had a pretty major investment in Drupal, with about 80K PDF files. In the past, each of the different databases was managed by its own dtSearch index. With the new Drupal system, we are now able to consolidate everything into one master index. With the 'faceting' that Solr provides within Drupal, it becomes very easy to drill down from a general request to the specifics.

This article gets specific about the why and how of the integration we built between mailman data and Drupal.

Mailman keeps its archives in a directory structure that provides, for each list, a single .mbox file and a directory. I selected the directory as my driver for getting all the files across. After I got everything written, some more research indicated that I might have been better off using the .mbox file, as this is 'authoritative' for each list that mailman handles. But I have working code now, so I will live with this decision for the time being.
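
For anyone who wants to try the .mbox route instead, here is a minimal sketch using Python's standard mailbox module. It is not part of the published script. The function names iter_messages and extract_text are made up for illustration, and mapping the Subject header to a title and the first text/plain part to a body is an assumption on my part.

```python
import mailbox


def extract_text(msg):
    """Return the first text/plain part of a message as a string ('' if none)."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, or None for containers
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label in the message: fall back to UTF-8.
            return payload.decode("utf-8", errors="replace")
    return ""


def iter_messages(mbox_path):
    """Yield (title, body) pairs for every message in one mbox file.

    Assumption: title = raw Subject header, body = first text/plain part.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            title = str(msg.get("Subject", "(no subject)"))
            yield title, extract_text(msg)
    finally:
        box.close()
```

Each pair could then be handed to whatever creates the Drupal node. Encoded (non-ASCII) subjects are passed through undecoded here to keep the sketch short.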

The general process is as follows:

A. One Time Procedures

  1. Create a directory under sites/default/files. I called mine mailman. This is where all the list subdirectories will live.
  2. Create a Content Type in Drupal using just Title and Body.
  3. Install my Python script in the directory from step A.1 above.
  4. Make sure that a current release of drush is installed.
  5. Install my drush script in the sites/default directory.

B. Repeated Procedures

  1. Rsync all the lists you want from the mailman server to the directory from step A.1.
  2. Run the Python script. This will create a list of all the eligible archive files, and then call the drush script once for each file found.
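
The shape of step B.2 can be sketched roughly as follows. This is an illustration, not the script from the repository. It assumes that 'eligible' means the per-message pages that mailman's archiver writes with six-digit names such as 000123.html, that the drush script is started with `drush php-script`, and that the script is called mailman_import (a hypothetical name). How the file argument reaches the PHP side depends on your drush version.

```python
import re
import subprocess
from pathlib import Path

# Assumption: one archived message per page, named like 000123.html.
MESSAGE_PAGE = re.compile(r"^\d{6}\.html$")


def find_archive_files(root):
    """Return a sorted list of eligible message files below root."""
    return sorted(
        p for p in Path(root).rglob("*.html")
        if p.is_file() and MESSAGE_PAGE.match(p.name)
    )


def build_drush_command(script, archive_file):
    """Build the argv for one drush call (script name is hypothetical)."""
    return ["drush", "php-script", script, str(archive_file)]


def import_all(root, script="mailman_import", run=subprocess.run):
    """Call the drush script once per eligible file; return the file count.

    `run` is injectable so the loop can be exercised without drush installed.
    """
    files = find_archive_files(root)
    for archive_file in files:
        run(build_drush_command(script, archive_file), check=True)
    return len(files)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    import_all(".", run=lambda cmd, check: print(" ".join(cmd)))
```

Launching one drush process per file is slow but simple, and with check=True the run stops at the first file that fails to import.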

The code can be found at http://github.com/worxco/drupal_mailman_import