Importing Outlook PST into Zimbra (how to import large TGZ files using Zimbra REST API)

In this article you will learn how to import emails and folders from Outlook PST files into Zimbra using the command line. This article is aimed at system administrators who want to automate PST imports. Some knowledge of Bash and PowerShell scripting is recommended, as you may want to adapt the scripts used in this article to fit your automation needs.

In addition, this article shows how to import large TGZ files into Zimbra while avoiding common errors and pitfalls.

Import workflow

This article takes a two-step approach to importing PST files into Zimbra. Step one converts the Outlook PST file into a Zimbra TGZ file.

This first step can be done on a Windows workstation with Zimbra Desktop installed or on a Linux machine with the command readpst installed.

Windows with Zimbra Desktop is the recommended option for the first step, as it leverages a commercially supported PST parser, which may provide better Outlook/PST compatibility.

Please note that the conversion script does not require Zimbra Desktop to be configured with an account or a Zimbra server. It only needs to be installed on your Windows machine so that the script can use Zimbra Desktop components.

The second step is always executed on the Zimbra Mailbox server.

Performing the PST to TGZ conversion on Windows

Prepare your Windows machine:

  1. Install Zimbra Desktop.
  2. Copy the PST file to the Windows machine.
  3. Download the pst-to-tgz.ps1 script to the Windows machine.

To perform the conversion from PST to TGZ open a PowerShell window and execute the script as follows:

.\pst-to-tgz.ps1 "C:\Users\user\Downloads\rogers b.pst"

Change the example path to the path of your .pst file. Once completed, the script puts a .tgz file next to the original .pst file. In this example, a file named rogers b.pst.tgz will be created.

Please note that logs and the EML export remain in C:\tmp\pst-to-tgz. Remove them after manual verification.

Performing the PST to TGZ conversion on Linux

Not recommended. Use the Windows/Zimbra Desktop conversion described above where possible.

Prepare your Linux machine:

  1. Install readpst, for example: apt-get install pst-utils.
  2. Copy the PST file to the Linux machine.
  3. Download the pst-to-tgz.sh script to the Linux machine.

To perform the conversion from PST to TGZ open a terminal and execute the script as follows:

./pst-to-tgz.sh /path/to/export.pst

Change the example path to the path of your .pst file. Once completed, the script puts a .tgz file next to the original .pst file. In this example, a file named export.pst.tgz will be created.

Please note that logs and the EML export remain in /tmp/pst-to-tgz. Remove them after manual verification.
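To give an idea of what a readpst-based conversion involves, here is a minimal sketch. It is not the pst-to-tgz.sh script itself: the function names and the temporary working directory are assumptions made for illustration. The only readpst options used are -e (one file per message, with an .eml extension) and -o (output directory).

```shell
# Sketch only: NOT the actual pst-to-tgz.sh. Function names are hypothetical.

# Pack the contents of a directory into a gzip-compressed tar archive.
# Paths inside the archive are relative to that directory.
pack_tgz() {
    local src_dir="$1" out_file="$2"
    tar -czf "$out_file" -C "$src_dir" .
}

# Convert one PST file: export every message as an .eml file with readpst,
# then pack the resulting folder tree next to the original PST.
convert_pst() {
    local pst="$1" work_dir
    work_dir="$(mktemp -d)" || return 1
    # -e: one file per message, with extension; -o: existing output directory
    readpst -e -o "$work_dir" "$pst" || return 1
    pack_tgz "$work_dir" "${pst}.tgz"
}

# Example call, commented out so the definitions can be sourced safely:
# convert_pst /path/to/export.pst
```

In this sketch the working directory is left in place after the run so that the exported EML files can be inspected, in the same spirit as the manual verification mentioned above.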

Importing TGZ into Zimbra

This step should be done on a Zimbra Mailbox server and all commands are run as the OS user zimbra.

Prepare your Zimbra machine:

  1. Copy the TGZ file to the Zimbra machine.
  2. Download the import-tgz.sh script to the Zimbra machine.

To perform the import execute the script as follows:

./import-tgz.sh /path/to/import.tgz admin@example.com

The first argument is the location of the TGZ file and the second argument is the target mailbox.
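If you need to import several TGZ files, a small wrapper can call the script once per mailbox. The following is a sketch under assumptions: the mapping file format (one "mailbox path" pair per line) and the IMPORT_CMD variable are invented for this example, and it assumes that import-tgz.sh returns only once the import has finished and signals failure through its exit status.

```shell
# Sketch only. The mapping file format and IMPORT_CMD are assumptions.
# Each line of the mapping file: <mailbox> <path to tgz>
# The path may contain spaces because it is the last field on the line.

IMPORT_CMD="${IMPORT_CMD:-./import-tgz.sh}"

# Run the import once per line of the mapping file; keep going on failure.
import_batch() {
    local map_file="$1" mailbox tgz failed=0
    while read -r mailbox tgz; do
        # Skip blank lines and comment lines.
        [ -z "$mailbox" ] && continue
        case "$mailbox" in \#*) continue ;; esac
        # stdin is redirected so the child cannot consume the mapping file.
        if ! "$IMPORT_CMD" "$tgz" "$mailbox" < /dev/null; then
            echo "FAILED: $tgz -> $mailbox" >&2
            failed=$((failed + 1))
        fi
    done < "$map_file"
    # Return 1 if at least one import failed, 0 otherwise.
    return "$(( failed > 0 ))"
}

# Example call, commented out so the definitions can be sourced safely:
# import_batch /path/to/imports.txt
```

A mapping file for this sketch could contain a line such as: admin@example.com /path/to/import.tgz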

Notes:

  • Empty folders will not be restored.
  • EML files that have no email content (missing headers or body) are not imported.

Run the following commands in a separate terminal to see errors in real time:

  • tail -f /opt/zimbra/log/mailbox.log
  • tail -f /tmp/zmmailbox-screen/output.log
  • tail -f /opt/zimbra/log/mailbox.log | grep -i error | grep "ParsedMessage"

This script was tested with 2 GB PST files containing thousands of emails and folders.

Extensive logging in /tmp/zmmailbox-screen/output.log will help you to track import failures. If one item fails to import, the script continues and imports the next item.

How does it work?

The scripts of step one convert a PST file into a folder structure equal to the original mailbox structure. Inside the folder structure each email is saved as an EML file. The entire structure is then compressed into a single big TGZ file.

Zimbra insiders will be tempted to import this TGZ file directly into Zimbra, but this often fails. In addition, because the TGZ file contains no Zimbra metadata, the email list view will show the import date as the date of every email.

The second step, performed by the import-tgz.sh script, uses existing APIs but with a different approach. The script works as follows:

  1. Extract the TGZ file provided.
  2. Create a new TGZ file for each EML file, retaining the folder structure inside each TGZ file.
  3. Change mailbox timeout values.
  4. Launch a detached screen session with the zmmailbox command in interactive mode.

Then for each TGZ file:

  1. Write full path to TGZ into log.
  2. Import TGZ by stuffing commands into the screen session.
  3. Gather output of zmmailbox and write it into log.

Finally:

  1. Restore default mailbox timeout values.
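The first two steps (extracting the TGZ and creating one TGZ per EML file) can be sketched as follows. This is an illustration of the idea and not the code of import-tgz.sh; the function name, the numbered output file names, and the temporary directory are assumptions.

```shell
# Sketch only: illustrates "one TGZ per EML", not the actual import-tgz.sh.
# Function name and output naming are hypothetical.

# Extract a mailbox TGZ and write one small TGZ per .eml file.
# Each small archive keeps the message's folder path relative to the root.
# Prints the number of archives created.
split_tgz_per_eml() {
    local big_tgz="$1" out_dir="$2"
    local extract_dir list n=0 eml
    extract_dir="$(mktemp -d)" || return 1
    list="$(mktemp)" || return 1
    mkdir -p "$out_dir"
    tar -xzf "$big_tgz" -C "$extract_dir" || return 1

    # One path per line; file names containing newlines are not handled.
    find "$extract_dir" -type f -name '*.eml' | sort > "$list"

    while IFS= read -r eml; do
        n=$((n + 1))
        # Archive the path relative to the extraction root, e.g. Inbox/1.eml
        tar -czf "$out_dir/$(printf '%06d' "$n").tgz" \
            -C "$extract_dir" "${eml#"$extract_dir"/}"
    done < "$list"

    rm -rf "$extract_dir" "$list"
    echo "$n"
}

# Example call, commented out so the definitions can be sourced safely:
# split_tgz_per_eml /path/to/import.tgz /tmp/per-eml
```

Because only files are archived in this sketch, a folder without any EML file never appears in an archive, which matches the note that empty folders are not restored.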

Benefits of this approach as compared to regular TGZ import

  1. Log is provided for each TGZ item.
  2. In case of failure the import continues.
  3. Most importantly: the TAR formatter exception is avoided.
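The first two benefits can be illustrated with a simplified loop. Note that this sketch deliberately swaps the technique: instead of stuffing commands into a detached screen session, it starts one zmmailbox process per archive, which is slower but easier to follow. The ZMMAILBOX variable, the function name, and the log line format are assumptions, and the sketch assumes that zmmailbox returns a non-zero exit status when an import fails.

```shell
# Sketch only: a simplified variant of the per-item loop, NOT import-tgz.sh.
# It runs one zmmailbox process per archive instead of using screen.

ZMMAILBOX="${ZMMAILBOX:-zmmailbox}"

# Import every *.tgz in a directory into one mailbox, logging each item.
import_items() {
    local mailbox="$1" tgz_dir="$2" log_file="$3"
    local tgz ok=0 failed=0
    for tgz in "$tgz_dir"/*.tgz; do
        [ -e "$tgz" ] || continue           # directory had no archives
        echo "IMPORT $tgz" >> "$log_file"   # log the full path first
        # Import the archive and append the zmmailbox output to the log.
        if "$ZMMAILBOX" -z -m "$mailbox" postRestURL \
               "//?fmt=tgz&resolve=skip" "$tgz" >> "$log_file" 2>&1; then
            ok=$((ok + 1))
        else
            echo "FAILED $tgz" >> "$log_file"
            failed=$((failed + 1))          # continue with the next item
        fi
    done
    echo "imported=$ok failed=$failed"
}

# Example call, commented out so the definitions can be sourced safely:
# import_items admin@example.com /tmp/per-eml /tmp/import.log
```

Since every archive holds a single EML file, a FAILED line in the log points directly at the message that could not be imported.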

Why is this better?

The Zimbra REST API does allow importing a TGZ file containing a complete mailbox, even if the TGZ is many gigabytes in size. However, the import often fails to complete, resulting in an error similar to:

2024-03-27 16:47:22,384 WARN  [qtp921760190-341:https://zimbra10.barrydegraaff.nl/home/admin@barrydegraaff.nl/?fmt=tgz&subfolder=import&timestamp=0&resolve=skip] [name=admin@barrydegraaff.nl;mid=2;oip=192.168.1.98;port=33166;] misc - ArchiveFormatter addError:Early EOF: path=zl_beck-s_000.pst/beck-s/Australia/9822.eml
com.zimbra.cs.service.formatter.FormatterServiceException: Early EOF
    at com.zimbra.cs.service.formatter.FormatterServiceException.UNKNOWN_ERROR(FormatterServiceException.java:118) ~[zimbrastore.jar:10.0.7_GA_4598]
    at com.zimbra.cs.service.formatter.ArchiveFormatter.addData(ArchiveFormatter.java:1742) ~[zimbrastore.jar:10.0.7_GA_4598]
    at com.zimbra.cs.service.formatter.ArchiveFormatter.saveCallback(ArchiveFormatter.java:976) ~[zimbrastore.jar:10.0.7_GA_4598]
    at com.zimbra.cs.service.formatter.Formatter.save(Formatter.java:162) ~[zimbrastore.jar:10.0.7_GA_4598]

This error does NOT indicate a broken TGZ or EML file. My suspicion is that Java has to do this import entirely in memory, and that some memory is purged by garbage collection before the import completes. The problem with this error is that it occurs randomly, which makes it almost impossible to debug and very hard to recover from.

With the new approach the TAR formatter exception should be avoided. Should it still happen, the import continues and provides meaningful logs that are easy to correlate to the TGZ/EML file, which makes debugging easier.

Limitations

Once the TAR formatter has done its work, Zimbra’s EML parser needs to parse the EML file. Some EML items stored in a PST file are not actually emails. They can be recognized by the absence of headers that are required in an email. Zimbra does not understand these items and cannot import them. These errors are not returned by zmmailbox and can only be seen via:

tail -f /opt/zimbra/log/mailbox.log | grep -i error | grep "ParsedMessage"
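After an import has finished, the same filter can be applied to the whole log to get a count instead of a live view. This is a small sketch; the function name is hypothetical, and the default path is the mailbox.log location used above.

```shell
# Sketch only: count log lines that contain "error" (any case)
# and "ParsedMessage". The function name is hypothetical.
count_parse_errors() {
    local log_file="${1:-/opt/zimbra/log/mailbox.log}"
    grep -i error "$log_file" | grep -c "ParsedMessage"
}

# Example call, commented out so the definition can be sourced safely:
# count_parse_errors /opt/zimbra/log/mailbox.log
```

The count covers everything in the log file, so compare it before and after an import if the log already contains older entries.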

In addition, some very old attachment formats and encoding types are not supported by Zimbra and cannot be imported. Such documents should be converted to an archival format (such as PDF) and stored as documents in Zimbra. Alternatively, if only a small percentage of your EML files fall into this category, you could keep an archival copy of the PST after importing into Zimbra.

Downloading the scripts

The scripts used in this blog can be found in the Zimlet Gallery: https://gallery.zetalliance.org/extend/items/view/pst-migration-tool


Copyright © 2022 Zimbra, Inc. All rights reserved.

All information contained in this blog is intended for informational purposes only. Synacor, Inc. is not responsible or liable in any manner for the use or misuse of any technical content provided herein. No specific or implied warranty is provided in association with the information or application of the information provided herein, including, but not limited to, use, misuse or distribution of such information by any user. The user assumes any and all risk pertaining to the use or distribution in any form of any subject matter contained in this blog.
