Deduplication Options for Messages and Items in ESI Analyst

Understanding Deduplication Options

ESI Analyst offers several options for removing duplicates (deduplication) to help pare a data set down to a manageable population. However, there are instances where you may not want to deduplicate at a global level. The available options are:
  1. Do not deduplicate 
  2. Globally deduplicate (provided hash)
  3. Globally deduplicate (system hash)
  4. Deduplicate based on Import
  5. Deduplicate based on Evidence Source

Do Not Deduplicate


If this option is selected, no deduplication occurs and every item is loaded into the platform, regardless of the deduplication settings applied to previous imports.

Globally Deduplicate (Provided Hash)


All items will be deduplicated against every item within the designated Project, as well as the items currently being ingested, in the order in which they are read. The comparison is based upon the hash value provided in your load file (MD5, SHA-1 or other).

If this option is selected, you must provide a hash value column in your load file. This hash value will be used to perform the deduplication process.

If this hash value is not provided for an item, the item will be skipped (rejected) during import.
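
The behavior described above can be sketched in Python. This is an illustration only, not ESI Analyst's implementation: the function name `dedupe_by_provided_hash`, the `hash` column name, the row dictionaries, and the case-insensitive comparison of hash values are all assumptions made for the example.

```python
def dedupe_by_provided_hash(rows, project_hashes, hash_column="hash"):
    """Split load-file rows into loaded, duplicate, and rejected lists.

    rows: iterable of dicts read from a load file, in read order.
    project_hashes: hash values already present in the Project.
    hash_column: hypothetical name of the hash column in the load file.
    """
    # Assumption: hashes are compared case-insensitively.
    seen = {h.strip().lower() for h in project_hashes}
    loaded, duplicates, rejected = [], [], []
    for row in rows:
        value = (row.get(hash_column) or "").strip().lower()
        if not value:
            rejected.append(row)      # no hash provided: item is skipped
        elif value in seen:
            duplicates.append(row)    # matches the Project or an earlier row
        else:
            seen.add(value)
            loaded.append(row)        # first occurrence is kept
    return loaded, duplicates, rejected


rows = [
    {"id": "A", "hash": "d41d8cd98f00b204e9800998ecf8427e"},
    {"id": "B", "hash": "D41D8CD98F00B204E9800998ECF8427E"},  # same hash as A
    {"id": "C", "hash": ""},                                  # missing hash
    {"id": "D", "hash": "900150983cd24fb0d6963f7d28e17f72"},  # already in Project
]
existing = {"900150983cd24fb0d6963f7d28e17f72"}
loaded, duplicates, rejected = dedupe_by_provided_hash(rows, existing)
```

In this sketch, item A is loaded, B and D are treated as duplicates, and C is rejected because its hash value is empty.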

Globally Deduplicate (System Hash)


All items will be hashed and deduplicated against every item within the designated Project, as well as the items currently being ingested, in the order in which they are read, based upon the system-generated MD5 hash value. Each metadata type incorporates different fields to create the system hash value.

Global deduplication of decentralized communications such as chat, SMS and MMS may not remove 100% of duplicate items. This is due to nuances in the content storage settings of mobile and app-based data. Deduplication may also have unexpected effects on messages and threads. It looks for messages with exactly the same content, sent at exactly the same time, with the same senders and recipients. If any metadata differs across messages because of local device settings, those items may not be deduplicated.
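
To illustrate why a small metadata difference prevents deduplication, here is a minimal Python sketch of a field-based MD5 hash. The field list (`sent_at`, `sender`, `recipients`, `body`), the separator, and the function name `system_hash` are hypothetical; the fields ESI Analyst actually hashes vary by metadata type and are not specified here.

```python
import hashlib

# Hypothetical field list chosen for illustration only.
HASH_FIELDS = ("sent_at", "sender", "recipients", "body")


def system_hash(message, fields=HASH_FIELDS):
    """Return an MD5 hex digest computed over selected fields of a message."""
    parts = []
    for name in fields:
        value = message.get(name, "")
        if isinstance(value, (list, tuple)):
            value = ";".join(sorted(value))  # order-independent recipient list
        parts.append(str(value))
    # The unit-separator character keeps field boundaries distinct.
    return hashlib.md5("\x1f".join(parts).encode("utf-8")).hexdigest()


phone_a = {
    "sent_at": "2023-04-01T10:15:00Z",
    "sender": "+15550001",
    "recipients": ["+15550002"],
    "body": "On my way",
}
phone_b = dict(phone_a)                                  # identical copy on a second device
phone_c = dict(phone_a, sent_at="2023-04-01T10:15:02Z")  # timestamp stored 2 seconds later

identical_match = system_hash(phone_a) == system_hash(phone_b)
shifted_match = system_hash(phone_a) == system_hash(phone_c)
```

Here `identical_match` is true and `shifted_match` is false: the copy whose timestamp was stored two seconds later produces a different hash, so it would not be treated as a duplicate.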

Deduplicate Based on Import


All items will be hashed and deduplicated only against the other items currently being ingested, in the order in which they are read, based upon the system-generated MD5 hash value. In effect, this option limits deduplication to just the records being imported.
If this option is selected, items will not be deduplicated against any previously imported items.
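
The difference between import-level and global scope can be sketched as follows. The function `dedupe` and the placeholder hash strings are hypothetical; the only point is that import-level deduplication starts with an empty set of known hashes.

```python
def dedupe(incoming_hashes, previously_imported=None):
    """Keep the first occurrence of each hash, in read order.

    previously_imported: hashes already in the Project. Leave it as None
    to model import-level deduplication, which ignores earlier imports.
    """
    seen = set(previously_imported or ())
    kept = []
    for h in incoming_hashes:
        if h not in seen:
            seen.add(h)
            kept.append(h)
    return kept


project = {"h1", "h2"}               # hashes from earlier imports
incoming = ["h1", "h3", "h3", "h4"]  # hashes in the current import

import_only = dedupe(incoming)            # ignores the Project
global_scope = dedupe(incoming, project)  # also checks the Project
```

With import-level scope, `h1` is kept even though it already exists in the Project; with global scope it is dropped. In both cases the repeated `h3` within the import is loaded only once.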

Deduplicate Based on Evidence Container


All items will be hashed and deduplicated against every item within the designated Project that originated from the designated evidence container (the drop-down selection in step 1 of your import), as well as the items currently being ingested, in the order in which they are read.

This deduplication option is helpful when you wish to deduplicate items based upon a specific device or set of devices defined by a single source of evidence (such as a custodian). It is sometimes referred to as "custodial level deduplication".
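
As a rough sketch of container-level scope, the example below compares incoming items only against Project items from the selected container. The function name, the `(container, hash)` tuples, and the container labels are hypothetical and chosen purely for illustration.

```python
def dedupe_within_container(incoming_hashes, project_items, container):
    """Deduplicate an import against one evidence container only.

    project_items: (container_name, hash) pairs already in the Project.
    container: the evidence container selected for this import.
    """
    # Only hashes from the selected container count as existing items.
    seen = {h for name, h in project_items if name == container}
    kept = []
    for h in incoming_hashes:
        if h not in seen:
            seen.add(h)
            kept.append(h)
    return kept


project_items = [("Custodian A phone", "h1"), ("Custodian B phone", "h2")]
incoming = ["h1", "h2", "h2", "h5"]

kept = dedupe_within_container(incoming, project_items, "Custodian A phone")
```

In this sketch `h1` is dropped because it already exists in the selected container, while `h2` is kept once: it exists in the Project only under a different container, so only its repeat within the import is removed.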



