The most widely read magazine for Canadian lawyers
Issue link: https://digital.canadianlawyermag.com/i/468835
OPINION: TECH SUPPORT
March 2015, www.canadianlawyermag.com, page 18

What is a hash value and why does it matter for e-discovery?
By Dera J. Nevin (dnevin@proskauer.com)

Hash values are often associated with computer forensics, in the context of proving or disproving evidence. Hash values are also used in e-discovery to filter or organize files. However, they can also be used for evidence record-keeping and control, even when computer forensics are not used. Although I define hashing briefly here, hashing is complex, both as an evidentiary technique and as an evidence control function, and interested readers should explore the topic further.

Hash values (or hashes) are the result of a calculation made by a hash algorithm. A hash value results when a hash algorithm (a procedure or formula for solving a problem) is run against a target, such as text, an electronic file, or a computer or hard drive. The hash values that result from these calculations are said to be the "digital fingerprint" of the target that was hashed. For this reason, hashing is generally understood to be a fast and reliable way to identify and compare the contents of individual files or media.

The use of hash values is also considered to be secure in the sense that if you have a hash value, you cannot recreate the content of the hashed file, even if you know the algorithm. This is helpful if you do not want anyone to determine the contents of the source data from the hash.

There are many commonly accepted hashing algorithms, but in e-discovery two of the most frequently used are MD5 and SHA-1. Every hash algorithm uses a specific number of bytes to store the calculated hash value, and so each algorithm will return the same number of characters, regardless of the contents fed into it. For example, the MD5 algorithm will always return a hash value having 32 characters (when written in hexadecimal), whereas SHA-1 will always return a value having 40 characters, in both cases regardless of whether the algorithm was run against a single word or the hard drive of a computer.

Although SHA-1 takes longer to calculate, there is less likelihood of the same hash value being generated for two different inputs (called a "collision"). In practice, accidental hash collisions are vanishingly unlikely, but some clever researchers have been able to deliberately construct two different files that produce the same MD5 hash. We'll revisit collisions in a few paragraphs.

In computer forensics, hash values are commonly used to confirm that a forensic image was made correctly (because the copy will have the same hash as the original), but they can also be used to identify specific files (for example, in a theft-of-information case). Hash values are also used in virus detection, to confirm that a particular file (or application) was created by a legitimate publisher, and, in policing, to identify contraband (illegal images).

In e-discovery, the most common use made of hash values is to identify duplicate files, usually to cull them out of the collection (deduplication).

Another common use of hash values is de-NISTing, in which hashes are used to identify and cull files that are on the NSRL/NIST list. (This list, published quarterly by the U.S. National Institute of Standards and Technology, contains millions of known common files, including system and application files.)

More and more frequently, hash values are also being used as evidence control values. An evidence control value is one that attaches to a particular record in order to identify it with particularity, usually when incorporating a document into the evidentiary record of a court proceeding. Common examples are: page numbers
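
As a rough illustration of two points made earlier in this column (each algorithm returns a hash of fixed length, and matching hashes can flag duplicate files), here is a minimal Python sketch using the standard `hashlib` module. The function names `hex_digest` and `find_duplicates` and the sample file names and contents are hypothetical, chosen only for illustration. It does not represent the workflow of any particular e-discovery tool.

```python
import hashlib


def hex_digest(data: bytes, algorithm: str = "md5") -> str:
    """Return the hash of `data` as a hexadecimal string."""
    return hashlib.new(algorithm, data).hexdigest()


def find_duplicates(files: dict) -> dict:
    """Group file names by the SHA-1 hash of their contents.

    Names that end up in the same group have identical contents
    (barring a hash collision).
    """
    groups = {}
    for name, content in files.items():
        groups.setdefault(hex_digest(content, "sha1"), []).append(name)
    return groups


# Hypothetical sample collection (names and contents are made up).
files = {
    "memo.txt": b"Meeting moved to Tuesday.",
    "memo_copy.txt": b"Meeting moved to Tuesday.",
    "memo_revised.txt": b"Meeting moved to Wednesday.",
}

# The hash length depends only on the algorithm, not on the input size.
for data in (b"word", b"word" * 100_000):
    print(len(hex_digest(data, "md5")), len(hex_digest(data, "sha1")))  # 32 40

# Files with identical contents land in the same group.
for digest, names in find_duplicates(files).items():
    print(digest[:8], names)
```

In this sketch the two memos with identical contents share one hash and are grouped together, while the memo with a single changed word gets a different hash and a group of its own.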