Tech Support
By Dera J. Nevin

When e-discovery meets big data, can case analytics be far behind? Know this: although size matters, much depends on how you handle it.

We are living in an era of big data. What does this mean for litigation and for its requisite discovery effort? Lots, and the future is intriguing. The rise of big data forces a shift not only in thinking about the point of e-discovery, but also in the approach taken to discovery and how it is managed. It may also bring about new skill sets in lawyers.

"Big data" is a buzzword used to refer to a collection of structured and unstructured data sets so large and complex that they are difficult to process using traditional database management software and techniques. Some have also suggested big data is identifiable by the "three Vs": size (volume), the speed with which it is created (velocity), and the type of information the sets contain (variety). There doesn't appear to be a precise point at which data becomes "big data," but volumes of this size don't sit on a single desktop machine. Examples of big data might be terabytes (1,024 GB), petabytes (1,024 TB), or exabytes (1,024 PB), which might consist of billions to trillions of records about millions of people, usually from different sources and possibly in a variety of formats. Marketing people, for example, get excited about this kind of data about customers because they can develop search techniques to link those disparate bits together and form a clearer picture of a group's (or person's) shopping patterns.

Perhaps because of the sheer magnitude of its volume, "big data" can also refer to the technology, processes, and storage facilities required to handle it. Generally, big data sets can be distinguished because they push existing technology infrastructure, particularly storage and processing, to its limits. The reason big data is a big deal becomes apparent as data sets get larger and require ever-increasing system sophistication. Most relational database management systems and desktop applications are too inefficient, or lack the capacity, to manage big data, which can require massively parallel software running on hundreds or thousands of servers (a sketch of this parallel pattern appears below). Big data is an industrial volume of data, made possible within large organizations and also through increasing access to and use of the cloud. The cloud can be analogized, simplistically, to an off-site electronic storage facility. The technologies that have made daily use of parallel, server-based, off-site computing possible have contributed to big data.

What I've learned about big data is this: although size matters, much depends on how you handle it. Which brings us to what big data might mean for e-discovery. If we approach big data sets with existing tools, techniques, and workflows, we in the e-discovery world are simply not ready for truly big data, at least not if we want to retain credibility with our clients. For this reason, I think we may be at the end of one way of doing things and at the beginning of new methods and ways of thinking about e-discovery.

Let's start with tools: many e-discovery systems and applications remain relational databases, and they experience performance degradation (sometimes significant) with extremely large data sets. Trying to bring a big data set into most traditional e-discovery tools will often be an exercise in frustration.
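By way of illustration, here is a minimal sketch, in Python, of the massively parallel pattern described above: split a collection into chunks, process each chunk on a separate worker, then merge the partial results. Production big-data platforms distribute this work across hundreds or thousands of servers; the standard-library multiprocessing pool stands in for that cluster here, and the document file names are hypothetical.

```python
# A minimal sketch of the massively parallel pattern: each worker handles
# one document (the "map" step), and the partial tallies are combined at
# the end (the "reduce" step). Real big-data systems run this across many
# servers; a local process pool stands in for that cluster here.
from collections import Counter
from multiprocessing import Pool

def count_terms(path):
    """Map step: tally the words appearing in one document."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            counts.update(line.lower().split())
    return counts

def merge(partials):
    """Reduce step: combine per-document tallies into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    documents = ["doc_0001.txt", "doc_0002.txt", "doc_0003.txt"]  # hypothetical files
    with Pool() as pool:                       # one worker per CPU core
        partials = pool.map(count_terms, documents)
    print(merge(partials).most_common(10))     # ten most frequent terms
```

The design point is that no single machine, and no single relational database, ever has to hold or scan the entire collection at once; the work scales by adding workers rather than by buying a bigger box.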
Then there are the challenges with traditional approaches to search. Keyword and Boolean searches will be insufficient to understand a data set of this size. Suppose your keyword searches return 3,120,373 responsive hits in a data set of
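To make that limitation concrete, here is a minimal sketch, with hypothetical query terms and documents, of the kind of Boolean matching the column describes; run against millions of documents, a flat yes-or-no query like this is exactly how a review team ends up facing 3,120,373 undifferentiated hits.

```python
# A sketch of traditional Boolean keyword search: a document either matches
# the query or it doesn't, with no ranking and no sense of which matches
# actually matter. Query terms and documents are hypothetical.
def matches(text, all_of=(), any_of=(), none_of=()):
    """Boolean test: AND every term in all_of, OR across any_of, NOT none_of."""
    words = set(text.lower().split())
    return (all(term in words for term in all_of)
            and (not any_of or any(term in words for term in any_of))
            and not any(term in words for term in none_of))

documents = {
    "email_001": "the merger agreement was signed friday",
    "email_002": "lunch friday at the usual place",
    "email_003": "draft merger terms attached for review",
}

hits = [doc_id for doc_id, text in documents.items()
        if matches(text, all_of=("merger",), none_of=("lunch",))]
print(hits)  # ['email_001', 'email_003']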