Are you getting value from your information? According to a 2016 survey by the Economist Intelligence Unit, only 58% of North American businesses generate revenue from their data. In other words, nearly half of those surveyed have not been able to turn their information into meaningful returns. There are many reasons these companies lag behind in data mining, but perhaps the simplest answer lies in the sheer amount of data available.
Today, the amount of data we produce is staggering: we generate an estimated 2.5 quintillion bytes of data every day. And the International Data Corporation (IDC) predicts that the amount of analyzable data will double by 2020. With so much information available, organizations have to figure out how their data can offer unique, valuable insights into their consumers, their products and their businesses as a whole, or risk falling behind the competition.
It’s not just data that’s valuable.
The figures above don’t even account for the physical records that businesses still create, manage and store every day. Traditionally, the lifecycles of these business-critical records, from creation, organization and retention through destruction, have been managed by Records and Information Management (RIM) professionals. But with the rise of Big Data, records managers now face new challenges, including how to unlock the value held in their physical documents.
How can RIM leaders help their organization incorporate this often untapped information into their data analysis?
Let technology be your friend. By investing in digital solutions for your records management strategy and converting your physical documents to digital files, you can improve efficiency, streamline your documentation workflows and bring even more of your information into Big Data analytics.
What could your information be telling you?
As technology has continued to improve, so has our ability to analyze the information we have available to us. Data mining allows us to ask and answer questions that just a few years ago would have been nearly impossible to address. Organizations that utilize data mining to their benefit can increase their productivity, grow their revenue and discover answers to questions they may not have known they had. So how have things changed?
Beginning in the 1990s, organizations started piling their information into what are known as data warehouses. These purpose-built repositories were designed to store specified sets of data that could then be analyzed to give organizations better information for decisions about their business, their customers and their markets. Since their creation, data warehouses have changed the way many companies use their information. In a well-optimized, well-populated data warehouse, very large data sets can be analyzed efficiently to deliver key insights and answers.
However, data warehouses have their limitations. Before you can load any data into your warehouse, you must have a predefined schema that has been optimized to answer a particular set of questions. With this model, you can collect and gain insights only from a relatively narrow pool of data that addresses a narrow set of questions. Fortunately, as information streams keep expanding, new approaches and solutions have emerged.
Designed to address some of the limitations inherent in data warehouses, data lakes are intended to make a substantially broader set of your information available for analysis. Data lakes link multiple sources and repositories together, letting you store very large volumes of data in almost any format. No predefined rules or questions underlie a data lake: information can be loaded first and organized as questions arise. In essence, this keeps all of your useful information accessible, so you can discover relationships and develop insights you may not have expected.
Data Warehouses vs. Data Lakes: 4 Key Differences
While these structures are typically the realm of IT and data scientists, RIM leaders can benefit from understanding the essential differences and advantages of the technologies that are unlocking the value of Big Data.
1. Bigger is Better
Data warehouses are configured to ingest only the types of data that are pertinent to questions an organization has already established. Even so, depending on your schema, this data will keep growing until you need to expand capacity, so to store and process the increasing volume, your warehouse must be scalable.
Data lakes, by comparison, are huge. This can be both a blessing and a curse. Their sheer size lets organizations bring in, and potentially analyze, vast volumes of data quickly; however, that data will need to be organized and managed at some point.
Unlike data warehouses, data lakes can ingest and work with unstructured data of many different types. This not only lets you store more data but also increases the probability of discovering new information and relationships among the different types of data you analyze, including insights you never anticipated.
Extracting value from all that data can be a time-consuming and arduous process, however, especially if your data lake is left unorganized and unmanaged. RIM professionals will likely recognize that a successful data lake needs a clear organizational structure that addresses retention periods, regulatory compliance and information governance, along with security controls.
2. Flexibility
As mentioned earlier, data warehouses are designed around a predefined schema. This schema determines what data can be loaded and what data can be analyzed. With a data warehouse, you must know from the start which questions you want to ask, and new questions that arise later may not fit the model in place. This inherent limitation restricts both the questions you can ask and the answers you can get from your data.
Information stored in a data lake can come from multiple sources, in multiple forms and in a variety of classes. You don’t need to know which questions you want to ask when you load the data; those can come in time. Data lakes give you the flexibility to form new questions as you go.
Data lakes also allow researchers to group a single piece of data into multiple sets or classes that may or may not be related. That one piece of data can then contribute to multiple questions and analyses, increasing its value and giving your organization richer insights.
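To make the contrast concrete, here is a minimal Python sketch of the two approaches, often called schema-on-write (the warehouse model) and schema-on-read (the lake model). It is an illustration under simplifying assumptions, not how any particular product works: plain Python lists stand in for storage, and the schema, the function names (`load_into_warehouse`, `load_into_lake`, `ask_lake`) and the sample records are all hypothetical. The warehouse rejects anything that does not match its predefined fields, while the lake keeps every item and applies structure only when a question is asked.

```python
import json

# Hypothetical warehouse schema: fields and types fixed before any data is loaded.
WAREHOUSE_SCHEMA = {"customer_id": int, "region": str, "amount": float}


def load_into_warehouse(record: dict, table: list) -> bool:
    """Schema-on-write: accept a record only if it matches the predefined schema exactly."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        return False  # extra or missing fields do not fit the model
    if not all(isinstance(record[name], kind) for name, kind in WAREHOUSE_SCHEMA.items()):
        return False  # a field has the wrong data type
    table.append(record)
    return True


def load_into_lake(raw: str, lake: list) -> None:
    """Schema-on-read: keep the raw item as-is, whatever its shape."""
    lake.append(raw)


def ask_lake(lake: list, field: str) -> list:
    """Impose structure at question time: parse each item and collect one field where present."""
    answers = []
    for raw in lake:
        try:
            item = json.loads(raw)
        except json.JSONDecodeError:
            continue  # free text (e.g. a digitized scan) is kept but skipped by this question
        if isinstance(item, dict) and field in item:
            answers.append(item[field])
    return answers


# Illustrative sample inputs, invented for this sketch.
RAW_ITEMS = [
    '{"customer_id": 1, "region": "East", "amount": 120.0}',
    '{"customer_id": 2, "region": "West", "amount": 80.5, "complaint": "late delivery"}',
    "Scanned invoice text: customer 3, complaint about a damaged item",
]

if __name__ == "__main__":
    warehouse, lake = [], []
    for raw in RAW_ITEMS:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            load_into_warehouse(parsed, warehouse)
        load_into_lake(raw, lake)

    print(len(warehouse), "record(s) fit the warehouse schema")  # 1
    print(len(lake), "item(s) stored in the lake")  # 3
    # A question nobody planned for when the data was loaded:
    print("Complaints found:", ask_lake(lake, "complaint"))  # ['late delivery']
```

The "complaint" field, which the warehouse schema never anticipated, can still be answered from the lake, and the scanned free-text item stays available for a future question even though this particular query skips it.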
3. Accessibility
Big Data is made by the people, but it’s also for the people. The insights that come from data analysis can play a vital role in transforming an organization. However, those insights are not always available to everyone who could use them.
Data warehouses are often isolated and off-limits to most members of an organization, with very few team members actually able to access the wealth of knowledge inside. Only a core set of people can enter, organize and analyze the data.
Data lakes, on the other hand, allow multiple users to share data. Rather than a single core group of researchers studying one data set, many researchers can create their own indexes over the same data sets. One piece of data can therefore be grouped into multiple indexes and metadata schemes and used for different purposes, allowing you to run more powerful analytics and derive deeper insights.
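As a rough illustration of one data set serving many indexes, the Python sketch below stores each item once and lets different teams build their own groupings over it. The item fields, the `build_index` helper and the example indexes are hypothetical; a real data lake would rely on a catalog or search platform rather than in-memory dictionaries. Because each index holds only item IDs, the same document can appear in several indexes without being copied.

```python
from collections import defaultdict

# Hypothetical lake contents: each item is stored once, under a single ID.
LAKE = {
    "doc-001": {"type": "invoice", "region": "East", "year": 2015, "customer": "Acme"},
    "doc-002": {"type": "contract", "region": "West", "year": 2016, "customer": "Acme"},
    "doc-003": {"type": "invoice", "region": "West", "year": 2016, "customer": "Globex"},
}


def build_index(lake: dict, key_fn) -> dict:
    """Group item IDs by whatever key a team chooses; the items themselves are not copied."""
    index = defaultdict(set)
    for item_id, item in lake.items():
        index[key_fn(item)].add(item_id)
    return dict(index)


# Different teams index the same items in different ways (illustrative keys only).
by_customer = build_index(LAKE, lambda item: item["customer"])
by_region_year = build_index(LAKE, lambda item: (item["region"], item["year"]))
by_type = build_index(LAKE, lambda item: item["type"])

if __name__ == "__main__":
    # Combine two independent indexes to answer a cross-cutting question.
    overlap = by_customer["Acme"] & by_region_year[("West", 2016)]
    print("Acme items from West/2016:", sorted(overlap))  # ['doc-002']
```

Keeping indexes as lightweight pointers, rather than separate copies of the data, is what lets many researchers reorganize the same lake for their own purposes without interfering with one another.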
4. Cost and Value
When weighing the cost of building a data mining facility and acquiring the hardware and software vital to its operation, organizations must also evaluate the potential value. Data warehouses are not only expensive to build; they also require a significant time investment before they deliver any value. An effective schema must be developed before any data can be loaded, and even then, that narrowly defined data set can deliver only limited value.
Data lakes store information more economically. While they are certainly expensive to build, they generally hold far more data for the same investment, because raw data is kept as-is on comparatively inexpensive storage rather than being transformed to fit a schema first. Less restricted data sets and abundant storage let researchers analyze multiple data sets at once, gleaning more insights and increasing the value of the information over time.
It’s time to put your data to work.
Overall, when compared to data warehouses, data lakes provide organizations with the flexibility and the resources necessary to explore all usable information and then turn that information into valuable results. While the amounts and types of information are exploding, the technology needed to address and uncover value from this information is also advancing rapidly.
That said, data lakes present their own unique challenges. When not managed properly, they can become cluttered and dysfunctional. RIM experts are critical when a data lake is first being organized, and their in-depth knowledge of retention schedules, government regulations and compliance policies is invaluable for the day-to-day management of a data lake and its contents.
Don’t miss Part Two of our blog, where we’ll discuss how you can implement RIM best practices to prevent your data lake from turning into a data swamp.
Ready to learn more now? Check out our webinar The Information Economy: Driving Value from Your Information Assets!