Revolutionise your marketing strategy by building a Data Lake

David Lastra

16 November 2017


“It is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.” Abraham Maslow


Data Lake, Data Driven


A Data Driven organization relies on technology to improve its results: by analyzing and correctly managing its data, it aims to identify which significant changes can be made and applied flexibly across the company.

Thousands of applications and services are currently available to optimize the different stages of the customer life cycle. In most organizations, however, the information gathered through these applications ends up stored in data silos, inside the very tools where customer contact first occurred. Landing pages, advertising technologies such as DMPs, CRMs, mobile apps, marketing automation tools, web analytics services, operational databases and so on each capture information and leave it fully or partially available for later analysis. With so much data available, the challenge is to give our consumers a truly unique, improved experience.

For data-based marketing to be effective, the organization needs to own the data it works with: it must gather that data from each application and channel, then store, process, integrate and govern it. Traditionally, organizations have built data stores (data warehouses, data marts, relational databases) that had to be managed by internal IT teams according to previously defined patterns.

The evolution of Big Data technologies in recent years, driven by the enormous impact the digital world has had on businesses, has created demand for more flexible and agile technologies. Businesses need them to respond effectively to a rapidly changing environment that produces completely heterogeneous data, depending on the source or channel being analyzed.

Meeting this need requires systems with a large storage capacity that can collect information from different channels, services and internal systems (social channels, marketing automation, web analytics, CRM, ERP, fingerprint, paid media measurement, call centers, etc.), handle both structured and unstructured data, receive it in real time and/or in batches, and make it available for read-only querying. Data Lakes allow us to store and index any type of data without the need for prior transformation. This opens up new possibilities for organizations, especially for their marketing and business teams, which can then carry out flexible analyses with complete technical independence.
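The idea of storing data without prior transformation can be illustrated with a minimal Python sketch. It assumes a local folder stands in for the lake and that each payload is a single-line JSON string; the function names (land_raw, read_with_schema), the folder layout and the sample events are hypothetical and only serve as an illustration.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, payload: str) -> Path:
    """Append one raw payload, untouched, to the landing file for its source."""
    target = lake_root / "raw" / f"{source}.jsonl"
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as fh:
        # Assumption: the payload is single-line JSON, so one line = one record.
        fh.write(payload.rstrip("\n") + "\n")
    return target


def read_with_schema(path: Path, fields: list) -> list:
    """Apply a schema only at read time; missing fields become None."""
    rows = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            rows.append({name: record.get(name) for name in fields})
    return rows


lake = Path(tempfile.mkdtemp())  # temporary folder standing in for the lake
# Two web analytics events with different shapes are stored exactly as received.
land_raw(lake, "web_analytics", '{"user": "u1", "page": "/home", "ms": 5300}')
path = land_raw(lake, "web_analytics", '{"user": "u2", "referrer": "newsletter"}')

# The analyst decides which fields matter only when reading.
rows = read_with_schema(path, ["user", "page"])
print(rows)  # [{'user': 'u1', 'page': '/home'}, {'user': 'u2', 'page': None}]
```

The point of the sketch is that nothing is discarded or reshaped at capture time: the "referrer" and "ms" fields remain in the stored file and can be read later with a different field list.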

The difference between a Data Lake and a traditional Data Warehouse

[Figure: screenshot comparing a Data Lake with a traditional Data Warehouse]

Data Lakes are usually implemented on a scalable cloud service (AWS, Azure, Bluemix, etc.), which costs less than a traditional Data Warehouse system in terms of both storage capacity and processing. This allows organizations to capture more data from different sources without needing to know in advance how it will be used. With a Data Lake, we can enrich seemingly valueless data by combining it with other data sets, and take advantage of it later, when it becomes more useful.

An important point is that, technologically, you do not buy a Data Lake as a specific tool; you build one from different services and technologies, both internal and external. In the process, different departments have to collaborate to identify the channels and systems from which information can be obtained. A Data Lake is not a passive repository where data is merely stored, but a dynamic platform that evolves over time, adding new capabilities for processing the data and for extracting knowledge from it.

If the data is varied and high in volume, it can be hosted and indexed locally on large distributed data platforms such as Spark or Hadoop. If the data is semi-structured or recorded in log files, it is best to use NoSQL data systems such as Cassandra, CouchDB or MongoDB, and it helps to add search engines that allow quick searches, such as Elasticsearch or Splunk. If large amounts of unstructured data, such as content, have to be handled, we ought to consider engines that support semantic analysis through NLP (natural language processing), such as the open source Apache OpenNLP project, or market solutions like Attivio, Expert System or Sinequa. If you need to combine text and image analysis, IBM Watson (Alchemy API) is ideal. A Data Lake can combine any of these solutions depending on how the data evolves and what kind of treatment it needs, and it is this flexibility that makes it so powerful.

Data Lakes do not remove the need for data quality or a governance model, so data cleansing and information mining are still necessary. To prevent a data lake from becoming a huge swamp, it is best to establish a data tracking system that quickly identifies the origin and capture date of each piece of data. To maintain data quality by excluding poorly formed or overlapping data, these solutions are often combined with Big Data tools. Once ETL processes have standardized and structured the data, it can be quickly analyzed in relational data systems using visualization tools or dashboards, in real time.
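As a rough illustration of such a tracking system, the Python sketch below wraps each raw payload in an envelope that records its origin and capture date, then drops poorly formed and repeated records. The envelope fields, the function names and the sample batch are assumptions made for this example, not part of any specific product.

```python
import hashlib
import json
from datetime import datetime, timezone


def wrap_with_lineage(source: str, payload: str, captured_at: datetime) -> dict:
    """Attach origin, capture time and a content hash to a raw payload."""
    return {
        "source": source,
        "captured_at": captured_at.isoformat(),
        "checksum": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "payload": payload,
    }


def keep_clean(envelopes: list) -> list:
    """Drop payloads that are not valid JSON objects or repeat an earlier one."""
    seen = set()
    clean = []
    for env in envelopes:
        try:
            parsed = json.loads(env["payload"])
        except json.JSONDecodeError:
            continue  # poorly formed record
        if not isinstance(parsed, dict):
            continue  # valid JSON, but not a record
        key = (env["source"], env["checksum"])
        if key in seen:
            continue  # exact duplicate from the same source
        seen.add(key)
        clean.append(env)
    return clean


now = datetime(2017, 11, 16, 12, 0, tzinfo=timezone.utc)
batch = [
    wrap_with_lineage("crm", '{"lead": 1}', now),
    wrap_with_lineage("crm", '{"lead": 1}', now),        # duplicate
    wrap_with_lineage("call_center", '{"lead": ', now),  # truncated JSON
]
clean = keep_clean(batch)
print(len(clean), clean[0]["source"], clean[0]["captured_at"])
# 1 crm 2017-11-16T12:00:00+00:00
```

Only exact duplicates are detected here; records that overlap partially would need matching rules specific to each source.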

A Data Lake can be used in many different cases. In data-oriented marketing, it allows you to standardize and unify key terms such as “campaign code” and “channel”, as well as business definitions such as “customer”, “lead”, “sales area” and “product”, from the very start, which improves the way each marketing operation is measured.
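A small Python sketch can show what unifying “channel” and “campaign code” might look like in practice. The alias table, the canonical channel names and the campaign code format are all hypothetical; a real organization would agree on its own vocabulary.

```python
import re

# Hypothetical mapping from the labels each tool uses to one agreed vocabulary.
CHANNEL_ALIASES = {
    "fb": "paid_social",
    "facebook ads": "paid_social",
    "adwords": "paid_search",
    "google ads": "paid_search",
    "newsletter": "email",
    "e-mail": "email",
}
CANONICAL_CHANNELS = set(CHANNEL_ALIASES.values())


def normalize_channel(raw: str) -> str:
    """Return the agreed channel name, or 'unknown' for unmapped labels."""
    key = raw.strip().lower()
    if key in CANONICAL_CHANNELS:
        return key
    return CHANNEL_ALIASES.get(key, "unknown")


def normalize_campaign_code(raw: str) -> str:
    """Uppercase the code and join its parts with single hyphens."""
    parts = re.split(r"[\s_\-]+", raw.strip())
    return "-".join(part.upper() for part in parts if part)


print(normalize_channel(" Facebook Ads "))           # paid_social
print(normalize_channel("billboard"))                # unknown
print(normalize_campaign_code(" xmas_2017 promo "))  # XMAS-2017-PROMO
```

Returning "unknown" rather than guessing keeps unmapped labels visible, so the alias table can be extended deliberately.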

The main advantages of using Data Lakes are:

  • Elimination of fragmentation. It offers a unified view of the entire customer experience, as it provides an integrated view of all platforms, silos and channels.
  • Improvements in customer knowledge and decision making processes throughout the customer journey.
  • Ability to generate campaigns and capture processes with greater agility and autonomy, through tools that measure and optimize investments in marketing in real time.
  • Your own storage system of historic data, which lets you analyze and establish predictive models of customer behavior.
  • Ability to trace and uniquely identify the user from the data acquired at different touch points, in order to accurately attribute the origin of each conversion.
  • An owned technological environment with a service-oriented architecture that allows new tools to be developed or evolved.
  • Improved business intelligence through a dynamic dashboard of multiple data sources (own and external) that allows decisions to be made in real time.
  • An agile Data Driven marketing operations team that can quickly respond to any business need.
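
The advantage of tracing and uniquely identifying the user across touch points can be sketched in Python with a union-find structure, a common technique for grouping records that share an identifier. This is only an illustration under simple assumptions: identifiers match exactly, each event is a dictionary mapping an identifier kind to a value, and the function name and sample events are hypothetical.

```python
def stitch_identities(events: list) -> dict:
    """Map every identifier to one representative of its connected group."""
    parent = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # The smaller string becomes the root, for a deterministic result.
            parent[max(ra, rb)] = min(ra, rb)

    for event in events:
        ids = [f"{kind}:{value}" for kind, value in sorted(event.items())]
        for identifier in ids:
            find(identifier)  # register identifiers that appear alone
        for other in ids[1:]:
            union(ids[0], other)
    return {identifier: find(identifier) for identifier in parent}


events = [
    {"cookie": "c1", "email": "ana@example.com"},        # web form
    {"email": "ana@example.com", "phone": "600111222"},  # call center
    {"cookie": "c9"},                                    # anonymous visit
]
groups = stitch_identities(events)
print(groups["phone:600111222"] == groups["cookie:c1"])  # True
print(groups["cookie:c9"] == groups["cookie:c1"])        # False
```

The web form and the call center record end up in the same group because they share an email address, so a conversion recorded by phone can be attributed back to the original web visit.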


Aliseda, exploitation of customer knowledge


Aliseda Inmobiliaria is a company specializing in the management of real estate assets. Its main areas of focus are the property business, property development, land management and the sale of the finished product: brand new and second-hand homes.

Its main goal is to offer the best customer service, based on principles of transparency, flexibility and agility and, therefore, it promotes the continuous development and optimization of the tools necessary to satisfy the expectations of anyone interested in one of its properties.

In response to this challenge, Good Rebels has helped them carry out a very ambitious technological project that:

  • Is based on real time data consumption technology & Big Data
  • Expands knowledge of leads and customers through their behavior
  • Identifies the best opportunities in the process of commercial flow through scorings
  • Provides their call center with personalized consumer information
  • Optimizes investment within the field of marketing
  • Improves business intelligence
  • Increases the agility and efficiency of campaigns with personalized and real-time targeting criteria

To do this, Aliseda has implemented a digital tracking system, Big Data technology to record the interactions produced and the information consumed in real time (Cassandra and Storm), a marketing automation platform for lead nurturing (Watson Campaign), analytical and predictive tools for the exploitation of information (Pentaho), and a system of commercial certification through a mobile app.

All the information generated, from the entry of potential contacts through the various existing channels to the management of the commercial relationship between the company and the person interested in a property throughout the purchasing process, is recorded in a Data Lake. Centralizing the data enables the development of recommendation systems that improve the user experience, agile analysis of the conversion funnel to measure the profitability of actions, communication actions tailored to the needs of each client in real time, and a technological environment that allows new tools to be developed and evolved.