Walt Disney, one of the world’s largest and most highly valued companies according to Forbes magazine, has developed a Big Data system to collect and analyse visitor activity data at its theme parks. So far it has been deployed at its Orlando park in Florida. Every year 100 million people visit its theme parks, using the attractions, buying Disney products, eating in the restaurants and sleeping in the hotels. Properly recorded and stored, this activity produces a large amount of data. Analysing the immense volume generated each day can give Disney solid knowledge for strategic decision making, helping to make its theme parks more productive, more accessible and therefore more profitable by offering visitors what they need at any given moment.
The Big Data solution consists of a data collection system and a Big Data architecture based on Hadoop and NoSQL databases. Data collection is carried out through the “MagicBand” wristband, offered to users as part of the “MyMagic+” system for improving the visitor experience. The wristband serves as a hotel room key and park entrance ticket (although it can be obtained without staying in a park hotel), and can even be linked to a credit card to enable payment in shops and restaurants. In addition, used together with the MyMagic+ applications, it offers important advantages such as queue-free access to attractions, booking and changing ride times, and a personalised visit, for example in interactions with Disney characters. The system collects information such as:
- Real-time localisation
- Purchase history
- Visitors’ personal information
- Patterns of use of attractions
In terms of privacy, Disney allows visitors to determine what information they share and with whom. However, Disney says that even at the most restricted level, the system is able to collect useful information without violating privacy.
The volume of data collected by this system can reach up to 5 TB per day, which in terms of volume alone is a clear example of Big Data. Together with the extremely high speed at which the data is generated, the possibilities of real-time analysis and the diversity of data sources, this makes the need for a high-performance system with Big Data characteristics evident.
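To put the daily volume in perspective, the short calculation below converts it into an average sustained ingest rate and a yearly total. It is only a back-of-the-envelope sketch: it assumes the figure refers to decimal terabytes (10^12 bytes) and that data arrives evenly over 24 hours, which a theme park with opening hours and peak times certainly does not do, so real peak rates would be higher. The function and constant names are illustrative.

```python
# Back-of-the-envelope conversion of a daily data volume into an average rate.
# Assumptions (not from the source): decimal terabytes, uniform arrival over 24 h.

BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_mb_per_s(tb_per_day: float) -> float:
    """Average ingest rate in MB/s for a given daily volume in TB."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY / BYTES_PER_MB


def yearly_volume_pb(tb_per_day: float, days: int = 365) -> float:
    """Total volume in petabytes if the daily peak were reached every day."""
    return tb_per_day * days / 1000


rate = average_rate_mb_per_s(5)
yearly = yearly_volume_pb(5)
print(f"Average rate: {rate:.1f} MB/s")    # Average rate: 57.9 MB/s
print(f"Yearly volume: {yearly:.3f} PB")   # Yearly volume: 1.825 PB
```

Even as an average, tens of megabytes per second arriving continuously from many sources is a workload better suited to a distributed store than to a single server.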
However, despite Disney’s size and economic resources, a completely Open Source solution was initially chosen and its development was entrusted to a small team of 6 professionals. The reason for this choice, according to the project manager, was mainly flexibility. Open Source solutions often have weaknesses in reliability and fault tolerance, documentation and technical support and, in general, scalability. However, they make it easier to build prototypes as a proof of concept and they are usually extensible, allowing developers to create and test new functionality on top of them in multiple programming languages. Once the prototype has been built, an organisation can choose to acquire one of the many paid solutions, which in many cases take Open Source applications as a base and add new functionality, documentation, support, fault tolerance, etc. This is precisely the approach that has been applied at Disney.
The Big Data solution developed by the Walt Disney team uses the Hadoop/MapReduce architecture, the column-based NoSQL database Cassandra (organised by columns rather than by rows, as relational databases are) on Hadoop, the document-oriented database MongoDB and a set of tools that complement these for particular tasks. The operations control team uses the platform to view, analyse and index error messages, while another division of the company uses it as the basis of a recommender system. Application developers require high-throughput, low-latency data access, whereas the latency requirements of the team that produces analytics are more relaxed. In summary, some of the concrete uses of the collected data are:
- Audience analysis and segmentation
- Recommendation system
- Analysis of visitor flows or movement in the park
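As an illustration of the last item, visitor-flow analysis can be reduced to counting how often visitors move from one attraction to another. The sketch below does this in memory on a handful of made-up events; the event layout, the attraction names and the `count_transitions` function are all hypothetical, and a production version would run as a distributed job over the stored data rather than over a Python list.

```python
from collections import Counter, defaultdict

# Hypothetical sample events: (visitor_id, minutes_since_opening, attraction).
events = [
    ("v1", 10, "Castle"),
    ("v1", 45, "Space Ride"),
    ("v2", 12, "Castle"),
    ("v2", 50, "Space Ride"),
    ("v2", 95, "Jungle Boat"),
    ("v1", 90, "Castle"),
]


def count_transitions(events):
    """Count moves between consecutive attractions, per visitor, in time order."""
    # Group the events by visitor.
    by_visitor = defaultdict(list)
    for visitor_id, timestamp, attraction in events:
        by_visitor[visitor_id].append((timestamp, attraction))

    transitions = Counter()
    for visits in by_visitor.values():
        visits.sort()  # order each visitor's events by timestamp
        # Pair every visit with the next one to obtain (source, destination).
        for (_, src), (_, dst) in zip(visits, visits[1:]):
            if src != dst:  # ignore repeated readings at the same attraction
                transitions[(src, dst)] += 1
    return transitions


flows = count_transitions(events)
for (src, dst), n in flows.most_common():
    print(f"{src} -> {dst}: {n}")
```

The same grouping-then-counting shape maps naturally onto a MapReduce job: the map step keys events by visitor, and the reduce step emits the transitions for each visitor.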
As for the suitability of Open Source solutions discussed above, once the prototype of its data management platform had been developed, Disney turned to Cloudera to provide a Hadoop cluster with good support and additional features, and adopted the DataStax distribution of Cassandra. In this way, the migration to proprietary solutions is being carried out gradually as needs arise.
In addition, in order to isolate the different system clients from the NoSQL technology used and to protect the system from undue modification, different types of interfaces have been developed to allow access to the system at different levels.
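The idea of isolating clients from the storage technology can be sketched as follows. The source does not describe Disney’s actual interfaces, so every name here (`VisitStore`, `InMemoryVisitStore`, `ReadOnlyVisitView`) is hypothetical: clients program against an abstract interface, the concrete backend can be swapped without touching them, and a restricted view exposes only read operations to clients that should not modify data.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class VisitStore(ABC):
    """Storage-agnostic interface; clients never see the database behind it."""

    @abstractmethod
    def record_visit(self, visitor_id: str, attraction: str) -> None: ...

    @abstractmethod
    def visits_for(self, visitor_id: str) -> List[str]: ...


class InMemoryVisitStore(VisitStore):
    """Stand-in backend for the sketch; a real one would wrap a NoSQL client."""

    def __init__(self) -> None:
        self._visits: Dict[str, List[str]] = {}

    def record_visit(self, visitor_id: str, attraction: str) -> None:
        self._visits.setdefault(visitor_id, []).append(attraction)

    def visits_for(self, visitor_id: str) -> List[str]:
        # Return a copy so callers cannot mutate the stored list.
        return list(self._visits.get(visitor_id, []))


class ReadOnlyVisitView:
    """Restricted access level: exposes reads only, no write methods."""

    def __init__(self, store: VisitStore) -> None:
        self._store = store

    def visits_for(self, visitor_id: str) -> List[str]:
        return self._store.visits_for(visitor_id)


store = InMemoryVisitStore()
store.record_visit("v1", "Castle")
analytics_view = ReadOnlyVisitView(store)
print(analytics_view.visits_for("v1"))  # ['Castle']
```

Replacing `InMemoryVisitStore` with a class backed by a different database would leave `ReadOnlyVisitView` and its callers unchanged, which is the point of such a layer.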
The cost of implementing the data management platform together with the MyMagic+ system is enormous, estimated at around 800 million dollars. Even so, the good results are leading Disney to consider extending the system to its other theme parks.