Oracle Corporation (ORCL)

Oct 8, 2013

Speaker 1

Good day, ladies and gentlemen, and welcome to the Oracle Big Data Primer Webinar Call. At this time, all participants are in a listen-only mode. Later, we will conduct a question and answer session, and instructions will follow at that time. As a reminder, this conference call is being recorded. I would now like to introduce your host for today's conference, Ms. Shauna O'Boyle. Ms. O'Boyle, you may begin your conference.

Speaker 2

Thanks. Hello, everyone, and thank you for joining us today as part of our ongoing educational speaker series hosted by Oracle. I am Shauna O'Boyle, Senior Manager of Investor Relations, and today is Tuesday, October 8, 2013. Joining us today is Oracle Executive Senior Vice President Andy Mendelson and Equity Research Analyst Brendan Barnicle of Pacific Crest. Today, Andy will be discussing big data.

However, he will not be discussing any data that is not already publicly available. At the conclusion of Andy's presentation, we will turn the webcast over to Brendan, who will moderate the question and answer portion of the call. However, you may submit questions at any time during the presentation by typing your question in the Q and A box in the lower part of your screen. Please keep in mind that we will not comment on business during the current quarter. As a reminder, the matters we will be discussing today may include forward-looking statements and as such are subject to the risks and uncertainties that we discuss in detail in our documents filed with the SEC, specifically the most recent reports on Form 10-K and 10-Q, which identify important risk factors that may cause actual results to differ from those contained in forward-looking statements.

You're cautioned not to place undue reliance on these forward-looking statements, which reflect our opinions only as of the date of this presentation. Please keep in mind that we're not obligating ourselves to revise, update or publicly release the results of any revisions of these forward-looking statements in light of new information or future events. Lastly, unauthorized recording of this conference call is not permitted. I would now like to introduce Andy Mendelson.

Speaker 3

Thanks, Shauna. Good morning, everybody. So I'm going to give a very short, about 15 minute chat about big data and then we'll take questions. I thought I'd first start talking about what is big data. There have been all kinds of people talking about what big data is.

There's a famous 3 Vs or 4 Vs of big data. And all the various new startup companies that have anything to do with information are claiming they're big data. So what is the real kernel of what's going on here? Well, the real kernel is that what we're talking about here is all about analytics. People have been doing analytics for 30, 40 years with information systems.

And what we're talking about here is moving to the next generation of those analytics, which we're calling big data analytics. And there are 2 key transformations going on in the industry that are driving the big data trend. And number 1 is really about different kinds of data. So traditionally, analytics systems have been looking at data from companies' operational systems. So for example, if you're a big retailer, you're looking at data from your retail sales, the sales information for products at your retail stores.

If you're a telco, you're looking at your call data records that record all the phone calls, how long they lasted, and what the charges are, and things of that sort. So that's analytics on your operational data. And what people are talking about with big data is looking at more kinds of data than they've traditionally been looking at. And some of these data sources are from inside the company, things like documents, more unstructured data, voice, video. And as we move to the Internet of Things, people are looking at sensor data sources as well.

And there's also a huge amount of data on the Internet, of course, and companies are looking at, well, is there some way I can extract useful information out of social media? Can I find information about my customers? And what are they saying on social media about my company? Are they saying something good? Are they saying something bad?

There's a lot of bloggers out there saying all kinds of things that are potentially of interest to companies. So people are very excited about the possibility of looking at this data, getting more information, especially about their customers, and using that to raise the potential revenue of their companies by better marketing to their customers, upselling them, etcetera. So I think that's the first big thing, moving from operational data sources to these broader internal and Internet sources. The next big thing going on is new kinds of analytics are being done. So, traditionally people did what we call OLAP analytics, you know, slice and dice of data, mostly historical data from the operational systems.

And we've been moving over the last few years to more predictive analytics, data mining techniques, for example, to look at what's going to happen in the future. And as we move to these broader kinds of data sources, people are looking at doing analytics against those kinds of data sources, text, spatial. Graph analytics has become very popular as you look at social networks. People want to make inquiries about who is a friend of whom and, based on that, can they do some kind of interesting marketing or targeted advertising, etcetera. We've also been moving to new kinds of analytic tools.
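[Editor's note] The "who is a friend of whom" query mentioned above can be sketched in a few lines. This is a toy illustration using plain Python sets as the adjacency structure, not any Oracle graph product; the names and graph are invented.

```python
# Hypothetical social graph: each person maps to their set of direct friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob"},
}

def friends_of_friends(graph, person):
    """People reachable in exactly two hops, excluding the person
    and their direct friends -- a typical targeted-marketing query."""
    direct = graph.get(person, set())
    two_hop = set()
    for f in direct:
        two_hop |= graph.get(f, set())
    return two_hop - direct - {person}

print(friends_of_friends(friends, "alice"))  # {'dave'}
```

A real graph analytics engine runs this same traversal in parallel over billions of edges, but the query shape is the same.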

R has become very popular as a development tool for doing analytics processing. And of course, we've been moving in-memory in a lot of places to do in-memory analytics and in-memory databases, etcetera. So I think those are the 2 big key drivers of what's going on in big data. And so what are we doing at Oracle to deal with this new world? Well, Oracle, of course, has for many years been the market leader in analytics.

And we have the world's leading data warehousing technology and BI technology, with our database and our BI products as well. And so what we're doing as we move forward to the big data space is we, of course, are trying to build a platform and a set of solutions that deal with big data. We need to be able to acquire, organize, discover and analyze big data. And as we move forward, in addition to the Oracle massively parallel relational database doing analytics against big data, we're also now moving to utilize Hadoop as a platform in our big data environments.

We've also come out with Oracle's NoSQL database that has value in the ingestion phase of big data. And of course, we're continuing to evolve our big data tools as well. One of the big things we're doing at Oracle that I think is very important in the big data space is we are in the business of delivering what we call engineered systems. These are combinations of hardware and software that are integrated together to deliver a great off-the-shelf experience for customers, where they can order our hardware and software and get great time to value, where they can very quickly build systems. And we've done that in the big data space.

We have our big data appliance for running Hadoop processing and our Oracle NoSQL database as well. We have Oracle Exadata, which is a great platform for doing massively parallel analytics. And then, of course, we've finally added Exalytics, which is our platform for our BI tools and our Endeca data discovery tool. And finally, on the top, Oracle is, of course, in the applications business. And so we are also in the business of delivering horizontal and vertical applications and solutions to our customers, and we will continue doing that on top of this stack of big data technologies as well.

So that's our key strategy. So now let's move on a little closer to the products that we actually are doing. So in this picture, we're showing, of course, on the left side, the data sources that we talked about, all kinds of data sources, both operational and new kinds of data sources like documents and blogs and information off the Internet, etcetera. And what we're showing that's a little different now is that, in the past, what you would do is take this information and put it into various files and do staging operations, do ETL operations, to transform the data before moving it into a data warehouse. Now what we're showing in this next generation platform is using Hadoop and Hadoop's HDFS distributed file system as a way of ingesting large amounts of this data.

And then we will use the Hadoop MapReduce platform to do some batch processing and ETL transformations against that data, sift through the data, look for interesting tidbits, before we then use our big data connectors and load that into the Oracle Exadata data warehouse. We also show Oracle's NoSQL database here as well. NoSQL databases are also very good at ingesting information rapidly and doing some operations against it, and then that data can also be moved into a data warehouse as necessary. On top of this basic platform of Hadoop and the Oracle database, we have our analytics engines: we have, in the database, our advanced analytics capability, which includes predictive analytics and R processing. We are also making this kind of analytics, especially R, available on the Hadoop platform as well.
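[Editor's note] The ingest-sift-load flow described above can be sketched in miniature. This is a toy in-process map/reduce in Python, not the actual Hadoop API; the record fields and the "interesting tidbit" filter are invented for illustration.

```python
from collections import defaultdict

# Raw ingested records, standing in for log files landed in HDFS.
raw_logs = [
    {"user": "u1", "action": "view"},
    {"user": "u1", "action": "buy"},
    {"user": "u2", "action": "view"},
    {"user": "u2", "action": "view"},
]

def map_phase(records):
    # Emit (key, value) pairs, Hadoop-mapper style: 1 per purchase.
    for r in records:
        yield (r["user"], 1 if r["action"] == "buy" else 0)

def reduce_phase(pairs):
    # Aggregate values per key, Hadoop-reducer style.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

buys_per_user = reduce_phase(map_phase(raw_logs))

# Sift: keep only the interesting subset before loading it
# into the warehouse via a connector.
warehouse_rows = {u: n for u, n in buys_per_user.items() if n > 0}
print(warehouse_rows)  # {'u1': 1}
```

The real pipeline does this across a cluster against terabytes, then a big data connector bulk-loads the sifted result into Exadata.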

And then finally, on top, we show our whole suite of business analytics tools, like our BI tools and our Endeca data discovery tool. Those tools can also be run against data both in the Hadoop HDFS file system and in the Oracle database, to deliver visualizations and analytics against the data. Okay, let's go to the next slide. And as I mentioned, a big part of our strategy is our engineered systems. So, on this next slide, I just show how those engineered systems fit into the big data solution we just talked about.

Of course, the big data appliance is our engineered system for running both Hadoop and the Oracle NoSQL database. We are the first vendor, by the way, producing an engineered system optimized for running Oracle NoSQL, or any other NoSQL database for that matter. Exadata, of course, is our platform of choice for running big, massively parallel data warehousing and for doing your interactive analytics. And Oracle Exalytics is our engineered system for running our BI analytics foundation, and that includes our BI tools, the Endeca data discovery product, and also our Essbase engine as well. Okay, let's move on.

What I want to do at this point is drill down a little bit on how Hadoop and Oracle databases relate to each other, because I think there's a lot of confusion out there about what Hadoop is good for and what parallel relational databases are good for. And the key thing to understand, and what we are doing in this platform, is that in order to have a big data solution, you need both. If you talk to even the people who are the biggest early advocates of Hadoop, who are now trying to do analytics, what they've decided is that Hadoop is a great platform for ingesting large amounts of data at very low cost per terabyte, and it's a great platform for doing some analytics on that data, but it's more batch processing analytics. So what does that mean? Well, it means if you have a data scientist sitting in front of a terminal, asking questions, trying to understand the business and come up with great ideas for raising revenue, better marketing, advertising, etcetera.

He wants interactive response. He wants to send in a query and get a response back in a few seconds. That's not really what Hadoop was designed for. Hadoop is designed to crank out scalable batch processing execution, and you'll get maybe tens of minutes or an hour of response time to those kinds of queries. What they want is snappy response, and that's what you need a massively parallel relational database to do, and that's what these guys are doing.

They'll use Hadoop for ingestion and for doing some big batch processing analytics against big data, but then they'll move a subset of the data that they want to do further analytics on into their massively parallel relational database, in this case, Exadata, and that's where they'll do their interactive analytics against the data using, of course, the rich SQL language that we provide with Oracle. The other thing to note is that although there are some SQL tools available in the Hadoop environment, they're very primitive and raw, Hive, Pig, etcetera. And then, of course, you can code in Java as well. But on the massively parallel relational database side of the world, what people do is they code in SQL for the most part. SQL is a very expressive and productive language.

A couple of lines of SQL is equal to hundreds of lines of code in Java. It's also much more efficient at processing as well. So it's much faster and requires much fewer computing resources to get a given job done. So people also like the fact that relational databases are just much more productive environments than Hadoop is today. We also mentioned R.
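[Editor's note] The expressiveness point can be illustrated with Python's built-in sqlite3 module standing in for a real warehouse (an assumption for the sake of a runnable example; the talk is about Oracle). One declarative GROUP BY replaces the explicit grouping-and-summing loop you would otherwise hand-code.

```python
import sqlite3

# Tiny in-memory table with invented sales data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 5.0), ("west", 7.5)],
)

# One declarative statement does the grouping, summing and ordering
# that an imperative MapReduce-style program spells out by hand.
rows = conn.execute(
    "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
).fetchall()
print(rows)  # [('east', 15.0), ('west', 7.5)]
```

The database engine also picks the execution plan, which is part of why the SQL version tends to be both shorter and faster.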

R is, of course, something that we made available as a statistical programming and predictive analytics language on the Exadata platform, and we're now also making it available on the Hadoop platform as well. Let's go to the next slide. Here is a slide taken out of my keynote from OpenWorld, so you're all welcome to go to oracle.com and take a look at the keynote I did on Oracle Database 12c. What this slide shows is the end result of an example we went through. Let's say you want to do a very common thing that people talk about in big data, which is looking for fraud in a banking system of some sort.

And we wrote the application 2 ways. We wrote it using Java on Hadoop using MapReduce. We also wrote it using a SQL extension we call SQL pattern matching, which is a new part of the SQL language that we've implemented in the 12c version of our database. And we just measured 2 key metrics here. One is how many lines of code it takes to solve this problem.

And what you see here is over 650 lines of code using Java MapReduce versus, I think, on the order of 15 lines of code using SQL. So number 1, SQL is much, much more productive than having to code at a much more primitive level using, in this case, Java MapReduce. And we also want to show we can run SQL in a very high performance, massively parallel fashion. And in this example, the run time of the SQL version of the analytics was much less than 10 seconds, while the run time on Hadoop was over 70 seconds. And we actually ran the SQL on, I think, a couple of processors, and Hadoop was on like an 18 node cluster.
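[Editor's note] To make the fraud example concrete: one kind of sequential pattern such a check might look for is a run of small charges immediately followed by a large one. Oracle 12c expresses patterns like this declaratively with SQL pattern matching; the loop below is just a hypothetical imperative stand-in, with invented thresholds.

```python
def flag_small_then_large(amounts, small=10.0, large=500.0, run=3):
    """Return start indices where `run` consecutive charges under
    `small` are immediately followed by one charge over `large`."""
    hits = []
    for i in range(len(amounts) - run):
        window = amounts[i:i + run]
        if all(a < small for a in window) and amounts[i + run] > large:
            hits.append(i)
    return hits

# Invented transaction stream: three tiny test charges, then a big one.
txns = [5.0, 2.0, 1.0, 999.0, 40.0, 3.0]
print(flag_small_then_large(txns))  # [0]
```

The declarative SQL version states the pattern (small small small large) and lets the engine parallelize the scan, which is where the line-count and run-time gap in the keynote demo comes from.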

So, the key message here is relational databases are constantly moving the bar and getting faster and faster. And I think people who are thinking that, oh, we're just going to put a little SQL engine on Hadoop and catch up in a couple of years to what relational databases are doing and have built over the last 20 years for doing high performance, massively parallel SQL querying, I think are being a little optimistic about how soon they're going to get to parity there. Okay, so let's go to the next slide.

And here I just want to mention one thing that we're doing in Database 12c around in-memory processing. So for analytics, the relational database vendors have been producing very high performance, massively parallel relational database engines for many years that are very good at crunching through terabytes and petabytes of information. But there's been a sort of breakthrough in the last few years in column store technologies, and in particular now in-memory column store technologies, for making analytics even faster. And in Database 12c, we just announced at OpenWorld a few weeks ago that we are adding this in-memory column store technology.

This again is going to give us another big leap forward in analytic processing in the relational database, in this case, the Oracle relational database. This, of course, is going to be very exciting for customers doing analytics against big data and data warehousing. So again, we're raising the bar of what relational databases can do here. We're not standing still. Relational databases are moving forward very aggressively into the analytics space.
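[Editor's note] The column store idea can be shown in a toy form: an analytic aggregate touches one contiguous column instead of every field of every row, which is what makes the in-memory columnar layout fast. This is an illustrative sketch, not Oracle's implementation; the table and field names are invented.

```python
# Row store: each record is stored whole, so a query over one
# column still walks entire records.
rows = [
    {"id": 1, "region": "east", "revenue": 100.0},
    {"id": 2, "region": "west", "revenue": 250.0},
    {"id": 3, "region": "east", "revenue": 50.0},
]
row_total = sum(r["revenue"] for r in rows)

# Column store: each column lives in its own dense array, so the
# same aggregate scans only the data it actually needs.
columns = {
    "id": [1, 2, 3],
    "region": ["east", "west", "east"],
    "revenue": [100.0, 250.0, 50.0],
}
col_total = sum(columns["revenue"])

print(row_total == col_total == 400.0)  # True
```

The answer is identical either way; the win is in memory locality, vectorization and compression when the columns hold billions of values instead of three.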

And again, people who are building SQL engines from scratch have to think about adding even more technologies than they're thinking of to match the capabilities of these relational databases. Okay. So let's go to the next slide. So what are some of the key differentiators for what Oracle is doing in the big data space versus other competitors? I think the big thing we are doing here is we are giving customers an integrated platform and an engineered platform.

So a customer can deploy our big data appliance alongside Oracle's Exadata platform. Those 2 platforms, or engineered systems, can be very easily integrated together. We have hardware integration, by using the InfiniBand networking technology across both platforms, making it very efficient to move information back and forth. And then we have software integration, what we call our connectors, that ties together the Hadoop platform with the Oracle Database platform. For example, one of the connectors, our SQL connector, lets Oracle SQL reach out into Hadoop HDFS and run SQL queries against the HDFS data.

Another example connector is our Loader, which lets you very efficiently move data from HDFS into the Oracle Exadata database. And then we have, of course, our whole array of technologies in Exadata that make it a great platform for doing big data analytics. Next, a big differentiator that I just mentioned: we are adding very high performance in-memory columnar processing into our already very powerful relational database engine in Oracle. That's going to make us an even more outstanding interactive platform for doing analytics against big data. Another big part of what we're doing here is that Oracle has a huge ecosystem around it of developers and ISVs and SIs who know and love the Oracle platform.

They have huge sets of skilled consultants who know how to manage the platform. They know how to develop against it. We are sort of building on top of that as we go into the big data space. And then finally, we are giving a complete solution to our customers. If you buy our big data appliance, you buy Exadata and the connectors between those 2.

If you have any problem, of course, you just call up Oracle. We support you top to bottom from hardware to software with any issues you have. And of course, we can provide you all the consulting help you need as well on top of that. Okay. And then let's just close.

Of course, we have huge numbers of customers using our big data platform. I'll just mention a couple here. UPMC is the University of Pittsburgh Medical Center, which is one of the leading medical research centers in the fields of genetics and other health sciences. They are a huge Oracle customer for big data and a big user of Oracle Exadata. Next, SoftBank: SoftBank is a big telco in Japan and a huge data warehousing user.

They've actually also recently been moving into the U.S. They moved all their data warehousing technology off Teradata onto Exadata several years ago. They're a very successful customer. Thomson Reuters is another one of our big, big data customers.

They are using the big data appliance and Exadata in their big data processing. StubHub is part of eBay. They do ticket reselling. They are using Oracle's R technology that I mentioned earlier for doing statistical analysis and predictive analytics. And with that, I think we'll move on to the Q and A section.

Speaker 2

Thank you, Andy. Before I turn the call over to Brendan for the question and answer portion of the call, please let me remind our listeners that you can submit questions at any time during the presentation by typing your question in the Q and A box at the lower part of your screen. Brendan?

Speaker 1

Thanks so much, Shauna, and thanks, Andy. Andy, I wanted to follow up on some of those customer references you just gave. I was wondering where you're seeing your customers most frequently use your big data solutions right now, if there is a kind of most frequent use case.

Speaker 3

I think the most popular use cases these days are in financial services and telco. The big banks are definitely very interested in a lot of what we are talking about. They are very interested, of course, in looking at data from social networks to see if they can get information about their customers. They're also interested in using that data to see if it can help them. Telcos are another big vertical where we see a lot of interest. A big problem in telco, of course, is customers leaving one telco to go to another.

They call this churn. Churn analytics is a big part of what they're doing. They're actually looking at using graph analytics for doing that kind of processing.

Speaker 1

Great. And listening to your presentation, clearly, Oracle is leveraging both your software and your hardware business. Do you think the big data software advances you've made can accelerate the hardware side of the business?

Speaker 3

Yes. So, I mean, the biggest part of our business that's here right now on the hardware side, of course, is Exadata. Exadata originally was being used almost 100% for doing big data kinds of analytics problems, big data warehousing problems. And it's still about 50% of all Exadatas that are being used in this space. We also, of course, are pretty successful with Exalytics.

It's a great platform for our BI tools. And our big data appliance is a more recent addition. And we're also doing reasonably well selling hardware into that space as well. And one of the big things here that I think is worth emphasizing: once customers use these engineered systems, they're almost always buying more. It's always kicking the tires the first time around, but we see customers who are very happy with these products and become very large repeat customers.

Speaker 1

Great. As you mentioned in your presentation, there are a lot of vendors in the big data space, and they talk about different approaches to working with data than what we've seen in the past. Do you think these new approaches are going to ultimately be replacements for existing technologies or just supplements?

Speaker 3

Yes. I think what I'll talk about is 2 of the main things people are talking about in this space. There's Hadoop and there's NoSQL databases. So why don't I start with Hadoop first since I think that is the real core of what we're talking about here in big data. And the key thing to understand is what we are doing today is sort of an extension of what people have been doing for many years in the past.

It's just the next generation of analytics. And so in the past, before Hadoop existed, what did people do? They would use file systems as staging areas for ingesting large amounts of data that they'd eventually process and bring into their data warehouse. So Hadoop is really replacing that. HDFS is a much more scalable file system and that's sort of replacing the old traditional file systems people might have been using in their analytics initiatives.

And Hadoop has this added benefit that not only is it a good low-cost file system, a good place for ingesting information, it also has a MapReduce batch processing engine for doing some analytics there. The place where people start getting confused is they think, oh, because there's some simple SQL analytics on HDFS, suddenly they don't need massively parallel relational databases anymore. And that's where they get confused. I mean, what I was trying to explain earlier is that you need the massively parallel relational databases to give interactive response times for your data scientists to do their big data analytics. Hadoop is great, but it's really more sort of replacing the use of file systems and ETL engines in middle tier platforms.

And it's not really a replacement for MPP relational databases. And because these relational databases are raising the bar on what's normal, what's expected of them, what's table stakes, like, for example, this new in-memory column store technology that we've been adding, it's not clear to me that there's any way adding SQL on top of Hadoop is going to catch up to them anytime over the next decade or so. So, I think that concern is very overblown. I think the best way to look at it is Hadoop and relational databases are very complementary. NoSQL databases are an interesting area.

Again, this is a technology that has been around for years and years. On the mainframe 30 or 40 years ago, these were called indexed sequential access methods, and now they're called key-value stores. And these technologies again have been used in conjunction with relational databases for many years. They're not really big data technologies in the sense that you can do analytics against them. They don't support SQL.

That's why they're called NoSQL. So they're not really suitable for BI or analytics, but they are suitable for ingesting information, just like Hadoop is good at ingesting information into a file system. NoSQL databases are also good for ingestion as part of this big data story. You can get information out of them, but like I said, they're just sort of key-value lookups. They're not really massively parallel analytic engines.
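[Editor's note] The key-value access pattern described above can be sketched in a few lines: fast put/get by key, with no SQL layer for cross-key analytics. A plain dict stands in for the store here; the class, keys and values are invented for illustration and bear no relation to the Oracle NoSQL Database API.

```python
class ToyKVStore:
    """Minimal key-value store: O(1) put/get by key, nothing else."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

store = ToyKVStore()
store.put("user:42", {"name": "pat", "last_seen": "2013-10-08"})
print(store.get("user:42")["name"])  # pat
```

Ingestion at high rates is exactly this shape, one put per event; what is missing, and what the speaker's point is, is any way to ask an aggregate question across all keys without scanning them yourself.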

So that opportunity is there, but it's sort of a smaller part of this whole big data space. And I think I'll leave it at that. There's hundreds of other vendors, but I think those are the 2 key main ones to look at here.

Speaker 1

Now one of the Hadoop distribution vendors that you guys have been working with is Cloudera, and you mentioned them in your presentation. Why did you choose Cloudera over a couple of the other distributions that are out there?

Speaker 3

Cloudera certainly is the largest and most mature of the Hadoop distributions out there. They also have a very large and mature support organization that works with us in supporting our customers. And certainly, at the time we chose them, I think they were the clear choice, and they've been a good partner with us as we've gone to market around big data.

Speaker 1

So as we step back from all this, Andy, and you've kind of looked at this over years of experience on the database side, what do you see as the biggest barriers to adoption, both for your technology and more generally for big data technology?

Speaker 3

Well, there are 2 key parts of our platform here. There's Hadoop and there's relational database technology. On the relational database side, I think the barriers to adoption are pretty low these days. Like I said earlier, there's a huge ecosystem of people who know how to manage Oracle databases, they know how to write SQL queries against them, and they know how to use tools that automatically generate SQL in their solutions, and there are whole tech stacks' worth of stuff there to make it very easy. On the other side of the world, however, on the Hadoop side, there are significant barriers to adoption.

I think, number 1, there isn't a lot of expertise in IT organizations on how to run Hadoop, which is why I think our engineered system for Hadoop, the big data appliance, is going to really resonate with our customers there, making the initial deployment of the technology easy. The other big issue around Hadoop is, if you want to write analytics, you can start out saying, okay, I'm going to do some Java coding and write MapReduce using Java. Well, that is a skill set that is not very plentiful. So there aren't a lot of developers out there who know how to do that. So, people have started building some tools on top of Java MapReduce, things like Hive and Pig, that give you a little higher level programming paradigm.

That's sort of like a simple subset of SQL. Again, that's good. It's a good start. Most of our customers will use those kinds of tools rather than try to write Java MapReduce, and that helps a little bit. And moving forward, they're of course going to continue working on that.

Oracle is working on extending our SQL engine capabilities against HDFS data as well, to make it easier for customers to use the Hadoop platform. And then moving up the stack, again, there's not a lot of tooling or solutions that sit on the stack today on the Hadoop side. Again, that needs to really improve to make that part of the platform much easier to adopt. Of course, at Oracle, we'll be working on those kinds of solutions as well.

Speaker 1

So the last question for me, Andy, and maybe you sort of answered it with this previous question: where's the biggest opportunity for Oracle in the whole big data market?

Speaker 3

Yes. I run the database group, and we see this whole big data space as being a huge market opportunity for us. Over the years, traditional BI and data warehousing has been a huge part of our business. We see big data as being just an acceleration of that business.

And certainly, our Exadata engineered system is really leading the way there. All of the customers, I think, who have big Oracle data warehouses these days on non-Exadata platforms are moving to Exadata as they refresh those platforms. We're also moving very aggressively in this space with our new in-memory column store technology. So I think that's again going to raise the bar of what's expected of a massively parallel SQL engine. That's going to be very hard for people in the open source space to keep pace with.

So I think the relational database and Exadata, of course, are huge opportunities for us. The big data appliance, as customers start adopting Hadoop, is going to be another significant opportunity for us moving forward. And in the BI space, of course, all our BI tools are moving into the in-memory analytics space. Our Exalytics engineered system is the platform for doing that. That's another big opportunity for us.

And then lastly, I did mention our Oracle NoSQL database. We do see a lot of interest in NoSQL, not necessarily just in what we call the big data space, but in the general data processing space; a lot of web developers especially are very interested in using NoSQL databases. We have a very strong NoSQL offering, and we're busy right now trying to make sure all of our enterprise customers know that if they are considering NoSQL, they should consider the Oracle NoSQL database product in their evaluation. And we think we're going to do very well as people do actual competitive POCs using our technology versus the other popular NoSQL technologies out there. So I'd say those are the key opportunities for us in big data.

Speaker 1

Great. Well, that's plentiful. Andy, thanks so much for your time today. Really appreciate it. Shauna, I haven't had any questions come in.

Are there any that you'd like to ask before we close things up?

Speaker 2

No. I think at this point, we'll go ahead and wrap up. We'd like to thank everyone for joining us today. Also, we'd like to extend a very special thank you to Brendan for moderating the Q and A portion of today's call and asking the questions most asked by investors. If you have any follow-up questions, please contact the Investor Relations team here at Oracle.

This concludes our call.

Speaker 1

Ladies and gentlemen, thank you for participating in today's conference. This does conclude the program and you may all disconnect. Everyone have a great day.
