
Sea Change In Database Management


Virtualization, mobile access, richer data in varied formats pose new challenges



May 15, 2007
As if it were an octopus, the enterprise database management system has evolved the ability to embrace a variety of data and technologies, as customers and vendors alike recognize the importance of the DBMS to the corporate ecosystem. But it’s no swim in the garden for either party.

The complications facing enterprise database management systems are not trivial. The amount of data that end users maintain is growing from a variety of pressures. In general, businesses are keeping richer forms of data, in the form of audio, images and video. New data formats, and structured XML in particular, are entering the scene. Behind everything else are the pressures from a new scrutiny of how businesses maintain the data upon which they live. Governance concerns that require companies to retain e-mail and other communications in an accessible format are increasing the burden on database systems and their managers, and things are likely to get worse before they get better.

Other technologies are further muddying the waters: Increasingly, data is mobile, in the form of intelligent devices that are only occasionally connected to a network. Virtualization of both data and systems is being implemented, or at least talked about, even if the realities fall short of the ambitions.

Gartner Distinguished Analyst and vice president Donald Feinberg, a 41-year veteran of the IT industry and former director of mainframe product marketing at Oracle who has covered data management for Gartner for 17 years, told SD Times that issues of governance, mobility, scalability and stability are driving the evolution of the enterprise DBMS.

Scalability is perhaps the most obvious problem facing DBMS customers and vendors. “There are two issues around scaling,” said Feinberg. “There’s the size of the database, which needs to be scalable to hundreds of terabytes, because a lot of this stuff takes up a lot of space. Take a medical history file [as an example]. You start storing EEGs and EKGs, and X-rays into a patient information database, [and] it’s going to take up terabytes of data. The second thing is, that [the DBMS] needs to give you the proper facilities to manage databases of those sizes.”

Feinberg pointed out that some of the problem is simply managing the hardware: “When you start to get over a few, 10, 15 terabytes, it becomes a nightmare for the storage people, [some of whom have] to manage storage farms that are a hundred terabytes. I talked to one guy the other day that had 4,000 LUNs [virtual disk volumes], because of the suggested logical size of the unit within this thing they were supposed to do, and he had to manage [these] outside of the DBMS. It’s a nightmare.”

The vendors, Feinberg believes, understand the stakes. “What they’re doing, is trying to make the databases you create with their DBMSes scalable to whatever size you need, manageable easily.”

Of course, one can’t dump data into a warehouse forever, Feinberg noted. “The next step, of course, is: When do we take data out of these operational systems? Whether they are OLTP [online transaction processing] systems, or whether they are operational data warehouses, they can’t just keep growing forever. We have to be able to retire data on an appropriate basis out of an OLTP [system] into a data warehouse, and from a data warehouse into some kind of archive,” whether it be online, near-line or offline.

“One way or another,” Feinberg continued, “we have to be able to reduce the sizes of these operational systems, and I do use the word operational to include data warehouses, because they are.”
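The retirement step Feinberg describes can be sketched in a few lines. What follows is only an illustration, not any vendor's mechanism: it assumes a hypothetical `orders` table and a matching `orders_archive` table in an in-memory SQLite database, and it moves rows older than a cutoff date in a single transaction so that a row is never lost or duplicated if the move fails partway.

```python
import sqlite3


def retire_rows(conn, cutoff):
    """Move orders dated before `cutoff` from the live table to the archive."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT INTO orders_archive "
            "SELECT * FROM orders WHERE order_date < ?",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM orders WHERE order_date < ?", (cutoff,)
        )
    return cur.rowcount  # number of rows removed from the live table


def demo_retire():
    # Hypothetical schema and sample rows, for illustration only.
    conn = sqlite3.connect(":memory:")
    schema = "(id INTEGER PRIMARY KEY, order_date TEXT, total REAL)"
    conn.execute("CREATE TABLE orders " + schema)
    conn.execute("CREATE TABLE orders_archive " + schema)
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "2005-03-01", 10.0), (2, "2006-11-15", 25.5), (3, "2007-04-30", 7.25)],
    )
    conn.commit()
    moved = retire_rows(conn, "2007-01-01")
    live = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    archived = conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0]
    return moved, live, archived
```

In a production system the archive would normally live in a separate database or storage tier; the two-table layout here only keeps the sketch self-contained.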

EMBRACING NEW FORMATS
Feinberg also believes the major vendors are successfully addressing the other major challenge: the explosion in rich data. “They’re architecting into the database the capability to store many different types of data, efficiently.” Perhaps the most useful development in DBMS data formats to Feinberg has been the adoption of native XML storage.

“XML is probably more structured than relational [data], if you really want to get down to it,” he argued. “XQuery can run very quickly and efficiently against it; it means that you can search it as part of the database, but you can also take the XML document out in whole, without having to store it twice [as with other data]. [Otherwise,] you stripped it, stuffed it into [a relational database] and then kept a copy in a BLOB [binary large object file] so that you could see what it looked like as a document. Now, they’re storing it natively, so you can effectively do both” in the same step.
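The "store once, do both" idea can be shown in miniature. The sketch below rests on several assumptions: a plain Python dictionary stands in for the DBMS, the patient document is invented for illustration, and the search uses the limited XPath subset in Python's standard `xml.etree.ElementTree` module rather than XQuery. A real native XML store would index the document instead of reparsing it on every query.

```python
import xml.etree.ElementTree as ET

# Hypothetical patient record, invented for this example.
DOC = """<patient id="p1">
  <name>Jane Doe</name>
  <tests>
    <test type="EKG" date="2007-01-10"/>
    <test type="EEG" date="2007-02-02"/>
    <test type="EKG" date="2007-03-21"/>
  </tests>
</patient>"""

# A single stored copy of each document, keyed by id.
store = {"p1": DOC}


def find_test_dates(doc_id, test_type):
    """Search inside the stored document without shredding it into columns."""
    root = ET.fromstring(store[doc_id])
    matches = root.findall(f".//test[@type='{test_type}']")
    return [t.get("date") for t in matches]


def whole_document(doc_id):
    """Return the same stored copy intact, with no second BLOB needed."""
    return store[doc_id]
```

Both operations read the same stored text, which is the point Feinberg makes about avoiding a shredded relational copy alongside a BLOB copy.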

He continued, “You can store a lot of…rich media types, things like JPEG and photographs and music and voicemail, that being the one that’s the real killer [app] out there. I think the auditors are going to start insisting that people start keeping voicemail, recorded, and forever, as they always do. That’s going to create another issue. Those are types of data [where], although they have [a great deal of] structure to a program like Photoshop, they have zero structure to a program like Oracle.”

Governance concerns and record retention requirements are also driving the need for efficient, scalable storage, according to Feinberg. “Eventually for corporations, they’re going to want to [retain] all of their information, for several reasons. Anything in a corporation that’s worth keeping means that we need to have security and governance around it, automatically.”

There should be no such thing as “worthless data,” Feinberg pointed out. “If you’re going to sit there and tell me that the document you’re going to put into the data warehouse has no value, then my question to you is: ‘Why are you keeping the piece of paper in the first place, let alone putting it onto a disk?’ Get rid of it.”

Feinberg discussed the usefulness of parking code inside a database, in the form of stored procedures, scripts and so forth.

He noted, “We go through a period of five, 10 years where everybody wants to do that, then we go through a period of five or 10 years where people say, ‘If it’s for an internal application, I’m probably OK, but if it’s to sell, then I’m really not OK, because I [will] need to rewrite my stored procedures, in everybody’s different stored procedure language.’ So, for instance, you won’t find SAP using stored procedures, because they can’t easily support multiple database engines if they have to recode everything in T-SQL and PL/SQL and IBM SQL, etc. With Java, my understanding is that you move the logic out of the database and into the programs again. We’re back to more of what you would call a fat client,” with that model.

Feinberg continued, “There probably are some things that should be stored in the database, that help with referential integrity by storing it there. If a piece of code is always going to go with this one piece of data, and that gives us some type of referential integrity of the data, then maybe you want to do that, even if it means [software vendors are] writing four versions of it. There are many others that you want to keep in the programs, because the more you raise up to the middleware or the programming level, the more flexible and portable the application becomes. That’s an age-old argument, and it’s not an easy answer, because it really depends on the architecture of the application.”
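A small sketch makes the trade-off concrete. It swaps in a declarative foreign-key constraint for a stored procedure, because SQLite, used here only for convenience, has no stored procedure language. The `customer` and `invoice` tables are hypothetical. The point is the one Feinberg raises: a rule kept inside the database travels with the data, so every program that writes to it is held to the same check.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless it is requested.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE invoice ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id),"
        " amount REAL NOT NULL)"
    )
    conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
    conn.commit()
    return conn


def try_insert_invoice(conn, customer_id, amount):
    """Return True if the database accepts the row, False if it rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO invoice (customer_id, amount) VALUES (?, ?)",
                (customer_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        # The database itself refused an invoice for a missing customer.
        return False
```

The opposite choice, checking for the customer in application code before inserting, keeps the application portable across engines but relies on every program remembering to run the check.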

LOOKING AHEAD
Feinberg believes that ultimately, many OLTP databases will give way to a model where most if not all of the data is warehoused. “If you want to look at the architecture of the future, bring in the operational data warehouse. They are beginning to be used to store operational data as well as the traditional [uses]. This is very important when it comes to designing the database structure, when it comes to the DBMS vendor, and the tuning, and the other things they have to do architecturally to support this.”

He added, “The architecture of applications is going to be [such] that they’re going to get some of their operational data out of the OLTP or the transaction databases, but some of it is going to come out of the data warehouse. And as more and more of that happens, as you get BI and analytic code in OLTP applications that access data warehouses, the warehouse becomes much more critical.”

Mobile applications and radio-frequency identification (RFID) are going to add dramatically to the amount of data companies are processing, Feinberg believes, and that means putting data straight into the warehouse where it can be used most efficiently.

Using a supermarket as a model, “all the historical data that we need is put into the data warehouse. When RFID starts working for real, and becomes cheap like it’s supposed to, and every single item on a supermarket shelf has an RFID chip on it,” that will solve the problem of inventories that are never quite up-to-date, he claimed.

Feinberg explained, “Ten minutes after they take the inventory, or less, [it’s out of date. But with RFID,] I can push a button and get an accurate inventory of every item on the shelf that I want to order right now, because it’s actually reading the RFID chip in every store across the world, and sending [data] back to me, I know exactly how many I really have on the shelf. That’s my inventory database. Now, where do I go to get the price to buy it? I go into the vendor’s computer; I go into his data warehouse and pull out my pricing history. I create a purchase order and send it off to the vendor. The purchase order’s history, so it goes into the data warehouse. I never used an operational database—unless you define an RFID chip as a database of one, which, in fact, is what it is.”

KEEPING IT REAL
Hardware virtualization has a long way to go before Feinberg sees it being truly useful to the DBMS community. “I haven’t had 10 conversations about virtualization,” he said. “The 10 I’ve had have either been ‘When do you think it will be ready for use in the run-of-the-mill-type DBMS infrastructure?’ or it’s been ‘What do I do to manage Oracle, because they won’t let me use VMware?’ and that’s it.”

Feinberg believes that will change, but not for several years. “If I had to position a dot on a virtualization Magic Quadrant—a ‘hype cycle’—it would be way up by the peak of ‘Inflated Expectations’ if you’re running DBMSes. I think we’re way far away from having the right software to manage it, having the real cost savings. I just don’t see a lot of people wanting to do it [today].”

But VMware-style virtualization isn’t the only game in town, he noted. “Oracle RAC [Real Application Clusters] is a different story. What Oracle’s built is an enterprise grid. I’m also not talking about mainframes here. The IBM System z is one of the best virtualizations in the world, and yes, people use that.”

One of the big architectural challenges, Feinberg added, is that “all of these vendors are going to have to run their DBMSes on true virtualized servers, where they have some level of understanding of the maturity of the virtualization, so that they know when the bug is being caused by [the DBMS vendor] and when the bug is being caused by the virtualizer.”

A related issue is clustering, “because today, Oracle’s the only one that does it well. Microsoft and Sybase and IBM are all working on that, because they have to get that working,” he said.

Data virtualization is a tougher problem for Feinberg. “Two words scare me in this industry: One of them is ‘virtual,’ the other is ‘emulation.’ Anytime I hear either of those two words, I know performance is going to be sacrificed.” But one can use a tool such as IBM’s WebSphere Information Integrator, he noted, “to bring together data virtually into a SQL statement. It can go to relational sources, nonrelational sources, etc., [as] a virtualized query. The problem is, anytime you put the word virtual into something like that, it means there’s going to be a performance degradation. There has to be: I’m mapping nonrelational data to relational data; I’m bringing data into a SQL query from God-knows-where.

“The latest technology idea,” Feinberg continued, “is that we combine something like Google into the virtual search for data in a data warehouse. That’s coming. [Federation] is valid, because a data warehouse query may need to get data from nonrelational structures, and if that’s the case, a virtual query, a federated query, is the only way you can do it. Sybase has products that [do] it, IBM has products that do it, [and] a number of companies are doing federated or virtualized data queries.”
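The mapping step Feinberg worries about can be seen in a toy version of a federated query. This sketch does not represent how any of the named products work. It assumes a hypothetical relational `stock` table in SQLite and a nonrelational price feed supplied as JSON text, and it copies the JSON into a temporary table at query time so that one SQL statement can join the two. That copy is the overhead: it is paid on every query.

```python
import json
import sqlite3

# Hypothetical nonrelational source: a price feed delivered as JSON text.
PRICES_JSON = '[{"sku": "A1", "price": 2.5}, {"sku": "B2", "price": 4.0}]'


def federated_inventory_value(conn, prices_json):
    """Join the relational stock table against the JSON price feed."""
    # Mapping step: turn nonrelational records into relational rows.
    rows = [(p["sku"], p["price"]) for p in json.loads(prices_json)]
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS ext_price "
        "(sku TEXT PRIMARY KEY, price REAL)"
    )
    conn.execute("DELETE FROM ext_price")  # refresh on every query
    conn.executemany("INSERT INTO ext_price VALUES (?, ?)", rows)
    # A single SQL statement now spans both sources.
    return conn.execute(
        "SELECT s.sku, s.qty * p.price FROM stock AS s "
        "JOIN ext_price AS p ON p.sku = s.sku ORDER BY s.sku"
    ).fetchall()


def demo_federated():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER)")
    conn.executemany(
        "INSERT INTO stock VALUES (?, ?)", [("A1", 10), ("B2", 3), ("C3", 7)]
    )
    return federated_inventory_value(conn, PRICES_JSON)
```

Items with no entry in the external feed drop out of the inner join, a reminder that a federated result is only as complete as its least complete source.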

In short, if the future won’t exactly be one of monster data warehouses with tentacles reaching into mobile devices and RFID tags, and database administrators armed with the virtualization equivalent of spearguns, it will nevertheless bring challenges of its own. Fortunately, the DBMS vendors appear equal to the task.


Share this link: http://www.sdtimes.com/link/30621
 
