04.01.2021

Storage Systems: An Overview. External Storage Systems


What is the purpose of data storage systems (DSS)?

Data storage systems are designed for safe and fault-tolerant storage of processed data with the ability to quickly restore access to data in the event of a system failure.

What are the main types of storage systems?

By type of implementation, storage systems are divided into hardware and software. By field of application, they are divided into individual, small-workgroup, workgroup, enterprise, and corporate systems. By type of connection, storage systems are divided into:

1. DAS (Direct Attached Storage - systems with a direct connection)

A distinguishing feature of this type of system is that access to the data for devices connected to the network is controlled by the server or workstation to which the storage is attached.

2. NAS (Network Attached Storage)

In systems of this type, access to the information held in the storage is controlled by software that runs on the storage device itself.

3. SAN (Storage Area Network) - systems built as a dedicated network between the servers that process data and the storage systems themselves.

With this method of building a data storage system, access to information is controlled by software running on the storage servers. SAN switches are used to connect the storage to the servers over high-performance access protocols (Fiber Channel, iSCSI, ATA over Ethernet, etc.).

What are the features of the software and hardware implementation of storage systems?

The hardware implementation of a storage system is a single hardware complex consisting of a storage device (which is a disk or an array of disks on which data is physically stored) and a control device (a controller that distributes data between storage elements).

The software implementation of a storage system is a distributed system in which data is stored without being tied to any specific storage device or server; access to the data goes through specialized software that is responsible for the safety and security of the stored data.

A data storage system (DSS) is a complex of software and hardware designed to manage and store large amounts of information. The main storage media at this time are hard disks, whose capacities have recently reached 1 terabyte. In small companies, information is stored mainly on file servers and DBMS servers whose data resides on local hard drives. In large companies, the amount of information can reach hundreds of terabytes, with even stricter requirements for speed and reliability. No locally attached disk drives can meet these needs, which is why large companies deploy dedicated data storage systems (DSS).

The main components of storage systems are: storage media, data management systems and data transmission networks.

  • Storage media. As mentioned above, the main storage media are currently hard disks (in the near future they may be displaced by solid-state SSD drives). Hard drives come in two main types: the reliable and fast SAS (Serial Attached SCSI) and the more economical SATA. Tape drives (streamers) are also used in backup systems.
  • Data management systems. A storage system provides powerful data management capabilities: mirroring and replication of data between systems, fault-tolerant self-healing arrays, monitoring functions, and hardware-level backup functions.
  • Data transmission networks. Data networks provide the medium over which servers communicate with storage systems, or one storage system with another. By connection type, drives are divided into DAS (Direct Attached Storage) - drives attached directly to the server; NAS (Network Attached Storage) - drives connected over a network (data is accessed at the file level, usually via FTP, NFS, or SMB); and SAN (Storage Area Network) - storage area networks that provide block access. In large storage systems, SAN is the primary connection type. There are two methods of building a SAN: Fiber Channel and iSCSI. Fiber Channel (FC) is primarily used for interconnection within a single data center. iSCSI carries SCSI commands over IP and can be routed by ordinary IP routers, which allows building geo-distributed clusters.
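The distinction between file-level and block-level access can be sketched in a few lines of Python (a toy illustration only; an ordinary local file stands in for the block device, and `read_file`, `read_block`, and `BLOCK_SIZE` are names invented here):

```python
import os
import tempfile

# File-level access (NAS style): the client names a file; the server
# resolves the path and manages the block layout internally.
def read_file(path):
    with open(path, "rb") as f:
        return f.read()

# Block-level access (DAS/SAN style): the client addresses raw blocks
# by number; any file-system logic lives on the client side.
BLOCK_SIZE = 512

def read_block(device_path, lba):
    with open(device_path, "rb") as dev:
        dev.seek(lba * BLOCK_SIZE)
        return dev.read(BLOCK_SIZE)

# Demo: an ordinary file stands in for the "block device".
with tempfile.NamedTemporaryFile(delete=False) as t:
    t.write(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)
    path = t.name

whole = read_file(path)       # file-level: the entire object
block1 = read_block(path, 1)  # block-level: just LBA 1, here all "B" bytes
os.unlink(path)
```

A NAS serves requests like `read_file`; a DAS or SAN target serves requests like `read_block` and leaves file-system semantics to the host.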

Storage solution based on HP arrays and Cisco switches, data volume over 1 PB (1 petabyte).

The main manufacturers of devices used to build storage systems are HP, IBM, EMC, Dell, Sun Microsystems and NetApp. Cisco Systems offers a wide variety of Fiber Channel switches that provide connectivity between storage devices.

LanKey has extensive experience in building data storage systems based on equipment from the above manufacturers. When building storage systems, we cooperate with manufacturers and build high-performance and highly reliable information storage systems. Our engineers will design and implement storage systems that meet the specifics of your business, as well as develop a system for managing your data.

Direct Attached Storage Systems (DAS) implement the most well-known connection type. When using DAS, the server has a personal connection with the storage system and is almost always the sole user of the device. In this case, the server receives block access to the data storage system, that is, it accesses the data blocks directly.

Storage systems of this type are fairly simple and usually inexpensive. The disadvantage of the direct connection method is the small distance between the server and the storage device. The typical DAS interface is SAS.

Network Attached Storage (NAS)

Network-attached storage systems (NAS), also known as file servers, offer their resources to clients over the network as shared files or directory mount points. Clients use network file access protocols such as SMB (formerly known as CIFS) or NFS. The file server, in turn, uses block access protocols to its internal storage to service client file requests. Since a NAS operates over a network, the storage can be located far from its clients. Many network storage systems provide additional functions such as snapshots, deduplication, data compression, and others.
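Block-level deduplication, one of the add-on functions just mentioned, can be sketched as follows (a simplified illustration assuming fixed-size blocks and SHA-256 digests; production systems use far more elaborate chunking and metadata):

```python
import hashlib

BLOCK = 4096

def dedup(data: bytes):
    """Split data into fixed-size blocks and store each unique block once.
    Returns (store, recipe): unique blocks keyed by digest, plus the
    ordered digest list needed to reconstruct the original data."""
    store, recipe = {}, []
    for i in range(0, len(data), BLOCK):
        chunk = data[i:i + BLOCK]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # keep only the first copy
        recipe.append(digest)
    return store, recipe

def restore(store, recipe):
    return b"".join(store[d] for d in recipe)

data = b"x" * BLOCK * 3 + b"y" * BLOCK    # three identical blocks + one
store, recipe = dedup(data)
print(len(recipe), "blocks referenced,", len(store), "stored")
assert restore(store, recipe) == data
```

Here four logical blocks shrink to two stored blocks; the savings grow with the amount of repeated data.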

Storage Area Network (SAN)

A storage area network (SAN) provides clients with block-level access to data over a network (such as Fiber Channel or Ethernet). Devices on a SAN do not belong to a single server, but can be used by all clients on the storage network. It is possible to divide disk space into logical volumes that are allocated to separate host servers. These volumes are independent of the SAN components and their placement. Clients access the datastore using a block type of access just like a DAS connection, but since the SAN uses a network, the storage devices can be located far away from the clients.

Currently, SAN architectures use the Small Computer System Interface (SCSI) protocol to send and receive data. Fiber Channel (FC) SANs encapsulate the SCSI protocol in Fiber Channel frames. SANs using iSCSI (Internet SCSI) carry SCSI commands in TCP/IP packets. Fiber Channel over Ethernet (FCoE) encapsulates Fiber Channel frames in Ethernet packets using the relatively new Data Center Bridging (DCB) technology, which adds a set of enhancements to traditional Ethernet and can now be deployed on 10GbE infrastructure. Because each of these technologies gives applications access to storage through the same SCSI protocol, it is possible to use them all in one company or to migrate from one to another. Applications running on a server cannot tell FC, FCoE, iSCSI, or even DAS apart.
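The point that every transport carries the same SCSI command can be shown schematically (a conceptual toy only; the nested dictionaries stand in for real frame formats, which are far more involved):

```python
def encapsulate(payload, *layers):
    """Wrap a payload in successive protocol layers, innermost first."""
    frame = payload
    for layer in layers:
        frame = {"proto": layer, "payload": frame}
    return frame

# One and the same SCSI command...
scsi_cmd = {"op": "READ(10)", "lba": 2048, "blocks": 8}

# ...wrapped for three different transports.
fc_frame   = encapsulate(scsi_cmd, "FCP", "FC")
iscsi_pkt  = encapsulate(scsi_cmd, "iSCSI", "TCP", "IP", "Ethernet")
fcoe_frame = encapsulate(scsi_cmd, "FCP", "FC", "FCoE", "Ethernet")

def innermost(frame):
    """Strip transport layers until the application payload remains."""
    while isinstance(frame, dict) and "payload" in frame:
        frame = frame["payload"]
    return frame

# The application-level SCSI command is identical in every case.
assert innermost(fc_frame) == innermost(iscsi_pkt) == innermost(fcoe_frame) == scsi_cmd
```

This is exactly why an application cannot tell the transports apart: only the outer layers differ.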

There is much discussion about choosing FC versus iSCSI for building a SAN. Some companies focus on the low cost of initial iSCSI SAN deployments, while others opt for the high availability and performance of Fiber Channel SANs. Although low-end iSCSI solutions are less expensive than Fiber Channel, the cost advantage disappears as iSCSI SAN performance and reliability requirements grow. At the same time, some FC implementations are easier to use than most iSCSI solutions. The choice of technology therefore depends on business requirements, existing infrastructure, expertise, and budget.

Most large organizations that use SANs choose Fiber Channel. These companies typically require proven technology, need high throughput, and have the budget to buy the most reliable, high performing hardware available. They also have staff to manage the SAN. Some of these companies plan to continue investing in Fiber Channel infrastructure, while others are investing in iSCSI solutions, especially 10GbE, for their virtualized servers.

Smaller companies are more likely to choose iSCSI because of the low cost entry barriers, while still being able to scale up their SANs further. Inexpensive solutions usually use 1GbE technology; 10GbE solutions are significantly more expensive and are generally not considered entry-level SANs.

Unified storage

Unified storage combines NAS and SAN technologies in a single integrated solution. These versatile systems allow both block- and file-level access to shared resources, and such devices are easier to manage with software that provides centralized management.

This article will focus on entry-level and mid-range storage systems and the trends that are emerging in the industry today. For convenience, we will call data storage systems drives.

First, we will dwell a little on the terminology and technological foundations of autonomous storage, and then move on to new products and discussion of modern advances in various technology and marketing groups. We will also be sure to tell you why you need systems of one type or another and how effective their use is in different situations.

Standalone disk subsystems

In order to better understand the features of autonomous drives, let's dwell a little on one of the simpler technologies for building data storage systems - bus-oriented technology. It provides for the use of a disk drive enclosure and a PCI RAID controller.

Figure 1. Bus-oriented storage technology

Thus, between the disks and the PCI bus of the host (here, a standalone computer such as a server or workstation) there is only one controller, which largely determines the speed of the system. Drives built on this principle are the most productive, but due to their architecture, their practical use, with rare exceptions, is limited to single-host configurations.

The disadvantages of bus-oriented drive architecture include:

  • effective use only in single host configurations;
  • dependence on the operating system and platform;
  • limited scalability;
  • limited opportunities for organizing fault-tolerant systems.

Naturally, none of this matters if the data is needed by one server or workstation; on the contrary, in such a configuration you get maximum performance for minimum money. But if you need a storage system for a large data center, or even for two servers that need the same data, a bus-oriented architecture is completely inadequate. These disadvantages are avoided by the architecture of standalone disk subsystems. Its basic principle is simple: the controller that manages the system is moved from the host computer into the drive enclosure, making operation host-independent. Such a system can have many external input/output channels, which makes it possible to connect several, or even many, computers to it.


Figure 2. Standalone storage system

Any intelligent storage system consists of hardware and program code. A standalone system always contains memory that stores the firmware implementing the system's operating algorithms, and processing elements that execute this code. Such a system functions independently of the host systems it is connected to. Thanks to their intelligence, standalone drives often implement many data-protection and data-management functions on their own. One of the most important, basic, and nearly ubiquitous functions is RAID (Redundant Array of Independent Disks). Another, found in mid-range and high-end systems, is virtualization, which provides features such as instant copy (snapshots) and remote backup, as well as other rather sophisticated algorithms.
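The RAID 1 (mirroring) function mentioned above can be sketched as a toy model (all names here are invented for illustration; real controllers do this in firmware with far more sophistication, including rebuild and consistency checking):

```python
class Raid1:
    """Toy RAID 1: every write goes to both member 'disks'; a read is
    served from the first surviving member."""

    def __init__(self, size):
        self.disks = [bytearray(size), bytearray(size)]
        self.alive = [True, True]

    def write(self, offset, data):
        for disk, ok in zip(self.disks, self.alive):
            if ok:
                disk[offset:offset + len(data)] = data

    def read(self, offset, length):
        for disk, ok in zip(self.disks, self.alive):
            if ok:
                return bytes(disk[offset:offset + length])
        raise IOError("array failed: no surviving members")

    def fail(self, idx):
        self.alive[idx] = False

arr = Raid1(1024)
arr.write(0, b"important data")
arr.fail(0)                 # one disk dies...
print(arr.read(0, 14))      # ...the data survives on the mirror
```

The array keeps serving reads after a single-disk failure, which is exactly the fault-tolerance property RAID 1 buys at the cost of doubling the raw capacity.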

Briefly about SAS, NAS, SAN

As part of the consideration of autonomous data storage systems, it is imperative to dwell on how host systems access drives. This largely determines the scope of their use and internal architecture.

There are three main options for organizing access to drives:

  • SAS (Server Attached Storage) - a drive attached to the server [the second name is DAS (Direct Attached Storage) - a directly attached drive];
  • NAS (Network Attached Storage) - a storage device connected to a network;
  • SAN (Storage Area Network) is a storage area network.

We have already written about SAS/DAS, NAS, and SAN technologies in an article dedicated to SANs; anyone interested in that material can refer to the iXBT pages. Still, let us refresh it a little, with an emphasis on practical use.

SAS/DAS is a fairly simple traditional connection method implying direct (hence DAS) connection of the storage system to one or more host systems through a high-speed channel interface. Often the same interface is used to connect the drive to the host as is used to access the host's internal disks, which generally gives high performance and easy connection.

A SAS system can be recommended where high-speed processing of large amounts of data is needed on one or a few host systems: for example, a file server, a graphics workstation, or a two-node failover cluster.



Figure 3. Clustered system with shared storage

NAS is a drive connected to the network that provides file-level (note: file, not block) access to data for host systems on the LAN/WAN. Clients working with a NAS usually use NFS (Network File System) or CIFS (Common Internet File System). The NAS interprets the file-protocol commands and issues requests to its disk drives using its internal channel protocol. In effect, the NAS architecture is an evolution of the file server. The main advantage of such a solution is the speed of deployment and the quality of file access, thanks to specialization and narrow focus.

Based on the foregoing, NAS can be recommended when you need network access to files and the important factors are simplicity of the solution (usually a kind of quality guarantee) and ease of installation and maintenance. A great example is a NAS used as a file server in a small company office, where ease of installation and administration matters. At the same time, if you need file access from a large number of host systems, a powerful NAS drive, thanks to its sophisticated specialized design, can sustain intensive traffic with a huge pool of servers and workstations at a fairly low cost for the communication infrastructure (for example, Gigabit Ethernet over copper twisted pair and switches).

A SAN is a storage area network. SANs typically use block data access, although devices that provide file services, such as NAS, can also be connected to a storage network. Modern storage network implementations most often use the Fiber Channel protocol, but this is not strictly required, which is why Fiber Channel SANs (storage area networks based on Fiber Channel) are usually treated as a separate class.

The SAN is built on a network separate from the LAN/WAN that serves to organize access to data from the servers and workstations directly involved in processing. This structure makes it relatively easy to build highly available, high-demand systems. While SANs remain an expensive proposition today, the TCO (total cost of ownership) of medium and large systems built with SAN technology is quite low. For ways to reduce the TCO of enterprise storage with SANs, see the TechTarget resource pages: http://searchstorage.techtarget.com.

Today, the cost of disk drives with Fiber Channel support, the most common interface for building SANs, is close to that of systems with traditional low-cost channel interfaces (such as parallel SCSI). The main cost components of a SAN remain the communication infrastructure and the cost of deploying and maintaining it. For this reason, SNIA and many commercial organizations are actively working on IP Storage technologies, which allow the use of much cheaper IP network equipment and infrastructure, as well as the vast experience of specialists in that area.

There are many examples of effective SAN use. A SAN can be applied almost anywhere multiple servers need a shared storage system: for example, for collaborative work on video data or prepress processing, where each participant in the digital content workflow can work almost simultaneously on terabytes of data; or for backing up large amounts of data used by many servers. With a SAN, using a LAN/WAN-independent backup path and snapshot technologies, almost any amount of information can be backed up without compromising the functionality and performance of the whole information complex.

Fiber Channel in SANs

It is an undeniable fact that today it is FC (Fiber Channel) that dominates storage networks. And it was the development of this interface that led to the development of the SAN concept itself.

Experts with significant experience in developing both channel and network interfaces took part in designing FC, and they managed to combine the important strengths of both directions. One of the most important advantages of Fiber Channel, alongside its speed (which, incidentally, is not always the main criterion for SAN users and can be achieved with other technologies), is the ability to work over long distances with a flexible topology, features the new standard inherited from networking. Thus, a storage network topology is built on the same principles as a traditional local area network, using hubs, switches, and routers, which greatly simplifies multi-node configurations, including ones without a single point of failure.

It is also worth noting that Fiber Channel uses both fiber and copper media for data transmission. When organizing access to geographically remote sites at a distance of up to 10 kilometers, standard equipment and single-mode fiber are used for signal transmission. If the nodes are separated by 10 or even 100 kilometers, special amplifiers are used. When building such SANs, parameters that are rather unconventional for data storage systems are taken into account, for example, the speed of signal propagation in fiber.
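One such parameter is easy to estimate: light in fiber travels at roughly two thirds of its vacuum speed, about 200,000 km/s, so distance translates directly into latency (order-of-magnitude figures only):

```python
# Approximate propagation speed of light in optical fiber, km/s.
C_FIBER_KM_S = 200_000

def one_way_latency_ms(distance_km):
    """One-way propagation delay over a fiber link, in milliseconds."""
    return distance_km / C_FIBER_KM_S * 1000

for km in (10, 100):
    print(f"{km:>4} km: ~{one_way_latency_ms(km):.3f} ms one way")
```

At 10 km the delay is negligible; at 100 km every round trip costs about a millisecond before any protocol or disk time, which is why long-haul SAN links are engineered with propagation delay in mind.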

Storage Trends

The storage world is extremely diverse. The capabilities of data storage systems and the cost of solutions are quite differentiated. There are solutions that combine the capabilities of serving hundreds of thousands of requests per second to tens and even hundreds of Terabytes of data, as well as solutions for one computer with inexpensive IDE disks.

IDE RAID

Recently, the maximum capacity of IDE drives has grown enormously and now exceeds that of SCSI drives by roughly a factor of two; in price per unit of capacity, IDE drives lead by more than a factor of six. Unfortunately, this has not improved the reliability of IDE disks, yet their use in standalone data storage systems keeps growing. The main driver is that demand for large data volumes is growing faster than the capacity of individual disks.

A few years ago, few manufacturers dared to release standalone subsystems built on IDE disks. Today they are produced by almost every manufacturer targeting the entry-level market. IDE-based standalone subsystems are most widespread among entry-level NAS systems: if a NAS is used as a file server with a Fast Ethernet or even Gigabit Ethernet interface, the performance of such disks is more than sufficient in most cases, and their lower reliability is compensated by RAID technology.

Where block access to data is required at the lowest price per unit of stored information, systems with IDE disks inside and an external SCSI interface are actively used today. For example, on the JetStor IDE system made by the American company AC&NC, for a fault-tolerant archive of 10 terabytes with fast block access to data, the cost of storing one megabyte comes to less than 0.3 cents.
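The quoted per-megabyte figure is easy to recompute (illustrative arithmetic only; the $30,000 total is a hypothetical round number chosen here, not a quoted price for the JetStor system):

```python
# 10 TB archive, expressed in binary megabytes.
capacity_tb = 10
capacity_mb = capacity_tb * 1024 * 1024   # 10,485,760 MB

def cost_per_mb_cents(total_cost_usd):
    """Storage cost in US cents per megabyte for the whole archive."""
    return total_cost_usd * 100 / capacity_mb

# A hypothetical total system price of ~$30,000 lands in the ballpark
# the text quotes (under 0.3 cents per megabyte).
print(f"{cost_per_mb_cents(30_000):.3f} cents/MB")
```

Working backwards this way is a handy sanity check when comparing vendors' price-per-capacity claims.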

Another interesting and rather original technology that I had to get acquainted with quite recently was the Raidsonic SR-2000 system with an external parallel IDE interface.


Figure 4. Entry-level stand-alone IDE RAID

It is a standalone disk system designed for two IDE disks and intended to be mounted inside a host system enclosure. It is completely independent of the host's operating system. The system can run RAID 1 (mirroring) or simply copy data from one disk to another, with hot-swappable disks and no disruption or inconvenience to the computer's user, which cannot be said of bus-oriented subsystems built on PCI IDE RAID controllers.

It should be noted that the leading hard drive manufacturers have announced mid-range drives with the Serial ATA interface that will employ high-end technologies. This should improve their reliability and increase the share of ATA solutions in data storage systems.

What Serial ATA will bring us

The first and most pleasant thing you find in Serial ATA is the cable. Because the ATA interface became serial, the cable became round and the connector narrower. If you have ever had to route parallel IDE cables across eight IDE channels in one system, I am sure you will love this feature. Of course, round IDE cables have existed for a long time, but their connector remained wide and flat, and the maximum allowable length of a parallel ATA cable is not encouraging. When building systems with a large number of disks, standard cables do not help much at all: the cables often have to be made by hand, and routing them becomes almost the most time-consuming part of assembly.

Beyond the cabling, Serial ATA has other innovations that cannot be retrofitted onto the parallel interface with a utility knife or other handy tool. Disks with the new interface should soon support the Native Command Queuing instruction set: the Serial ATA controller analyzes I/O requests and optimizes their execution order to minimize seek time. The similarity between Serial ATA Native Command Queuing and SCSI command queuing is obvious, although Serial ATA will support up to 32 outstanding commands rather than SCSI's traditional 256. Native support for hot-swapping devices has also appeared; such a possibility existed before, but its implementation lay outside the standard and so could not be widely used. As for Serial ATA's new transfer speeds, they are not yet especially exciting in themselves, but the roadmap promises growth that would be very difficult to achieve within parallel ATA.
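The seek-time benefit of command queuing can be demonstrated with a greedy nearest-first scheduler (a simplified stand-in for the drive's firmware logic; the LBA values and seek distances below are abstract units invented for the example):

```python
def ncq_order(queue, head):
    """Greedy nearest-seek-first ordering of queued LBAs, a toy model
    of what an NCQ-capable drive does in firmware."""
    pending, order = list(queue), []
    while pending:
        nxt = min(pending, key=lambda lba: abs(lba - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order

def total_seek(order, head):
    """Sum of head-movement distances for a given service order."""
    dist = 0
    for lba in order:
        dist += abs(lba - head)
        head = lba
    return dist

requests = [5000, 100, 4900, 200, 5100]   # arrival (FIFO) order
head = 0
print("FIFO seek distance:", total_seek(requests, head))
print("NCQ  seek distance:", total_seek(ncq_order(requests, head), head))
```

Reordering the five requests cuts total head travel by almost a factor of five in this example; real drives see smaller but still significant gains.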

Considering the above, there is no doubt that the share of ATA solutions in entry-level storage systems should increase precisely due to the new Serial ATA drives and storage systems focused on the use of such devices.

Where Parallel SCSI Goes

Hardly anyone who works with storage systems, even entry-level ones, is enthusiastic about systems with IDE disks. The main advantages of ATA disks are their low price compared with SCSI devices and, probably, a lower noise level. The reason is simple: the SCSI interface is better suited for storage systems, and while it remains much cheaper than the even more capable Fiber Channel interface, SCSI disks are manufactured to higher quality, more reliable, and faster than cheap IDE disks.

Many manufacturers designing parallel SCSI storage systems today use Ultra 320 SCSI, the newest interface in the family. Many roadmaps once included devices with Ultra 640 and even Ultra 1280 SCSI interfaces, but it became clear that something in the interface had to change radically. Already at the Ultra 320 stage, parallel SCSI does not satisfy many users, mainly because of the inconvenience of the classic cables.

Fortunately, a new Serial Attached SCSI (SAS) interface has recently been introduced. The new standard has interesting features: it combines some capabilities of Serial ATA and of Fiber Channel. Despite the seeming oddity of this mix, there is common sense in it. The standard grew from the physical and electrical specifications of Serial ATA, with improvements such as higher signal levels to allow correspondingly longer cables, and greater device addressability. Most interestingly, vendors promise compatibility between Serial ATA and SAS devices, though only in subsequent versions of the standards.

The most important features of SAS include:

  • point-to-point interface;
  • two-channel interface;
  • support for 4096 devices in the domain;
  • standard set of SCSI commands;
  • cable up to 10 meters long;
  • 4-core cable;
  • full duplex.

Because the new interface uses the same miniature connector as Serial ATA, developers gain the opportunity to build more compact, high-performance devices. The SAS standard also provides for expanders: each expander supports addressing of 64 devices, and expanders can be cascaded to address up to 4096 devices within a domain. This is certainly far less than Fiber Channel's capabilities, but for entry-level and mid-range storage systems with drives directly attached to the server it is sufficient.
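The expander arithmetic is straightforward: 64 devices per expander, squared over two cascade levels, reaches the 4096-device domain limit (a back-of-the-envelope sketch of the figures given in the text):

```python
# Figures from the text: 64 devices per expander, 4096 per domain.
PER_EXPANDER = 64

def domain_capacity(levels):
    """Addressable devices with `levels` of cascaded expander fan-out."""
    return PER_EXPANDER ** levels

print(domain_capacity(1))  # a single expander
print(domain_capacity(2))  # two cascade levels reach the domain limit
```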

For all its attractions, Serial Attached SCSI is unlikely to displace the conventional parallel interface quickly. In the world of enterprise solutions, development tends to be more rigorous and therefore slower than in the desktop world, and old technologies do not disappear quickly, since their payback period is also long. Still, devices with the SAS interface should enter the market in 2004: at first mainly disks and PCI controllers, with storage systems catching up a year or so later.

To summarize, we offer a comparison of current and upcoming storage interfaces in the form of a table.

1 - The standard regulates a distance of up to 10 km for single-mode fiber; there are implementations of devices for transmitting data over a distance of more than 105 m.

2 - Hubs and some FC switches operate within the internal virtual ring topology, there are also many implementations of switches that provide point-to-point connectivity of any devices connected to them.

3 - There are implementations of devices with SCSI, FICON, ESCON, TCP/IP, HIPPI, VI protocols.

4 - The devices will be mutually compatible (manufacturers promise this in the near future): SATA controllers will support SAS drives, and SAS controllers will support SATA drives.

Mass NAS craze

Recently, there has been a mass fascination with NAS drives abroad. As the data-centric approach to building information systems grew in relevance, the classic file-server specialization became more attractive and a new marketing category, NAS, took shape. At the same time, accumulated experience in building such systems allowed a quick start for network-attached storage technologies, and the cost of their hardware implementation was extremely low. Today NAS drives are produced by virtually all storage vendors, from entry-level systems costing very little, through mid-range models, to systems storing tens of terabytes and capable of handling a colossal number of requests. Each class of NAS systems has its own interesting original solutions.

PC based NAS in 30 minutes

We want to briefly describe one original entry-level solution. One can argue about its practical value, but its originality cannot be denied.

In essence, an entry-level NAS drive (and not only an entry-level one) is a fairly simple personal computer with some number of disks and a software layer that gives other network members file-level access to its data. To build a NAS device, it is enough to take these components and connect them together. The whole point is how well you do it: the reliability and quality of data access that the workgroup receives depends directly on the quality of that assembly. With these factors in mind, plus deployment time and some design work, an entry-level NAS drive is built.

The differences between a good entry-level NAS solution and one self-assembled and configured within an OS of your choice, design aside, come down to:

  • how quickly you will do it;
  • how easy this system can be maintained by unqualified personnel;
  • how well this solution will work and be supported.

In other words, with a professional selection of components and a certain pre-configured software set, you can achieve a good result. The truth may seem banal; the same can be said of any task solved from ready-made components, "hardware" plus "software".

What does Company X propose? A rather limited list of compatible components is defined: motherboards with everything integrated that an entry-level NAS server requires, plus hard drives. You buy a FLASH disk with pre-recorded software that plugs into an IDE connector on the motherboard, and you get a ready-made NAS drive. At boot, the operating system and utilities written to that disk configure the necessary modules appropriately. As a result, the user gets a device that can be managed both locally and remotely through an HTML interface and that provides access to the disk drives connected to it.

File protocols in modern NAS

CIFS (Common Internet File System) is a standard protocol that provides access to files and services on remote computers (including over the Internet). The protocol uses a client-server interaction model: the client sends the server a request to access files or to pass a message to a program residing on the server; the server fulfills the request and returns the result. CIFS is an open standard that arose from the Server Message Block (SMB) protocol developed by Microsoft, but unlike SMB it allows for long timeouts, since it is also oriented toward use in distributed networks. CIFS uses TCP/IP to transport data. It provides functionality similar to FTP (File Transfer Protocol) but gives clients finer (more direct) control over files. It also allows file access to be shared between clients, using locking and automatic reconnection to the server in the event of a network failure. The SMB protocol has traditionally been used in Windows local area networks for file access and printing.

NFS (Network File System) is an IETF standard that comprises a distributed file system and a network protocol. NFS was developed by Sun Microsystems. It was originally used only on UNIX systems; later, client and server implementations became common on other systems as well.

NFS, like CIFS, uses a client-server communication model. It provides access to files on a remote computer (the server) for reading and writing as if they were on the user's own machine. Earlier versions of NFS used UDP to transport data, while modern versions use TCP/IP. To make NFS usable over the Internet, Sun developed the WebNFS protocol, which extends NFS functionality for correct operation in the worldwide network.
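As a concrete illustration of the server/client split, here is a minimal, hedged configuration sketch for a typical Linux NFS setup. The paths, subnet and server address are hypothetical placeholders, not values from this article.

```shell
# Server side, in /etc/exports: export /srv/data to a hypothetical
# 192.168.1.0/24 subnet, read-write, with root squashing.
/srv/data  192.168.1.0/24(rw,sync,root_squash)

# Client side: mount the export (modern NFS defaults to TCP transport).
mount -t nfs 192.168.1.10:/srv/data /mnt/data
```

After the mount, files under /mnt/data behave for the user as if they were local, which is exactly the transparency the protocol was designed for.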

DAFS (Direct Access File System) is a standard file access protocol based on NFSv4. It allows applications to transfer data directly to transport resources, bypassing the operating system and its buffer space, while preserving file-system semantics. DAFS takes advantage of the latest memory-to-memory data transfer technologies. Its use provides high file I/O speeds and minimal CPU and system load, thanks to a significant reduction in the number of operations and interrupts usually required when processing network protocols. It is especially effective with hardware support for VI (Virtual Interface).

DAFS was designed with clustered and server environments in mind, for databases and a variety of high-load Internet applications. It provides the lowest latency when accessing file shares and data, and supports intelligent system and data recovery mechanisms, which makes it very attractive for use in high-end NAS drives.

All roads lead to IP Storage

There are many exciting new technologies that have emerged in high- and mid-range storage systems over the past few years.

Fiber Channel SANs are a well-known and popular technology today. At the same time, their mass adoption remains problematic due to a number of factors, including the high cost of implementation and the complexity of building geographically distributed systems. On the one hand, these are simply characteristics of an enterprise-level technology; on the other hand, if SANs become cheaper and distributed systems easier to build, this should provide a colossal breakthrough in the development of storage networks.

As part of its work on network storage technologies, the Internet Engineering Task Force (IETF) created an IP Storage (IPS) working group and forum covering the following areas:

FCIP (Fiber Channel over TCP/IP) - a tunneling protocol built on TCP/IP whose function is to connect geographically distant FC SANs without any impact on the FC and IP protocols.

iFCP (Internet Fiber Channel Protocol) - a protocol built on TCP/IP for connecting FC storage systems or FC storage networks, using an IP infrastructure together with or instead of FC switching and routing elements.

iSNS (Internet Storage Name Service) - a protocol for discovering and naming storage devices on IP networks.

iSCSI (Internet Small Computer Systems Interface) - a protocol built on TCP/IP, designed to establish communication with and manage storage systems, servers and clients (definition from the SNIA IP Storage Forum).

The most rapidly developing and most interesting of the listed areas is iSCSI.

iSCSI is the new standard

On February 11, 2003, iSCSI became an official standard. Ratification is bound to broaden interest in iSCSI, which is already developing quite actively. The rapid development of iSCSI should spur the spread of SANs in small and medium-sized businesses, since the use of standards-compliant equipment and a familiar approach to service (including within widespread standard Ethernet networks) will make SANs significantly cheaper. As for using iSCSI over the Internet, FCIP has already taken root there, and competing with it will be difficult.

Well-known IT companies have willingly supported the new standard. There are opponents, of course, but nearly all companies active in the entry- and mid-level market are already working on devices with iSCSI support. iSCSI drivers are already included in Windows and Linux; IBM produces iSCSI storage systems, Intel produces adapters, and HP, Dell and EMC promise to join the adoption of the new standard in the near future.

One very interesting feature of iSCSI is that data can be transferred to an iSCSI drive not only over the media, switches and routers of existing LAN/WAN networks, but also through ordinary Fast Ethernet or Gigabit Ethernet network adapters on the client side. However, this places a significant load on the processing power of the PC using such an adapter. According to the developers, a software implementation of iSCSI can reach the speed of a Gigabit Ethernet link, but at a significant CPU load, up to 100% on modern processors. It is therefore recommended to use special network cards with mechanisms that offload TCP stack processing from the CPU.
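To make the "ordinary Ethernet adapter on the client side" point concrete, here is a hedged sketch of how a software iSCSI initiator is typically attached on Linux using the open-iscsi tools. The portal address and target name are hypothetical placeholders.

```shell
# Discover targets advertised by a (hypothetical) iSCSI portal,
# then log in to one of them with the open-iscsi initiator.
iscsiadm -m discovery -t sendtargets -p 192.168.1.20:3260
iscsiadm -m node -T iqn.2003-01.example.com:storage.disk1 \
         -p 192.168.1.20:3260 --login
# After login, the target's LUN appears as a local SCSI block device
# (e.g. /dev/sdb) and can be partitioned and formatted as usual.
```

All of this runs over the host's standard Ethernet NIC; the CPU cost mentioned above comes from the kernel processing the TCP stream for every block of storage traffic.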

SAN virtualization

Another important technology in the construction of modern storage devices and storage networks is virtualization.

Storage virtualization is the presentation of physical resources in a logical, more convenient form. This technology allows flexible allocation of resources among users and efficient management of them. Within the framework of virtualization, remote copying, snapshots, distribution of I/O requests to the drives best suited to the nature of the workload, and many other functions are successfully implemented. Virtualization algorithms can be implemented in the drive itself, by external virtualization devices, or by control servers running specialized software on standard operating systems.
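One of the virtualization functions mentioned above, snapshots, can be illustrated with a toy Python sketch: a logical volume maps block numbers to data, and a snapshot copies only the block mapping, so the snapshot keeps seeing the old data after the volume is rewritten. The class and names are illustrative, not any vendor's implementation.

```python
class Volume:
    """Toy logical volume: a mapping from logical block number to data."""

    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})  # copies the mapping, not the media

    def write(self, block, data):
        self.blocks[block] = data

    def read(self, block):
        return self.blocks.get(block)

    def snapshot(self):
        # The snapshot shares the existing data blocks; writes to either
        # volume after this point diverge, as in copy-on-write schemes.
        return Volume(self.blocks)

vol = Volume()
vol.write(0, b"original")
snap = vol.snapshot()
vol.write(0, b"changed")   # a later write does not affect the snapshot

print(snap.read(0))        # b'original'
print(vol.read(0))         # b'changed'
```

Real implementations track changed blocks at a much finer grain and persist the mapping on disk, but the core idea, separating the logical view from the physical blocks, is the same.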

This, of course, is a very small part of what can be said about virtualization. This topic is very interesting and extensive, so we decided to devote a separate publication to it.
