Hadoop:
Key Point
- Focus more on Hadoop analytics—and less on downtime maintenance.
- Run Hadoop in place on FAS (NFS) data—without moving or copying data.
- Choose FlexPod for pre-configured, enterprise-class Hadoop infrastructure.
- Complete Hadoop jobs faster—with higher throughput, using less capacity.
- Combine the power of Hadoop with robust data management.
Accelerate insights from all your data—with validated NetApp storage solutions for Hadoop.
Gaining insights from Hadoop—especially those critical for meeting tight enterprise SLAs—demands a flexible, highly available, and scalable storage platform. We designed NetApp solutions for Hadoop to build and simplify a more scalable Hadoop deployment—getting you to results faster.
The core NetApp solutions for Hadoop (NSH) design is validated with NetApp E-Series storage. Two Cisco Validated Designs (CVDs) for the FlexPod Select with Hadoop solution, one each for Cloudera and Hortonworks, provide reference architectures with Cisco hardware to reduce your time and risk while building a Hadoop cluster. NSH and NFS Connector can serve as a foundation for your data lake.
Storage Solution for Hadoop
NetApp Storage Solution for Hadoop delivers ready-to-deploy, flexible Hadoop clusters for handling big-data analytics.
NetApp Storage Solution for Hadoop provides a ready-to-deploy, enterprise-class infrastructure for the Hadoop platform—so you can control and gain insights from big data. Validated reference architectures deliver reliable Hadoop clusters and seamless integration of Hadoop with existing infrastructures. NetApp Hadoop solutions can reduce Hadoop cluster downtime and lower both storage and operating expenses.
An open ecosystem with best-of-breed components delivers a comprehensive Hadoop stack to address big data analytic challenges for any kind of structured or unstructured data.
There are two main types of Hadoop reference designs available for deployment:
- NetApp Storage Solution for Hadoop
For businesses that already have their own servers and networking and need branded, enterprise-class storage validated with a Hadoop distribution.
- FlexPod Select with Hadoop
Joint reference architecture featuring pre-sized, enterprise-class components for Hadoop; validated on NetApp storage and Cisco UCS servers, with Cloudera and Hortonworks distributions of Hadoop.
NFS Connector for Hadoop
Run big data analytics on existing data stored on NFS-based systems: NetApp NFS Connector for Hadoop.
Employ NetApp NFS Connector for Hadoop to run big data analytics on NFSv3 data—without moving the data, creating a separate analytics silo, or setting up a Hadoop cluster. You can start analyzing existing data with Hadoop right away. Your IT staff can support Hadoop with ease. Workflows are simplified because you don’t have to copy and manage data across silos.
Leverage NFS Connector to run a proof of concept, then set up a Hadoop cluster using NetApp Solutions for Hadoop for data from external sources.
NFS Connector lets you swap out Hadoop Distributed File System (HDFS) for NFS or run NFS alongside HDFS. NFS Connector works with MapReduce for compute or processing and supports other Apache projects, including HBase (columnar database) and Spark (processing engine compatible with Hadoop). These capabilities let NFS Connector support diverse workloads, including batch, in-memory, streaming, and more.
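As a rough illustration of running NFS alongside HDFS, the sketch below models how a job's input path could be routed to a storage backend by its URI scheme, with one scheme acting as the default for bare paths. It is a conceptual model only, not the connector's actual configuration or API: the registry, the function name, the backend labels, and the example host are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical registry mapping a URI scheme to a storage backend label.
# These labels are illustrative; they are not real connector class names.
BACKENDS = {
    "hdfs": "HDFS cluster storage",
    "nfs": "existing NFSv3 export",
}


def resolve_backend(path, default_scheme="hdfs"):
    """Return the backend label for a path.

    Paths without a scheme fall back to default_scheme, which mirrors the
    idea of choosing either HDFS or NFS as the default filesystem.
    """
    scheme = urlparse(path).scheme or default_scheme
    if scheme not in BACKENDS:
        raise ValueError(f"no filesystem registered for scheme {scheme!r}")
    return BACKENDS[scheme]


if __name__ == "__main__":
    # filer.example.com is a placeholder host, not a real system.
    for p in ("nfs://filer.example.com:2049/logs", "hdfs://namenode:8020/warehouse", "/tmp/scratch"):
        print(p, "->", resolve_backend(p))
```

Passing `default_scheme="nfs"` models the "swap out HDFS" case, where bare paths resolve to the NFS export; leaving the default as `"hdfs"` models the side-by-side case, where only paths with an explicit `nfs://` prefix go to NFS.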
FlexPod Select for Hadoop Solutions
Speed your time to insights. Rely on pretested NetApp and Cisco configurations for dedicated big-data analytic workloads.
NetApp FlexPod Select solutions are built for your dedicated, big-data workloads like Hadoop. These preconfigured, scalable solutions combine storage, networking, and servers—all validated with analytical applications.
You'll speed deployment. Reduce risk. And accelerate time-to-value for all your data.
FlexPod Select for Hadoop components include:
- NetApp E-Series storage systems
- Cisco Unified Computing System servers and fabric interconnect
- Cloudera Distribution of Hadoop (CDH) or Hortonworks Data Platform (HDP)
NetApp works with Cisco for infrastructure support and Cloudera or Hortonworks for Hadoop support. You get access to expert technical support on the full range of interoperable technologies involved.
FlexPod Select Reference Architectures
The validated configurations, FlexPod Select with Cloudera's Distribution including Apache Hadoop (CDH) and FlexPod Select with Hortonworks Data Platform (HDP), are designed for high-availability, enterprise-class Hadoop environments on the FlexPod Select architecture. Both reference architectures are Cisco Validated Designs.
NoSQL Database:
Key Point
- Achieve consistent high performance.
- Access your NoSQL data more frequently.
- Scale with ease with CPUs and storage decoupled.
- Consume fewer servers and less capacity to lower your cost of ownership.
- Choose your high-performing all-flash solution: EF-Series or All Flash FAS.
Leverage NetApp all-flash storage for NoSQL to gain the performance and reliability you need to handle new types of data.
Extracting value from NoSQL databases in the datacenter is critical to your mission. It demands an enterprise-class storage platform that's fast, reliable, and flexible.
NetApp solutions deliver high performance, resilience, and scale for NoSQL databases—you can query your unstructured big data, or start handling data from sensors and machines (Internet of Things). Shorter time for rebuilds and consistent performance—even during failure—ensure adherence to tight SLAs and even tighter customer requirements.
The NoSQL databases currently validated are Couchbase and MongoDB. Other NoSQL databases NetApp storage supports include Cassandra, HBase, and MarkLogic.
NetApp storage solutions are stronger and more comprehensive than traditional servers for NoSQL databases. Couchbase runs on NetApp E-Series, while MongoDB runs on NetApp E-Series and All Flash FAS arrays.
Couchbase Server:
Key Point
- Optimized for high-performance, low-latency EF-Series all-flash arrays
- Faster recovery and minimal performance impact during failure mode
- Flexible storage scalability on the fly to support more workloads
- Simplified storage setup with innovative Dynamic Disk Pools
- Zero downtime during management tasks
Deploy Couchbase on NetApp EF-Series for fast, available, flexible storage to handle mission-critical NoSQL applications.
A fast, flexible database like Couchbase demands an equally fast and flexible, enterprise-class storage platform to accommodate the rising adoption of NoSQL. NetApp EF-Series storage for Couchbase NoSQL delivers the low latency you need for real-time big-data or Internet-of-Things use cases.
Expect high availability, simplified setup, data encryption, and tremendous flexibility: You can scale the compute independently of storage—saving operations costs—and mix different drive types for performance or capacity workloads.
NetApp delivers shorter rebuild times, consistent high performance, greater flexibility, and superior ease of management versus traditional servers for running Couchbase in production.
Splunk:
Key Point
- Boost uptime to run Splunk applications with high-availability E-Series.
- Accelerate searches—up to 104% on static, 34% on streaming.
- Choose from two validated reference architectures.
- Gain superior modular flexibility to address a wide variety of workloads.
- Use reference architectures with Splunk dashboards to monitor your cluster.
Gain high performance with modular flexibility for Splunk, coupled with NetApp monitoring applications and dashboards.
Realizing top performance from Splunk—especially for fast ingesting and searching of data—requires a corresponding fast, available, scalable storage platform. NetApp solutions for Splunk enable faster Splunk searches while making Splunk deployment simpler, easier, and more scalable.
NetApp NFS for OpenStack object storage relieves the integration burden on IT departments deploying cloud services. Two reference architectures—Splunk for NetApp with Cisco servers and Splunk for NetApp with conventional servers—deliver the performance, flexibility in storage tiering, and application availability Splunk needs to help you meet tight enterprise SLAs.
Splunk Apps complement the reference architectures to monitor NetApp storage: the SANtricity Performance App and the Technology Add-On NetApp SANtricity. NetApp provides reference architectures and Splunk applications to monitor the storage—delivering a more complete solution than traditional servers and internal storage.
Operational Intelligence, Getting from Data to Insights
An organization’s data is a definitive source of intelligence, because it is a categorical record of activity and behavior, including user transactions, customer behavior, machine behavior, security threats, and fraudulent activity. Splunk is operational intelligence software that enables you to monitor, report, and analyze live streaming and historical machine-generated data. Splunk helps users distill, sift, and understand this machine data to improve service levels, reduce IT operations costs, mitigate security risks, enable compliance, and create new product and service offerings.
Splunk universal indexing allows you to search and analyze all of your data, both real-time streams and historical archives. Splunk is scalable enough to work across all of your data centers, and it is powerful enough to deliver real-time dashboard views to any level of the organization. Splunk offers solutions for IT operations, applications management, security and compliance, business analytics, and industrial data. Splunk enables users to develop valuable insights into how to innovate and offer new services as well as into trends and customer behaviors.
Splunk Enterprise Deserves an Enterprise Storage Infrastructure
As the use of Splunk for operational intelligence grows from a pilot program to full deployment in your organization, its operational integrity becomes critical. Splunk deserves a storage infrastructure that ensures optimal and consistent performance with minimal maintenance and expense. The E-Series storage system provides improved performance, data availability, scalability, data protection, and single-interface storage management compared to Splunk workloads running on commodity servers with internal drives.
The EF560 all-flash storage solution combines robust, full-featured storage management software, a bullet-proof array chassis, and the most recent solid-state disk innovations to provide superior technological and business value. The NetApp EF560 and E-Series utilize the same chassis, which is used in thousands of installations that demand high-performance, dense, cost-effective storage. Together, these storage systems have a proven record of five-nines reliability across millions of systems deployed.
Together, these building blocks are configured to support the Splunk hot, warm, cold, and frozen data tier model. Operations can effectively accelerate data indexing and searching with flash and minimize cost and space for colder data with high-capacity near-line SAS drives.
NetApp EF560 and E5660 deployed against Splunk data tiers.
For Splunk’s hot data tier, the NetApp EF560 delivers submillisecond access latency at 650K IOPS, greatly accelerating search performance. It scales to 192TB and can deliver 12GBps of throughput. Additionally, this exceptional performance is only negligibly affected during disk failures due to the implementation of Dynamic Disk Pools (DDP). DDP also performs much faster data recovery from disk failures, and there is no need to immediately replace failed disk drives.
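The hot, warm, cold, and frozen tier model described above can be sketched as a simple age-based lookup. This is a simplified illustration, not Splunk's actual bucket-rolling logic, which also depends on other criteria such as bucket size and count. The age thresholds, the function name, and the storage assigned to the warm and frozen tiers are assumptions; only flash for hot data and near-line SAS for colder data come from the text.

```python
# Assumed age thresholds in days. Real deployments set their own retention
# policy, and Splunk rolls buckets on more than age alone.
WARM_AFTER_DAYS = 1
COLD_AFTER_DAYS = 30
FROZEN_AFTER_DAYS = 365

# Tier -> storage class. Hot and cold follow the text; warm and frozen
# placements are assumptions for illustration.
STORAGE_FOR_TIER = {
    "hot": "all-flash",
    "warm": "all-flash (assumed)",
    "cold": "high-capacity near-line SAS",
    "frozen": "archived or deleted (assumed)",
}


def tier_for_age(age_days):
    """Classify indexed data into a tier from its age in days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < WARM_AFTER_DAYS:
        return "hot"
    if age_days < COLD_AFTER_DAYS:
        return "warm"
    if age_days < FROZEN_AFTER_DAYS:
        return "cold"
    return "frozen"


if __name__ == "__main__":
    for age in (0.5, 10, 100, 400):
        tier = tier_for_age(age)
        print(f"{age} days -> {tier} -> {STORAGE_FOR_TIER[tier]}")
```

Keeping the thresholds and the tier-to-storage map as separate tables makes it easy to try different retention assumptions without touching the classification logic.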
NetApp E-Series storage systems also support advanced data security features, including media erasure and full hardware disk encryption. The SANtricity storage management system not only configures these features, but also manages encryption keys for each disk in the entire pool of NetApp storage systems.
Performance Superiority
For many operations, the most compelling reason to power your Splunk environment with NetApp storage is the performance advantage you will realize versus commodity servers with internal drives. Recent testing closely simulating real-world Splunk indexing and data searches showed conclusively that operations have much to gain from this storage approach. While indexing performance was similar between NetApp and internal drives, searching was significantly faster, on average 69% faster. Stream searches were more than twice as fast. And this is on average; recent NetApp installations have seen 12x search runtime improvements.
EF and E-Series performance versus commodity servers with internal drives.
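To make the search figures above easier to compare, the sketch below converts a "percent faster" claim and an "N times" improvement into an estimated search runtime. It assumes one particular reading of "N% faster" (N% more searches completed per unit time), which the text does not define; the function names and the 100-second baseline are hypothetical.

```python
def runtime_after_percent_faster(baseline_s, percent_faster):
    """Estimated runtime if 'N% faster' means N% more work per unit time.

    This interpretation is an assumption; the source does not say how
    'faster' was measured.
    """
    if percent_faster <= -100:
        raise ValueError("percent_faster must be greater than -100")
    return baseline_s / (1 + percent_faster / 100)


def runtime_after_speedup(baseline_s, factor):
    """Estimated runtime for an 'N times' runtime improvement."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return baseline_s / factor


if __name__ == "__main__":
    baseline = 100.0  # hypothetical baseline search time in seconds
    avg = runtime_after_percent_faster(baseline, 69)
    best = runtime_after_speedup(baseline, 12)
    print(f"69% faster: {avg:.1f} s")
    print(f"12x improvement: {best:.1f} s")
```

Under this reading, a 69% average gain cuts a search's runtime by roughly two fifths, while a 12x improvement cuts it by more than nine tenths, which shows how far apart the average and best-case figures are.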
Splunk Apps and Technology Add-Ons
The alliance partnership between NetApp and Splunk includes developing apps for the NetApp storage platform portfolio. These apps and add-ons make the NetApp storage part of the infrastructure landscape Splunk is surveying, allowing for better utilization of resources and supporting a more secure overall operation. Current platforms supported with Splunk apps include SANtricity (EF and E-Series and NetApp StorageGRID object stores) and the NetApp Data ONTAP operating system (both clustered and 7-Mode).
NetApp SANtricity performance app for Splunk Enterprise.
Conclusion
When performance, reliability, cost, and convenience are taken into account, NetApp storage proves its business value in Splunk implementations.