The rate of data growth in financial markets has scaled beyond the means of manageability.
Debates have gone so far as to dismiss Big Data as being tamable and controlable in the near term with the current computing architecture commonly adopted as an acceptable solution. I agree but argue that conventional data transport – not management – is the real challenge of handling and utilizing Big Data effectively.
From exchange to trading machine, the amount of new ticks and market data depth are delivered only as fast as the delivery speed can endure. Common market data feeds that are used in conventional exchange trading are but a fraction of the market information actually available.
Perhaps due to high costs of $100,000 per terabyte, many market participants deem the use of more data as a bit too aggressive. Or they believe that high performance computing (HPC) is the next generation technology solution for any Big Data issue. Firms, therefore, are sluggishly advancing their information technology in a slow cadence in tune with the old adage: “if it ain’t broke don’t fix it.”
Over the last decade, Wall Street business heads have agreed with engineers that the immense perplexity of Big Data is best categorized by Doug Laney’s 2001 META Group report’s Three B’s: Big Volume, Big Velocity and Big Variety.
When looking at “Big Volume” 10 years ago, the markets had just defragmented under Regulation ATS. A flurry of new market centers arose in U.S. equities as did dark liquidity pools. This gave rise to a global “electronic trading reformation.” Straight-through processing (STP) advocates and evangelized platforms such as BRASS, REDIPlus and Bloomberg Order Management Systems (OMS) resulted in voluminous and fragmented market data streaming to 5,000 NASD/FINRA trading firms and 700,000 professional traders.
Today, the U.S. has 30+ Securities and Exchange Commission-recognized self-regulatory organizations (SROs), commonly known as exchanges and ECNs. For the first time since 2002, full market depth feeds from NASDAQ allow firms to collect, cache, react, store and retrieve feeds on six hours of trading for nearly 300 days a year more transparently than ever. Big Data volume has grown 1,000 percent and has reached three terabytes of market data depth per day.
Billions of dollars are being spent on increasing “Big Velocity.” The pipes that wire exchanges through the STP chain to the trader have become 100 times faster and larger but still not fast enough to funnel the bulk of information laying idle back in the database. Through “proximity hosting,” the telco is eliminated and latency is lowered. This structure results in adjustments made for larger packets but not really for more information as Big Data remains the big, quiet elephant in the corner.
Five years after Reg ATS, markets are bursting at the seams with electronic trading that produces explosive market data that breaks new peak levels seemingly every day. The SEC’s Regulation National Market System (Reg NMS), struck in 2007, requires exchanges and firms to calculate the best price for execution to be compliant. Firms are also now mandated to sweep all exchanges’ market order books and process all of that data for a smart execution.
After the execution, traders have to track the “order trail” from price to execution for every trade and store all of that information for seven years in the event of an audit recall of a transaction.
Under Reg NMS, subscribing to the full depth of all 30+ markets in “real time” would mean a firm would have to have a 1x terabyte pipe for low latency. Since a T-pipe is not realistic, data moves at 1x gigabits, which is relatively slow with the data in queue at 50-100 terabytes deep. Multi-gbs pipes, as fast as they seem, are still similar to driving five miles an hour on a 55 mph highway.
Analysts typically call data from a database with R (Revolution Analytics) and “SAS” Connectors. The process includes bringing data to an analytical environment in which the user runs models and computations on the subsets of a larger store before moving on to the next data crunch job. The R and SAS Connectors between the file servers and the database are at 10/100BASE-T, making the movement of 50 terabyte environment like driving one mile per hour in a 55 mph zone.
We all hear the polemics regarding data formats and the jigsaw puzzle of unstructured data and the fact that “Big Variety” is the obstacle. Even after standardization of SQL-based queries where analysts can ask any “ad hoc” question, too many sources and too many pipes from analytic servers cause traffic jams. SQL databases are ideal for unstructured queries but are slow in unstructured data compiling. Aggregating market information is where much of market’s processing technologies are being evaluated today to meet the requirements of regulations, sweeping for best execution and for risk management.
Comparing where current prices of stocks are against bids and asks to trade across multiple exchanges, markets, sources, asset classes and clients is essentially the Big Data task of risk management. In addition to managing data changes, firms are also tasked with managing their trading accounts, client portfolios and trading limits such as with the implementation of Credit Valuation Adjustments (CVAs) for counterparty risk.
So why are we still piping data around the enterprise when we just need more compute and memory power? Hardware-accelerated core processing in databases such as XtremeData’s dbX and IBM’s Netezza are powered by FPGAs (field programmable gate arrays). Processing of massive amounts of data with FPGAs can now occur at “wireless” speed. Along with high performance computing, high-speed messaging technology provided by companies like TIBCO, Solace Systems and Informatica have redefined transport times into ultra-low latency terms from one database to another in single microseconds, sometimes in nanoseconds, from memory-cache to memory-cache.
The colloquial phrase “in-database” analytics is an approach of running analytics and computations as near as possible inside a database where the data is located. Fuzzy Logix, an algorithmic HPC vendor, replaces the need for SAS and R connecting analytics, which stretch along the wire from the database to the analyst. With Fuzzy Logix, the need to call a database for small files is eliminated because computations can be done with the rest of the database in real-time: days to seconds faster.
With in-database or in-memory analytics, BI engineers can eliminate transport latency altogether and now compute at server speeds with computations sitting inside the database or in memory for tasks to be completed locally, not on the transport wire.
Wall Street is as risk averse as ever in today’s atmosphere so the adoption of new technology or new vendors continues to present operational risk challenges. ParAccel is a company that appears to be addressing the operational risk of new technology adoption by helping firms utilize the power of parallel processing of Big Data analytics on OEM hardware.
Since ParAccel is software, an IBM, HP or Dell shop could essentially rely on the reliability of their well-known, established database vendor but use next generation Big Data analytic processing an order of magnitude faster than what is currently in place. ParAccel allows firms to aggregate, load and assimilate different data sets faster than traditional platforms through its “columnar database” nodal system. The columns in a ParAccel environment provides firms with the flexibility to first run analytics in-database or in-memory, then bring massive amounts of data to a common plane and finally, aggregate the unstructured data and do it all in lightning speed.
Other companies like NVIDIA have been building graphic processing units (GPUs) for the video game industry for three decades and are now swamped with customer requests to help build parallel computing environments, giving financial firms the ability to run trillions of algorithmic simulations in microseconds for less than $10,000 per card, essentially. GPUs can have up to 2,000 cores of processing on a single NVIDIA Tesla card embedded inside. A GPU appliance can be attached to a data warehouse for advanced complex computations. Low-latency processing can also be achieved due to minimum movement of data over a short distance analyzing most of what Wall Street claims is Big Data in seconds compared with the days it takes now.
The vendors and players are ready to get to work; there just needs to be some consensus that the Big Elephant in the room is there and it’s standing on a straw when it could be surfing a Big Wave!
Source: Tabb Forum , 02.05.2012 by Dan Watkins, President @ CC- Speed firstname.lastname@example.org