|
|
|
Stock market behavior is interestingly complex: settling at times into what seems like reliable rhythms, then jumping, falling, or bouncing off, all as millions of investors negotiate the trading of billions of shares of securities each day. Each participant’s actions change the landscape for others: prices and trading volume are in constant flux. And given today’s new information and communications technologies, market is not only changing more, but is changing far more rapidly than ever before. To develop diversified and robust trading strategies in quick time for different market scenarios, we have developed a strategy builder based on Artificial Intelligence. (fasdfasdfsadfASD)
The inspiration for the AI system in MarkeTopper securities was the IBM Deep Blue Chess Computer Project, the first computer to beat the human world chess champion, Garry Kasparov, in a historical match between man and machine. Deep Blue demonstrated that a sophisticated chess system could be developed using the IBM RS6000/SP parallel processor. After years of sluggish progress in artificial-intelligence research, Deep Blue's strong showing affirmed that computers were at least making rapid progress in certain intellectual tasks.
Can we make Machines Think? AI-Classic is our answer.
AI-Classic is a proprietary logic engine which provides the skeleton print for the development of robust trading strategies in acceptable amount of time. The Marketopper AI team has created a scalable parallel processing computing environment for designing and generating powerful trading portfolio for the financial markets worldwide. It involves creating and testing thousands of patterns and indicators based systems and sieving through for the best systems meeting our criteria. The technologies used in this system builder are Supervised Learning, Deterministic Annealing and Genetic Algorithm. We have mentioned the following paper on our AI project, but it should not be construed as the actual reflection of our work, instead just a brief idea on the flow of one of the project. Parallel Algorithm for Real Time Decision System for Financial Markets Dheeraj Bhardwaj, Manish Sansi* and Deepak Ajwani Department of Computer Science and Engineering Indian Institute of Technology, Delhi – 110016 INDIA * Marketopper Securities Pvt. Ltd, Delhi Abstract Computational models of real time decision systems for financial markets require vast amounts of computing resources, which are not available on today’s serial computer for real scenario. These problems show inherent parallelism by nature thus making parallel computing an essential tool. In the present article, we have developed a computational model and algorithms for solving in serial and parallel environments. The parallel algorithms have been implemented on a Linux cluster using the Message Passing Interface (MPI). An example from real financial market has been solved to highlight the commercial utility of these methods. The performance analysis shows that this class of problems is scalable for large number of processors. Keywords Rule based methods, patterns, trading system, parallel computing, MPI, relational matrix, linux cluster, scalability
Introduction Parallel computers, a mainstay of high-end scientific research are increasingly being looked upon to provide solutions in commercial research applications such as financial market research. There is a class of financial research problems that stretches the limits of current high performance computing systems [1]. Investment driven research in stocks and other financial instruments has pioneered development of new concepts and also has adopted concept from diverse fields into it folds, starting from pure economic value research to the bizarre phenomenon of sunspots [2]. In the early part of 20th century, technical analysis became a credible way of developing and exploiting an edge in trading. Unlike the fundamental view, where qualitative parameters related to an instrument become the basis for investment, technical analysis looks at an instrument as a commodity, for what it is rather than what it represents. Technical analysis attempts to identify the basic forces of supply and demand existing for this commodity and thereby makes a decision regarding trading or investing in the instrument. In the past, researchers have come up with elaborate trading systems that incorporate multiple indicators, money management techniques, mathematical studies and statistical studies [3]. The field as such is very popular and by rough estimates almost 33% of trading volumes at leading stock exchanges in the world are driven by technical forces. One of the methods of technical analysis relies on patterns. A pattern refers to a certain bar chart sequence that can be identified in the price history of a security [4]. Usually, this type of chart information has several past occurrences and is expected to reoccur in the future. Furthermore, whenever traders refer to a pattern they often mean a good pattern i.e. charts information that can be used to trade with a high probability of success. How are patterns formed? Although stock prices move mostly at random, they often cause the formation of distinct patterns. These patterns repeat in the future, as long as the same conditions that cause their origination repeat as well. Examples of conditions that may trigger the formation of patterns include technical analysis considerations, release of economic indicators, stock upgrades or downgrades, earning reports, etc. The patterns can be identified based on market observation or by using search algorithms. To successfully perform the later, computational limitations are a major hindrance. Computational requirements for such problems are enormous. A simple search algorithm with a very short market memory which mimics all real market attributes would need 0(10306) floating point operations. Since this is impossible to carry out on existing computing infrastructures, we apply current market knowledge to reduce the possible attributes and bring down the floating-point operation requirements to 75 Teraflop. This number of floating-point operations can be carried out in real time using parallel computers. This class of applications also shows inherent parallelism in various domains. Hence, parallel algorithms are the only answer to the challenges posed the exploding data available in financial markets. Problem Statement Before we trade any strategy in the real market we need to understand and anticipate the systems expected future performance and also the portfolio risk that we will be exposed to. There are many ways to do this: 1. Test the system on different time frames or time periods of the same data set 2. Test the system on different sets of data e.g. different sectors such as IT, healthcare etc. 3. Use simulators, which create artificial data by statistical analysis of past data. 4. Use simulators, which create artificial data by using random number based price increments. The ways 1 and 2 depend on past price movement and expose us only to the scenarios which have already occurred and may not play out in the same fashion again. The way 3 is dependent on the price data of the past and may not be dynamic enough. The way 4 is not reliable as prices are controlled by an external set of rules and are not dependent on continuity. In order to overcome drawbacks in above methods, we propose an approach, which takes the theory of reflexivity [5] by George Soros its basis: “Financial markets cannot possibly discount the future correctly because they do not merely discount the future; they help to shape it”. A pattern that is derived by past price behavior impacts the current prices. When an action is taken based on what happened in the past, the action sets in motion a new series of events which taken together with the present scenario creates a completely new picture. Thus, we map the historical price movement using patterns. A pattern occurs after an exact price alignment and impacts the future price movement. How do we find patterns? To arrive at useful patterns we have to sift through a very large number of possible combinations (this may be of the 0(10300)). The combinations are relationships between various data points. We generate these patterns by comparing numeric values of data points using mathematical relationship such as greater than, less than, equal to, cross over and cross under. As a proof of concept we want to study simple relationships, which are visible to the naked eye. Hence comparisons that can be graphically represented without any mathematical and statistical modeling have been used. Computational Model Input Data – The input data available to us is price series data for a given company’s stock price at a specific time period in the past. The data is formatted for a specified time interval. This time interval can vary from 1 minute to 1 year. One time interval contains all the data related to a trading activity that took place in that time interval. The fields in this data are: open price, high price, low price, close price, traded volume, date and time. Data Windows – The data windows comprise of seven contiguous time intervals (this is arbitrary the value can be any amount). We apply our mathematical operand over the data points confined within these windows. As we move through the data starting from the initial date and time, we discard the last interval in the window and add the next interval, which gives us the next window. Comparisons - We compare data point within the window to each other to create and evaluate a pattern. These comparisons follow certain rules such as the date field cannot be compared to any other field except date, all types of price fields in the window and be compared to each other but not to themselves Mathematical Operand - We propose to use five mathematical operand for comparisons viz., greater than, less than, equal to, cross over and cross under. Hybrid Data fields- In addition to the natural price point in an interval, we create derived price compatible values such as the midpoint of the interval, range etc. These are called hybrid data fields. If we use all five operands and seven data fields (open, high, low, close, midpoint, range, and volume) over one window, we get 5880 distinct combinations of two data points being compared to each other. These combinations also contain some redundant combinations such as comparing same data point to itself. After removing these redundant combinations, we are left with approximately 1000 unique useful combinations. Using this set of combinations we generate patterns. A pattern can contain from one combination to all 1000 combinations using the AND operator and can return value true or false for any window under evaluation. A pattern evaluation is as follows: If ((A > B). AND. (B > C). AND. (C > D)) output = TRUE For present study we have used 3 years data for 12 different stocks selected from the same sector e.g. the IT Sector. In one day we have six intervals and the total number of trading days in a year is approximately 265. Thus, the total numbers of windows for 3 years of data of one stock is 4770. The number of possible patterns is of 0(10300). The same possible patterns have to be evaluated for each window to extract the useful patterns. Hence, the total number is of pattern evaluations that we have to carry out are 4770x10300. On an average in each pattern we have to perform approximately 500 comparisons, 499 Boolean operations and one assignment. Each comparison/Boolean/assignment operation takes one flop [6]. Based on this total number of floating point operations that we need to perform to obtain a usable solution is 4770x10300x1000 5x10306 flop. We have to perform this numbers of operations for each company of the portfolio (there are 12 companies in our portfolio). In order to design this problem for a feasible solution we need to reduce the number of possible patterns without sacrificing the quality of output. We have observed that all 1000 possible combinations are not equally important; hence they can be ranked according to their perceived value. This helps us in selecting the best combinations from the combination set which can be solved. To solve this problem in real time, we select 30 top ranked conditions, which would require approximately 75 Teraflop. The huge computing requirement of this problem guides us to use parallel computers. The nature of the problem also has inherent parallelism in various domains such as patterns, stocks etc. Algorithms We first define the terms relational matrix and pattern performance for designing the serial and parallel algorithms. Relational Matrix – We create a Boolean matrix with columns representing comparisons and rows corresponding to the data windows. The elements of the matrix contain TRUE or FALSE resulting from comparisons performed on the windows. This relational matrix is used to generate patterns, which are to be evaluated [7]. Pattern Performance – Historical output achieved if trading action is taken on the basis of the pattern being TRUE at a given window (a) Serial Algorithm BEGIN Setup data structures, variables and constant Read the price series data for all stocks Initialize and calculate the hybrid data fields Read and interpret condition set FOR every window DO Evaluate condition and store the result in a relational matrix END FOR every condition_set DO Generate patterns FOR every pattern DO FOR every window DO Check pattern Occurrence IF (pattern present) Store pattern performance ENDIF END IF (pattern performance acceptable) Store pattern END END END (b) Parallel Algorithm BEGIN Setup data structures, variables and constant Read the price series data for all stocks Initialize and calculate the hybrid data fields Read and interpret condition set FOR every window DO Evaluate condition and store the result in a relational matrix END MASTER processor spawns the SLAVE processors MASTER processor Broadcasts the parameters to SLAVE Processors FORALL processors simultaneously DO FOR every local_condition_set DO Generate patterns FOR every pattern DO FOR every window DO Check pattern Occurrence IF (pattern present) Store pattern performance ENDIF END IF (pattern performance acceptable) { WRITE the selected pattern in file (Using parallel I/O approach) } END END END Parallel Implementation We have used the Message Passing Interface (MPI) [8] for parallel implementation of the algorithm. This is an ideal example of data parallelism. Since conditions set can be computed independently, parallelization has been done over the condition set. We distribute the condition set across the available number of processors. This application also has large and frequent I/O requirements, where processors have to write the selected patterns from a local condition set on a processor into a file as and when they find them. One way of writing a file could be that each processor sends back the selected pattern to the master and master writes in to a file. But, this is an inefficient approach, as it will have frequent communications resulting into a large communication overhead. We have used MPI-IO routines implemented in ROMIO library [9]. By using ROMIO functions we create a global file_set_view, which allows each processor to write into the global file in parallel. Thus, all the slave processors need not to send the selected pattern data back to the master. Hence, this technique improves the I/O and overall performance. Example We select 3 years of price data (stocks) of 12 companies from IT sector. We attempt to develop a simple rule based trading system. The pattern that will be generated in this process will be put to use for trading and also to generate the virtual market. We will use a set of rules pertaining to pattern performance to capture patterns that are tradable,
and once patterns are identified, to formulate a set of rules that empirically trades well. Pattern Performance decided on the following output criteria
Minimum trades in a pattern = 50 Minimum profit per trade = 1 % (After costs) Minimum winning percentage = 60% We generate a condition set that contains all possible combination of two distinct data points using one of the five mathematical operands. We rank this condition set to isolate the top 30 conditions and then use this smaller version of the condition set for our algorithm. Using the relational matrix, we generate and simultaneously evaluate all possible patterns made by combining the conditions. When any window returns the pattern occurrence as TRUE, we record the trading result on that window. To calculate the trading result, we subtract the open price from the close price and if positive value is obtained, we record it as profit and increment all counters related to profitable trades. Patterns, which meet the performance criteria, are selected and become a part of the pattern library. Results
|
Name of the company
|
Number of Trades
|
Net Profit(INR)
|
% Return*
|
|
Digital Equipment
|
132
|
84939.33
|
84.94%
|
|
Global Tele
|
129
|
138306.3
|
138.31%
|
|
Himachal
|
115
|
124896.7
|
124.90%
|
|
Mastek
|
109
|
64398.67
|
64.40%
|
|
Infosys
|
138
|
26730.67
|
26.73%
|
|
NIIT
|
113
|
93011.33
|
93.01%
|
|
Polaris
|
110
|
93747
|
93.75%
|
|
Rolta
|
138
|
77726.67
|
77.73%
|
|
Satyam
|
146
|
135721.7
|
135.72%
|
|
Softsol
|
115
|
84720
|
84.72%
|
|
Wipro
|
137
|
66234.33
|
66.23%
|
|
Zee
|
116
|
75616.33
|
75.62%
|
|
|
1495
|
1066049
|
75.62%
|
* Exposure per scrip INR 100,000, Portfolio exposure INR 1,200,000 Table 1: Profits generated by using the output of the algorithm. The table 1 shows the returns by trading the patterns generated using the present algorithm on a portfolio which contains 12 IT companies. The above results are for a one-year period starting from June 2002. We generated the patterns by using three-year data starting from Aug 1999. The patterns that met our performance criteria were used for taking actual positional trades in the stock market. The above table shows the returns generated by this trading. These results are net of costs and actual money that the firm earned in a real situation.
Performance Analysis We have used an Intel Pentium-IV based Linux cluster for parallel implementation. The cluster nodes are 2.2 GHz with 1 GB of RAM each connected by a 100 Mbps fast Ethernet network. We have used the mpich implementation of the Message Passing Interface (MPI) for parallel implementation. For this experiment we selected a portfolio of 12 IT companies, and generated 676 condition sets for the patterns. We distribute the 676 conditions over the number of processors participating in the parallel computation.
Figure 1 shows that by increasing the number of processors, execution time decreases linearly. We have also observed that during the execution of the parallel code, the CPU usage of each node remains between 96 – 99 %.
Figure 1: Number of processors vs. Execution time bar chart
|
|
Discussion and Conclusion Financial decision making in real time demands parallel processing of vast amount of information. In order to solve these problems with huge computing requirements, present serial computers are just not sufficient. These problems show inherent parallelism by nature. Thus, making them right candidates for parallel computing. Performance analysis proves that this class of problems is highly scalable for large number of processors. As witnessed from the example, discussed above, it is also clear that the solutions generated are of immense commercial value. Acknowledgement The authors wish to thank Marketopper Securities Pvt. Ltd, New Delhi for providing support and data for carrying out this research. The authors also wish to thank Department of CSE, IIT Delhi for providing computational support. References 1. Jeffrey Owen Katz, Donna L. McCormick, Developing systems with a rule-based approach, Stocks & Commodities, Vol 15, 1997, pp. 26-34. 2. Jeffrey Owen Katz, Donna L. McCormick, Sunspots and market activity, Stocks & Commodities, Vol 15, 1997, pp. 401 - 406. 3. Perry J. Kaufman, Trading and System and Methods, John Wiley & Sons, 1998. 4. Michael Harris, Stock Trading Techniques, Trades Press Inc, USA, 2000. 5. George Soros, Alchemy of Finance, Simon & Schuster Inc, 1987. 6. Kai Hwang, Zhiwei Xu, Scalable Parallel Computing, Mc Graw Hill, 1998 7. W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing, MIT Press, 1994 8. Thakur, R., Lusk, E., and Gropp, W., 1998, User Guide for ROMIO: A High Performance, Portable MPI-IO Implementation, TM No. 234, ANL, IL 60439 (USA).
|
|
|
| |
| |
|
|
| |
| |
|
|
|
|
 |
|