GetHFData: an R package for downloading and aggregating high frequency trading data from Bovespa.

Author: Perlin, Marcelo S.
  1. Introduction

    Financial research has shifted its profile over the years. The change stems from the increasing frequency of the financial data on which researchers base their analysis. Recent regulatory and technological changes in financial markets, along with the availability of trade and quote data and the growth in computing power available to domestic users, have motivated scholars to study financial market dynamics on a finer scale, based on trading data at the high, tick-by-tick, frequency.

    These studies are related to the topic of market microstructure, an up-and-coming field of expertise first established by the works of Kyle (1985), Glosten and Milgrom (1985) and Easley and O'Hara (1987). These papers were novel at the time in showing that the price of an asset is related to existing frictions in the underlying market. The field of market microstructure specializes in studying how market frictions originating from the underlying structure can affect the price formation process of traded assets. The main objective is to investigate and promote a well-functioning market with high liquidity, low asymmetry of information and greater well-being of market participants (De Jong and Rindi, 2009, Hasbrouck, 2007, Madhavan, 2000).

    As mentioned before, empirical studies on market microstructure usually require the analysis of high frequency trading data from financial exchanges. These data come in large databases that are hard to store and process. While previous work such as Goodhart and O'Hara (1997) and Brownlees and Gallo (2006) discusses issues with this type of dataset, it mostly concerns problems with the data itself, such as the effects of market structure on the availability and interpretation of the data, rather than the computational problems of handling such a large dataset on domestic computers.

    The size of the dataset for a significant time window is a burden for an inexperienced researcher. As an example, the trading records for 2015-11-03 in Bovespa's equity market, a small market in international terms, are stored as a single compressed zip file of approximately 30 MB. When unpacked, the result is a text file with 310 MB of content. When more days are included, the size of the resulting data becomes troublesome, even for high-end computers. For researchers with no background in programming, using this dataset demands a significant investment of time to learn how to store and process the related files efficiently. The complexity of this operation clearly creates a barrier that holds back the development of this research area in Brazil.
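
    To give a sense of what efficient processing can mean in practice, the sketch below streams a zipped text file line by line instead of unpacking it to disk first. It is an illustration in Python under stated assumptions, not part of GetHFData: the function name `count_lines_in_zip`, the `latin-1` default encoding and the choice of the first archive member are all hypothetical, and no assumption is made about the layout of the exchange's files.

```python
import io
import zipfile


def count_lines_in_zip(zip_source, member_name=None, encoding="latin-1"):
    """Count text lines in one member of a zip archive without unpacking it.

    zip_source may be a file path or a binary file-like object. If
    member_name is None, the first member of the archive is used (an
    assumption that suits archives holding a single text file). The
    default encoding is likewise an assumption.
    """
    with zipfile.ZipFile(zip_source) as archive:
        name = member_name or archive.namelist()[0]
        with archive.open(name) as raw:
            # TextIOWrapper decodes the compressed stream lazily, so only
            # a small buffer is held in memory at any time.
            text = io.TextIOWrapper(raw, encoding=encoding)
            return sum(1 for _ in text)
```

    The same pattern extends to filtering or aggregating each line as it is read, so that memory use stays roughly constant regardless of the unpacked file size.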

    The objective of this paper is to introduce the functionalities of software created to facilitate the import and analysis of high frequency trading data from Bovespa. The program, developed on the R platform, is publicly available as a CRAN package. It allows direct access to trade data for equity and derivative contracts from the exchange. The package contributes to the literature by facilitating access to this particular set of data and decreasing the computational burden on users. The proposed package has the potential to set a standard for accessing and manipulating this rich dataset by consolidating an accessible framework. The popularization of this software can increase the number and reproducibility of studies in this important area of finance.

    The paper is organized as follows. First, we briefly review the topic by discussing the main studies that have used high frequency data from Bovespa. We continue with a brief introduction to the structure of the Brazilian financial market. Next, the format of the package and its main functionalities are presented, followed by two empirical applications using the package. The paper finishes with the usual concluding remarks.

  2. Literature Review

    Previous studies using high frequency data from Bovespa have approached different issues in the financial literature. The majority of these studies analyzed the volatility of the Brazilian stock market. Santos and Ziegelmann (2014) reproduced three different measures (realized variance, realized power variation and realized bipower variation) in a sample of 15-minute intervals ranging from 2004 to 2009. These measures were included as regressors in MIDAS (mixed data sampling) and HAR (heterogeneous autoregressive) models with the purpose of forecasting realized variance. The authors found that measures which are robust to jumps in asset prices, such as realized power variation and realized bipower variation, provided better forecasts of future volatility in terms of mean squared error. However, these predictions are not reported to be statistically different from the forecasts based on realized variance.
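
    The realized measures named above can be sketched in a few lines. The Python functions below follow common textbook definitions and are an assumption about the general form only; the exact scaling and sampling choices of Santos and Ziegelmann (2014) are not reproduced here, and all function names are hypothetical.

```python
import math


def realized_variance(returns):
    """Sum of squared intraday returns for one trading day."""
    return sum(r * r for r in returns)


def realized_power_variation(returns, p=1.0):
    """Unscaled sum of absolute intraday returns raised to the power p.

    Assumption: 0 < p < 2 and no normalizing constant is applied.
    """
    if not 0.0 < p < 2.0:
        raise ValueError("p must lie strictly between 0 and 2")
    return sum(abs(r) ** p for r in returns)


def realized_bipower_variation(returns):
    """Scaled sum of products of adjacent absolute intraday returns."""
    mu1 = math.sqrt(2.0 / math.pi)  # E|Z| for a standard normal Z
    cross = sum(abs(a) * abs(b) for a, b in zip(returns[1:], returns[:-1]))
    return cross / mu1 ** 2
```

    Because bipower variation multiplies adjacent absolute returns, an isolated jump enters only through products with its (typically small) neighbours, which is the intuition behind its robustness to jumps.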

    In a similar approach, Araujo and Avila (2015) estimated MIDAS and HAR models using high frequency data sampled at 5-minute intervals. The total sample period starts in 2000 and ends in 2014. According to the authors, the HAR model resulted in better in-sample forecasts, while a simple combination of the two models produced smaller values for both out-of-sample errors. Nonetheless, Junior and Pereira (2011) did not find a statistical difference between the out-of-sample forecasts of the MIDAS and the HAR models.
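
    For reference, a HAR regression for realized variance is commonly written as below. The daily, weekly (5-day) and monthly (22-day) components are the conventional choice and are an assumption here; the cited studies may use different horizons or additional regressors.

```latex
\begin{align*}
RV_{t+1} &= \beta_0 + \beta_d\, RV_t + \beta_w\, RV_t^{(w)} + \beta_m\, RV_t^{(m)} + \varepsilon_{t+1}, \\
RV_t^{(w)} &= \frac{1}{5} \sum_{j=0}^{4} RV_{t-j}, \qquad
RV_t^{(m)} = \frac{1}{22} \sum_{j=0}^{21} RV_{t-j}.
\end{align*}
```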

    Apart from these studies comparing the prediction performance of realized volatility estimators, one can find other studies using intraday trading data. Applications range from perceived volatility analysis during different intraday periods (Vicente et al., 2014) and GARCH-family modeling and forecasting (Moreira and Lemgruber, 2004, Cappa and Pereira, 2009, Carvalho et al., 2006, Val et al., 2014, Garcia et al., 2014, Ceretta et al., 2011) to the classical problem of minimum variance portfolio selection (Borges et al., 2015).

    Another strand of this literature is related to the analysis of trading strategies based on high frequency data. Fonseca et al. (2012) assessed abnormal returns resulting from different strategies implemented in the period between 2006 and 2009. Considering the lead-lag effect between Ibovespa spot and futures prices, the authors tested strategies built on time series forecasts estimated by ARIMA, ARFIMA, VAR and VEC models. However, the results showed that a buy-and-hold strategy resulted in better performance than the other two market timing trading rules. The authors further split the sample into two subperiods, one before and one after the 2008 crisis, and the results remained robust.

    Pontuschka and Perlin (2015) tested the efficiency of a pairs trading strategy in a multi-frequency approach. Using data sampled at different frequencies (1, 5, 15, 30 and 60 minutes, and daily data) in the 2008-2011 period, the authors provided evidence in favor of their hypothesis: a higher sampling frequency results in higher performance of the strategy and, therefore, stronger evidence of market inefficiency. In a different approach, Jabbur et al. (2014) explored trading strategies based on technical analysis algorithms using five stocks during the month of October 2013. Other applications related to strategies include the use of volatility estimators in a timing approach (Garcia et al., 2014) and macroeconomic variables (Garcia et al., 2016).
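
    Sampling the same trade records at several frequencies, as in the multi-frequency studies above, amounts to grouping trades into fixed time intervals. The Python sketch below is a hypothetical illustration, not the code of the cited authors or of GetHFData: it assumes trades arrive as (seconds since midnight, price) pairs and keeps the last traded price of each interval, which is only one of several possible conventions.

```python
def aggregate_last_price(trades, interval_minutes):
    """Aggregate (seconds_since_midnight, price) trades into fixed intervals.

    Returns a list of (interval_start_seconds, last_price) pairs, one for
    each interval that contains at least one trade, ordered by time.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    width = interval_minutes * 60
    bars = {}
    # Sort by timestamp only; the stable sort keeps the input order of
    # trades that share a timestamp.
    for timestamp, price in sorted(trades, key=lambda t: t[0]):
        bucket = (timestamp // width) * width
        bars[bucket] = price  # later trades overwrite earlier ones
    return sorted(bars.items())
```

    Calling the function with `interval_minutes` set to 1, 5, 15, 30 or 60 on the same list of trades yields the corresponding price series, each coarser than the last.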

    Although most studies using intraday data addressed risk and return topics, this type of data has also been used in the analysis of other subjects such as market liquidity and its commonalities (Victor et al., 2013, Silveira et al., 2014, Casarin, 2011, Marquezin and De Mattos, 2014, Perlin, 2013), bid-ask spreads/order book analysis (Cajueiro and Tabak, 2007, Maluf and Otiniano, 2014, Araujo et al., 2014), asymmetric information and corporate governance (Barbedo et al., 2007, Neto et al., 2012, Martins and Paulo, 2014), computation and algorithm programming (Silva et al., 2014, Araujo et al., 2015), high frequency data distribution (Horta and Ziegelmann, 2011, Cortines and Riera, 2007, Block et al., 2015) and other research topics in Finance (Taufemback and Da Silva, 2011, Caetano and Yoneyama, 2007, Biage et al., 2010, Perlin et al., 2014).

    Table 1 summarizes recent high frequency data studies in Brazil. An arbitrary classification was made based on the main subject of the articles. In line with the previous description, most of the work focused on volatility, returns and trading strategies. Another remark from Table 1 concerns the sample period of these studies: some papers analyzed only one trading day (Caetano and Yoneyama, 2007, Maluf and Otiniano, 2014), while studies such as Araujo and Avila (2015) used fourteen years of intraday data. This variability in sample size is expected, since different research objectives can demand more or less data. As for the use of a short trading period, a possible explanation is the computational expertise needed to handle this type of database over longer periods. We also point out that most of the previous studies using high frequency data are related to the equity market. We have found very few studies of derivative contracts such as options and futures. We hope that the use of GetHFData increases the number of studies for other markets.

  3. The Brazilian financial market

    Until 2008, the Brazilian financial market was concentrated in two main exchanges: Bovespa (São Paulo Stock Exchange), which traded mainly equity contracts, and BM&F (Brazilian Mercantile and Futures Exchange), which traded commodities, futures and other derivatives. On 8 May 2008, Bovespa and BM&F merged into BM&FBovespa (1), creating one of the largest exchanges in Latin America in terms of market capitalization. According to BM&FBovespa's website, at the end of June 2016 there were 353 companies listed in the stock market. BM&FBovespa's 2015 Annual Report shows an average daily trading value of R$ 6.7 bn on equities and equity derivatives...
