Amazon Cloud Services and Investment Portfolio Analysis

Stock markets have been highly volatile lately: a stable security of a well-known company can lose several percent at once on news of sanctions against its management or, conversely, soar on a positive report and investors' expectations of generous dividends.

How do you determine whether owning a given security has brought income or only losses and disappointment?

In this article I will show how to determine and visualize the adjusted financial result for securities.

Using client reports from Opening Broker (Otkritie Broker) as an example, we will parse and consolidate brokerage reports for the stock market and build a cloud reporting architecture, followed by simple and convenient analysis in AWS QuickSight.

Description of the problem

Many training courses stress the need to keep a trading journal that records all the parameters of every transaction, so that the performance of a trading strategy can be analyzed and summarized later. I agree that this approach disciplines a trader and raises awareness, but the tedious process can also be very tiring.

I admit that at first I carefully followed the advice: I meticulously wrote down every transaction with its parameters in an Excel spreadsheet, built reports and summary charts, and planned future deals, but I quickly got tired of it all.

Why is keeping a trading journal by hand inconvenient?

- filling in the journal manually (even with partial automation, such as exporting the day's transactions from the trading terminal) quickly becomes tiring;
- manual input carries a high risk of errors and typos;
- an active trader may turn into a passive investor, return to the journal less and less often, and eventually forget about it (my case);
- and finally, we can program, so why not use that to automate the whole process? So, let's go!

Brokerage companies are often high-tech organizations that provide their clients with fairly high-quality analytics on virtually every question of interest. To be fair, this reporting improves with each update, but even the most advanced versions may lack the customization and consolidation that demanding and inquisitive clients want.

For example, Opening Broker lets you download brokerage reports in XML format from your personal account. However, if you have an individual investment account (IIS) and a regular brokerage account on the Moscow Exchange (MOEX), these will be two different reports, and if you also have an account on the St. Petersburg Exchange (SPB), a third report is added.

So, to get a consolidated investor journal, you need to process three XML files.

The MOEX and SPB reports differ slightly in format, which has to be taken into account when implementing the data mapping.

The architecture of the system being developed

The diagram below shows the architecture of the system being developed:

[Figure: system architecture diagram]

Implementing the parser

We download reports for all three accounts from the personal account for the longest available period (they can be split into several reports, one per year), save them in XML format, and put them in one folder. As test data for this study we will use a fictional client portfolio whose parameters are as close as possible to market reality.

Suppose that the investor we are considering, Mr. X, has a small portfolio of five securities:

- the SPB report contains two securities: Apple and Microsoft;
- the MOEX (brokerage) report contains one security: FGC UES;
- the MOEX (IIS) report contains two securities: CMI and OFZ 24019.

For these five securities there may be buy/sell transactions, dividend and coupon payments, price changes, and so on. We want to see the current situation, namely the financial result that takes into account all payments, transactions, and the current market value.

This is where Python comes in: we read the information from all the reports into one list:

    import xml.etree.ElementTree as ET
    from os import listdir
    from os.path import isfile, join

    my_files_list = [join('Data/', f) for f in listdir('Data/') if isfile(join('Data/', f))]
    my_xml_data = []

    # Read the reports from the directory.
    for f in my_files_list:
        tree = ET.parse(f)
        root = tree.getroot()
        my_xml_data.append(root)

For the analytics we need several entities from the reports, namely:

- positions of securities in the portfolio;
- concluded transactions;
- non-trading operations and other movements on the account;
- average prices of open positions.

To prepare the sample, we will use four dictionaries that describe these sets.

    dict_stocks = {'stock_name': [], 'account': [], 'currency': [], 'current_cost': [], 'current_cost_rub': [], 'saldo': []}
    dict_deals = {'stock_name': [], 'account': [], 'date_oper': [], 'type_oper': [], 'quantity': [], 'price': [], 'currency': [], 'brokerage': [], 'result': []}
    dict_flows = {'stock_name': [], 'account': [], 'date_oper': [], 'type_oper': [], 'result': [], 'currency': []}
    dict_avg_price = {'stock_name': [], 'account': [], 'avg_open_price': []}

A few words about what these dictionaries are.

Dictionary dict_stocks

The dict_stocks dictionary stores general information about the portfolio:

- security name (stock_name);
- account name: SPB, MOEX BROK, or MOEX IIS (account);
- the currency used for settlements in this security (currency);
- the current value at the time the report was generated in the Opening Broker personal account (current_cost). For demanding clients, this could later be refined to fetch the security's quote dynamically from the trading terminal or from the website of the corresponding exchange;
- the current value of the position in rubles at the time the report was generated (current_cost_rub). As with the previous item, the current Central Bank rate or the exchange rate could be fetched here instead, as you prefer;
- the current balance of securities (saldo).

Dictionary dict_deals

The dict_deals dictionary stores the following information about completed transactions:

- security name (stock_name);
- account name: SPB, MOEX BROK, or MOEX IIS (account);
- the date of the transaction, i.e. T0 (date_oper);
- type of operation (type_oper);
- the number of securities involved in the transaction (quantity);
- the price at which the transaction was executed (price);
- the currency in which the operation was performed (currency);
- the brokerage commission for the transaction (brokerage);
- the financial result of the transaction (result).

Dictionary dict_flows

The dict_flows dictionary reflects the movement of funds on the client account and stores the following information:

- security name (stock_name);
- account name: SPB, MOEX BROK, or MOEX IIS (account);
- the date of the operation, i.e. T0 (date_oper);
- type of operation (type_oper), which can take several values: div, NKD, tax;
- the currency in which the operation was performed (currency);
- the financial result of the operation (result).

Dictionary dict_avg_price

The dict_avg_price dictionary keeps track of the average purchase price for each security:

- security name (stock_name);
- account name: SPB, MOEX BROK, or MOEX IIS (account);
- the average price of the open position (avg_open_price).

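The column-oriented layout of these dictionaries (each key maps to a list, and every record appends one value to each list) converts directly into a pandas DataFrame. The following is a minimal sketch with made-up values, assuming only that pandas is installed; the helper `add_position` is hypothetical and is not part of the original parser.

```python
import pandas as pd

# Same shape as dict_stocks in the article: one list per column.
stocks = {'stock_name': [], 'account': [], 'currency': [],
          'current_cost': [], 'current_cost_rub': [], 'saldo': []}


def add_position(store, **fields):
    """Append one record, one value per column (hypothetical helper)."""
    missing = set(store) - set(fields)
    if missing:
        raise ValueError(f'missing columns: {sorted(missing)}')
    for column in store:
        store[column].append(fields[column])


# Illustrative numbers only, not taken from a real report.
add_position(stocks, stock_name='APPLE', account='SPB', currency='USD',
             current_cost=200.0, current_cost_rub=13000.0, saldo=5.0)
add_position(stocks, stock_name='OFZ 24019', account='MOEX IIS', currency='RUB',
             current_cost=1000.0, current_cost_rub=1000.0, saldo=3.0)

df = pd.DataFrame(stocks)
print(df)
```

Because all lists must have the same length for `pd.DataFrame` to accept the dictionary, the sketch rejects a record with a missing field up front instead of leaving the columns ragged.
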
We process the list of XML documents and fill these dictionaries with the corresponding data:

    import pandas as pd

    # Collect data from the relevant parts of the reports
    for XMLdata in my_xml_data:
        # Information about the exchange and the account
        exchange_name = 'SPB' if XMLdata.get('board_list') == 'FB SPB' else 'MOEX'
        client_code = XMLdata.get('client_code')
        account_name = get_account_name(exchange_name, client_code)

        # Tag mapping
        (current_position, deals, flows, stock_name,
         saldo, ticketdate, price, brokerage,
         operationdate, currency,
         current_cost, current_cost_rub,
         stock_name_deal, payment_currency, currency_flows) = get_allias(exchange_name)

        # Information about the state of the client portfolio
        get_briefcase(XMLdata)
        df_stocks = pd.DataFrame(dict_stocks)
        df_stocks.set_index("stock_name", drop=False, inplace=True)

        # Information about transactions
        get_deals(XMLdata)
        df_deals = pd.DataFrame(dict_deals)
        df_avg = pd.DataFrame(dict_avg_price)

        # Information about non-trading operations on the account
        get_nontrade_operation(XMLdata)
        df_flows = pd.DataFrame(dict_flows)

All processing happens in a loop over the XML data from all the reports. The trading platform and the client code are stored identically in every report, so they can be safely read from the same tags without any mapping.

After that, however, we need a special construct that returns the right alias for a tag depending on the report (SPB or MOEX), because the same data is named differently in these reports.

Discrepancies between tags

- the broker's commission for a transaction is in the brokerage tag in the SPB report and in broker_commission in the MOEX report;
- the date of a non-trading operation is operationdate in the SPB report and operation_date in the MOEX report, and so on.

An example of tag mapping

    tags_mapping = {
        'SPB': {
            'current_position': 'briefcase_position',
            'deals': 'closed_deal',
            'flows': 'nontrade_money_operation',

            'stock_name_deal': 'issuername',
            'paymentcurrency': 'paymentcurrency',
            'currency_flows': 'currencycode'
        },
        'MOEX': {
            'current_position': 'spot_assets',
            'deals': 'spot_main_deals_conclusion',
            'flows': 'spot_non_trade_money_operations',

            'stock_name_deal': 'security_name',
            'paymentcurrency': 'price_currency_code',
            'currency_flows': 'currency_code'
        }
    }

The get_allias function takes the name of the trading platform and returns the names of the tags needed for processing:

The get_allias function

    def get_allias(exchange_name):
        return (
            tags_mapping[exchange_name]['current_position'],
            tags_mapping[exchange_name]['deals'],
            tags_mapping[exchange_name]['flows'],
            tags_mapping[exchange_name]['stock_name_deal'],
            tags_mapping[exchange_name]['paymentcurrency'],
            tags_mapping[exchange_name]['currency_flows']
        )

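The aliasing idea can be illustrated on two tiny hand-written documents. This is a minimal sketch rather than the author's code: the element names `report` and `deal` and the sample values are made up, and only the attribute names brokerage and broker_commission come from the discrepancies described in the article.

```python
import xml.etree.ElementTree as ET

# Attribute that holds the broker's commission in each report flavour.
COMMISSION_ALIAS = {'SPB': 'brokerage', 'MOEX': 'broker_commission'}

# Made-up miniature reports; real reports contain many more tags.
SAMPLES = {
    'SPB': '<report><deal brokerage="1.25"/><deal brokerage="0.75"/></report>',
    'MOEX': '<report><deal broker_commission="10.5"/><deal/></report>',
}


def total_commission(exchange_name, xml_text):
    """Sum the commission over all deals, whatever the attribute is called."""
    attr = COMMISSION_ALIAS[exchange_name]
    root = ET.fromstring(xml_text)
    total = 0.0
    for deal in root.iter('deal'):
        value = deal.get(attr)
        # A missing attribute is treated as zero commission.
        total += float(value) if value is not None else 0.0
    return total


for name, text in SAMPLES.items():
    print(name, total_commission(name, text))
```

The processing code stays identical for both exchanges; only the lookup table knows that the two report formats name the same field differently.
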
The get_briefcase function processes the information about the state of the client portfolio:

Function get_briefcase

    def get_briefcase(XMLdata):

        # In the FB SPB report the portfolio is under the briefcase_position tag
        briefcase_position = XMLdata.find(current_position)
        if briefcase_position is None or len(briefcase_position) == 0:
            return

        try:
            for child in briefcase_position:
                stock_name_reduce = child.get(stock_name).upper()
                stock_name_reduce = re.sub(r'[,.]|(\s?INC)|(\s+$)|([-\s]?JSC)', '', stock_name_reduce)

                dict_stocks['stock_name'].append(stock_name_reduce)
                dict_stocks['account'].append(account_name)
                dict_stocks['currency'].append(child.get(currency))
                dict_stocks['current_cost'].append(float(child.get(current_cost)))
                dict_stocks['current_cost_rub'].append(float(child.get(current_cost_rub)))
                dict_stocks['saldo'].append(float(child.get(saldo)))

        except Exception as e:
            print('get_briefcase -> Oops! It seems we have a BUG!', e)

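To see what the name normalization does, here is a small standalone sketch. It assumes that the pattern in the listing originally contained the `\s` (whitespace) escapes that were lost in extraction; the function name `normalize_name` and the sample names are illustrative only.

```python
import re

# Strips commas and dots, "INC" with an optional leading space,
# trailing whitespace, and a "JSC" suffix optionally preceded by
# a hyphen or whitespace.
NAME_NOISE = re.compile(r'[,.]|(\s?INC)|(\s+$)|([-\s]?JSC)')


def normalize_name(raw_name):
    """Upper-case an issuer name and remove the noise fragments."""
    return NAME_NOISE.sub('', raw_name.upper())


for raw in ['Apple Inc.', 'Microsoft Corp, ', 'FGC UES-JSC']:
    print(repr(raw), '->', repr(normalize_name(raw)))
```

Normalizing names the same way in every report is what later allows positions, deals, and payments to be joined on stock_name.
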
Next, the get_deals function extracts the information about transactions:

Function get_deals

    def get_deals(XMLdata):

        stock_name_proc = ''

        closed_deal = XMLdata.find(deals)
        if closed_deal is None or len(closed_deal) == 0:
            return

        # The SPB report is sorted differently: only by deal date,
        # while the MOEX reports are sorted by security and then by deal date.
        # Sort deals by security:
        if exchange_name == 'SPB':
            sortchildrenby(closed_deal, stock_name_deal)
            for child in closed_deal:
                sortchildrenby(child, stock_name_deal)

        try:
            for child in closed_deal:
                stock_name_reduce = child.get(stock_name_deal).upper()
                stock_name_reduce = re.sub(r'[,.]|(\s?INC)|(\s+$)|([-\s]?JSC)', '', stock_name_reduce)

                dict_deals['stock_name'].append(stock_name_reduce)
                dict_deals['account'].append(account_name)
                dict_deals['date_oper'].append(to_dt(child.get(ticketdate)).strftime('%Y-%m-%d'))

                current_cost = get_current_cost(stock_name_reduce)

                # The SPB report has one tag for the quantity: quantity;
                # the MOEX report has two: buy_qnty and sell_qnty
                if exchange_name == 'MOEX':
                    if child.get('buy_qnty'):
                        quantity = float(child.get('buy_qnty'))
                    else:
                        quantity = -float(child.get('sell_qnty'))
                else:
                    quantity = float(child.get('quantity'))

                dict_deals['quantity'].append(quantity)
                dict_deals['price'].append(float(child.get('price')))
                dict_deals['type_oper'].append('deal')
                dict_deals['currency'].append(child.get(payment_currency))

                brok_comm = child.get(brokerage)
                if brok_comm is None:
                    brok_comm = 0
                else:
                    brok_comm = float(brok_comm)
                dict_deals['brokerage'].append(float(brok_comm))

                # Return on each transaction and the average price of the position
                if stock_name_proc != stock_name_reduce:

                    if stock_name_proc != '':
                        put_avr_price_in_df(account_name, stock_name_proc,
                                            pnl.m_net_position, pnl.m_avg_open_price)

                        current_cost = get_current_cost(stock_name_proc)
                        pnl.update_by_marketdata(current_cost)
                        if len(dict_deals['result']) > 0:
                            if exchange_name != 'SPB':
                                dict_deals['result'][-1] = pnl.m_unrealized_pnl * ??? - dict_deals['brokerage'][-2]
                            else:
                                dict_deals['result'][-1] = pnl.m_unrealized_pnl - dict_deals['brokerage'][-2]

                    stock_name_proc = stock_name_reduce
                    pnl = PnlSnapshot(stock_name_proc, float(child.get('price')), quantity)
                    dict_deals['result'].append(-1 * brok_comm)
                else:
                    pnl.update_by_tradefeed(float(child.get('price')), quantity)

                    # Selling securities: fix the result
                    if quantity < 0:
                        if pnl.m_realized_pnl > 0 and exchange_name != 'SPB':
                            pnl_sum = pnl.m_realized_pnl * ??? - brok_comm
                        else:
                            pnl_sum = pnl.m_realized_pnl - brok_comm

                        dict_deals['result'].append(float(pnl_sum))
                    else:
                        pnl.update_by_marketdata(current_cost)
                        dict_deals['result'].append(-1 * brok_comm)

            put_avr_price_in_df(account_name, stock_name_proc,
                                pnl.m_net_position, pnl.m_avg_open_price)

            current_cost = get_current_cost(stock_name_proc)
            pnl.update_by_marketdata(current_cost)
            if len(dict_deals['result']) > 0:
                if exchange_name != 'SPB':
                    dict_deals['result'][-1] = pnl.m_unrealized_pnl * ??? - dict_deals['brokerage'][-2]
                else:
                    dict_deals['result'][-1] = pnl.m_unrealized_pnl - dict_deals['brokerage'][-2]

        except Exception as e:
            print('get_deals -> Oops! It seems we have a BUG!', e)

Besides processing the array of transaction parameters, this function also calculates the average price of the open position and the realized PNL using the FIFO method. The calculation is done by the PnlSnapshot class, which is based, with minor modifications, on the code presented in "P & L calculation".

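To make the idea of tracking an average open price together with realized and unrealized PnL concrete, here is a simplified sketch. It is not the PnlSnapshot class used in the article: the class name `PositionSketch` and its attributes are hypothetical, it supports long positions only, and it uses weighted-average cost instead of FIFO lot matching.

```python
class PositionSketch:
    """Hypothetical long-only position tracker using weighted-average cost."""

    def __init__(self, price, quantity):
        self.net_position = 0.0
        self.avg_open_price = 0.0
        self.realized_pnl = 0.0
        self.unrealized_pnl = 0.0
        self.update_by_trade(price, quantity)

    def update_by_trade(self, price, quantity):
        if quantity > 0:
            # Buy: blend the new lot into the average open price.
            total_cost = self.avg_open_price * self.net_position + price * quantity
            self.net_position += quantity
            self.avg_open_price = total_cost / self.net_position
        elif quantity < 0:
            # Sell: realize PnL on the sold part, capped at the open position.
            sold = min(-quantity, self.net_position)
            self.realized_pnl += (price - self.avg_open_price) * sold
            self.net_position -= sold
            if self.net_position == 0:
                self.avg_open_price = 0.0

    def update_by_market(self, last_price):
        # Mark the remaining position to the latest known price.
        self.unrealized_pnl = (last_price - self.avg_open_price) * self.net_position


pos = PositionSketch(price=100.0, quantity=10)   # buy 10 @ 100
pos.update_by_trade(price=120.0, quantity=10)    # buy 10 @ 120
pos.update_by_trade(price=130.0, quantity=-5)    # sell 5 @ 130
pos.update_by_market(last_price=120.0)
print(pos.net_position, pos.avg_open_price, pos.realized_pnl, pos.unrealized_pnl)
```

In this sketch a sale does not change the average open price of the remaining shares; under FIFO the oldest lots would be closed first, so the realized figures can differ from what the article's class produces.
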
Finally, the hardest part to implement is the function that collects information about non-trading operations, get_nontrade_operation. The difficulty is that the report block for non-trading operations contains no explicit information about the type of operation or the security the operation relates to.

Examples of payment descriptions for non-trading operations

A dividend or accrued coupon income payment can be described in the following ways:

- Payment of income client <777777> dividends <APPLE INC-ao> -> dividend payment from the SPB report;
- Payment of income client <777777> dividends <MICROSOFT Com-ao>;
- Payment of income client 777777 (NKD 2 OFZ 24019) tax withheld ??? rubles -> coupon payment from the MOEX report;
- Payment of income client 777777 dividends of FGC UES-ao tax withheld XX.XX rubles -> dividend payment from the MOEX report, and so on.

Accordingly, it is hard to do without regular expressions, so we will use them to the full. The other side of the problem is that the company name in the payment description does not always match the name in the portfolio or in the transactions. Therefore the issuer name extracted from the payment description has to be additionally matched against a dictionary. We will use the transactions array as that dictionary, since it contains the most complete list of companies.

The get_company_from_str function extracts the issuer name from the comment:

The function get_company_from_str

    def get_company_from_str(comment):
        company_name = ''

        # Patterns for dividend / coupon cases
        flows_pattern = [
            r'^.+дивиденды\s<(\w+)?.+-ао>$',
            r'^.+дивиденды\s(.+)-а.+$',
            r'^.+\(НКД\s\d?\s(.+)\).+$',
            r'^.+дивидендам\s(.+)-.+$'
        ]

        for pattern in flows_pattern:
            match = re.search(pattern, comment)
            if match:
                return match.group(1).upper()

        return company_name

The get_company_from_briefcase function maps the company name to its dictionary form if it finds exactly one match among the companies that took part in transactions:

Function get_company_from_briefcase

    def get_company_from_briefcase(company_name):
        company_name_full = None

        value_from_dic = df_deals[df_deals['stock_name'].str.contains(company_name)]
        company_arr = value_from_dic['stock_name'].unique()

        if len(company_arr) == 1:
            company_name_full = company_arr[0]

        return company_name_full

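The matching rule "accept the name only if exactly one known company contains the fragment" can be tried in isolation. The sketch below is a hypothetical variant, not the article's function: it takes the list of known names as an argument instead of reading the global df_deals, and it passes `regex=False` so that characters such as dots or parentheses in a name fragment are matched literally.

```python
import pandas as pd


def match_known_name(fragment, known_names):
    """Return the single known name containing `fragment`, else None."""
    names = pd.Series(known_names, dtype='object').drop_duplicates()
    hits = names[names.str.contains(fragment, regex=False)]
    return hits.iloc[0] if len(hits) == 1 else None


# Illustrative list of names, as they might appear in the deals table.
known = ['APPLE', 'MICROSOFT CORP', 'FGC UES', 'OFZ 24019', 'APPLE']

print(match_known_name('MICROSOFT', known))  # unique match
print(match_known_name('F', known))          # ambiguous fragment
print(match_known_name('TESLA', known))      # no match
```

Refusing ambiguous fragments means some payments stay unmatched, but that is safer than attributing a dividend to the wrong security.
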
And finally, the function that collects the data on non-trading operations, get_nontrade_operation:

Function get_nontrade_operation

    def get_nontrade_operation(XMLdata):
        nontrade_money_operation = XMLdata.find(flows)

        if nontrade_money_operation is None or len(nontrade_money_operation) == 0:
            return

        try:
            for child in nontrade_money_operation:

                comment = child.get('comment')
                type_oper_match = re.search(r'dividends|NKD|^.+tax.+dividends.+$', comment)

                if type_oper_match:

                    company_name = get_company_from_str(comment)
                    type_oper = get_type_oper(comment)

                    dict_flows['stock_name'].append(company_name)
                    dict_flows['account'].append(account_name)
                    dict_flows['date_oper'].append(to_dt(child.get(operationdate)).strftime('%Y-%m-%d'))
                    dict_flows['type_oper'].append(type_oper)
                    dict_flows['result'].append(float(child.get('amount')))
                    dict_flows['currency'].append(child.get(currency_flows))

        except Exception as e:
            print('get_nontrade_operation -> Oops! It seems we have a BUG!', e)

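The get_type_oper helper called above is not shown in the article. As a rough idea of what such a classifier could look like, here is a hypothetical stand-in that maps a comment to one of the type_oper values listed earlier (div, NKD, tax). The keywords follow the translated pattern in the listing; real reports would need the wording actually used by the broker.

```python
import re


def classify_flow(comment):
    """Hypothetical stand-in for get_type_oper: comment -> 'tax', 'NKD', 'div' or None."""
    # A tax record mentions the tax before the dividends it relates to.
    if re.search(r'^.+tax.+dividends.+$', comment):
        return 'tax'
    if re.search(r'NKD', comment):
        return 'NKD'
    if re.search(r'dividends', comment):
        return 'div'
    return None


samples = [
    'Payment of income client 777777 (NKD 2 OFZ 24019)',
    'Payment of income client <777777> dividends <APPLE INC-ao>',
    'Withholding tax on dividends APPLE',
    'Transfer of funds between accounts',
]
for text in samples:
    print(classify_flow(text), '<-', text)
```

The order of the checks matters: the tax pattern also contains the word dividends, so it has to be tested before the plain dividend case.
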
The result of collecting data from the reports is three DataFrames:

- a DataFrame with the average prices of open positions;
- a DataFrame with information about transactions;
- a DataFrame with information about non-trading operations.

So all that remains is to perform an outer merge of the transactions table with the portfolio information table:

    df_result = pd.merge(df_deals, df_stocks_avg, how='outer', on=['stock_name', 'account', 'currency']).fillna(0)
    df_result.sample(10)

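To illustrate what an outer merge followed by fillna(0) produces, here is a small sketch on made-up frames. The column names follow the article, while the values and the variable names `deals`, `positions`, and `merged` are assumptions for the example.

```python
import pandas as pd

deals = pd.DataFrame({
    'stock_name': ['APPLE', 'APPLE'],
    'account': ['SPB', 'SPB'],
    'currency': ['USD', 'USD'],
    'quantity': [10.0, -4.0],
})
positions = pd.DataFrame({
    'stock_name': ['APPLE', 'OFZ 24019'],
    'account': ['SPB', 'MOEX IIS'],
    'currency': ['USD', 'RUB'],
    'saldo': [6.0, 3.0],
})

# Keys present on only one side are kept; their missing cells become 0.
merged = pd.merge(deals, positions, how='outer',
                  on=['stock_name', 'account', 'currency']).fillna(0)
print(merged)
```

Each APPLE deal row receives the position columns, and OFZ 24019, which has a position but no deals in this sample, still appears with a quantity of 0 instead of being dropped.
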
The final step in processing the data set is to combine the result of the previous step with the DataFrame of non-trading operations.
The outcome is one large flat table with all the information needed for the analysis:

    # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
    df_result_full = pd.concat([df_result, df_flows], ignore_index=True).fillna(0)
    df_result_full.sample(10).head()

The resulting data set (the Final Report) is easily exported from the DataFrame to CSV and can then be used for detailed analysis in any BI system.

    from os import makedirs
    from os.path import exists, join

    if not exists('OUTPUT'): makedirs('OUTPUT')
    report_name = join('OUTPUT', 'my_trader_diary.csv')

    df_result_full.to_csv(report_name, index=False, encoding='utf-8-sig')

Loading and processing data in AWS

Progress does not stand still, and cloud services and serverless computing models are gaining popularity for processing and storing data. This is largely due to the simplicity and low cost of the approach: building an architecture for complex calculations or big data processing does not require buying expensive equipment. You rent only the cloud capacity you need and deploy the necessary resources quickly for a relatively small fee.

One of the largest and best-known cloud providers on the market is Amazon. Using Amazon Web Services (AWS) as an example, let's build an analytical system for processing the data on our investment portfolio.

AWS has an extensive selection of tools, but we will use the following:

- Amazon S3: object storage that lets you store practically unlimited amounts of information;
- AWS Glue: a powerful cloud ETL service that can infer the structure of the given source data and generate ETL code for it;
- Amazon Athena: a serverless interactive SQL query service that lets you analyze data in S3 quickly and without much preparation. It also has access to the metadata prepared by AWS Glue, so the data can be queried immediately after the ETL step;
- Amazon QuickSight: a serverless BI service in which you can build visualizations, analytical reports "on the fly", and so on.

Amazon's documentation is in good shape; in particular, there is a good article, Best Practices When Using Athena with AWS Glue, which describes how to create and use tables and data with AWS Glue. Let's take the main ideas of that article and apply them to build our own architecture for the analytical reporting system.

The CSV files prepared by our report parser will be placed in an S3 bucket. The plan is to add to the corresponding S3 folder every Saturday, at the end of the trading week, so we cannot do without partitioning the data by the date the report was generated and processed.
Besides speeding up SQL queries on this data, this approach will allow additional analysis, for example tracking how the financial result for each security changes over time.

Working with Amazon S3

- Create a bucket on S3 and call it "report-parser";
- In the "report-parser" bucket, create a folder called "my_trader_diary";
- In the "my_trader_diary" directory, create a directory with the date of the current report, for example "date_report=2018-10-01", and place the CSV file in it;
- Purely as an experiment and for a better understanding of partitioning, create two more directories, "date_report=2018-09-27" and "date_report=2018-10-08", and put the same CSV file in them;
- The final "report-parser" bucket should then contain the "my_trader_diary" folder with three date_report=... subfolders, each holding the CSV file.

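The same folder layout can be produced from code instead of the console. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the upload step needs the boto3 package and configured AWS credentials, so only the key-building part runs without them.

```python
import datetime


def partition_key(report_date, table_folder='my_trader_diary',
                  file_name='my_trader_diary.csv'):
    """Build the S3 object key for one report date (Hive-style partition)."""
    return f'{table_folder}/date_report={report_date.isoformat()}/{file_name}'


def upload_report(local_path, report_date, bucket='report-parser'):
    """Upload one CSV into its date partition; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the key logic works without boto3
    key = partition_key(report_date)
    boto3.client('s3').upload_file(local_path, bucket, key)
    return key


print(partition_key(datetime.date(2018, 10, 1)))
```

Keeping the key construction in a separate pure function makes the partition naming easy to check before anything is written to the bucket.
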
Working with AWS Glue

By and large, Amazon Athena alone is enough to create an external table from the data in S3, but AWS Glue is a more flexible and convenient tool for this.

- Go to AWS Glue and create a new Crawler that will assemble one table from the separate CSV files for the different dates:
  - set the name of the new Crawler;
  - specify the data store to read from (s3://report-parser/my_trader_diary/);
  - choose or create an IAM role that is allowed to run the Crawler and to access the specified S3 resource;
  - set the run frequency. For now we choose on demand, but I expect this to change to a weekly run later;
  - save and wait for the Crawler to be created.
- When the Crawler reaches the Ready state, run it!
- Once it has finished, a new table, my_trader_diary, will appear on the AWS Glue: Database -> Tables tab.

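For readers who prefer scripting over the console, the crawler setup could be sketched with boto3 roughly as follows. The crawler name and the role ARN are placeholders invented for the example, the database name financial is the one used later in the article, and the create/start calls require boto3 plus valid AWS credentials, so the sketch only builds and prints the request.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for the Glue create_crawler call."""
    return {
        'Name': name,
        'Role': role_arn,
        'DatabaseName': database,
        'Targets': {'S3Targets': [{'Path': s3_path}]},
    }


def create_and_start_crawler(request):
    """Create the crawler and run it once (needs boto3 and AWS credentials)."""
    import boto3  # lazy import: building the request does not require boto3
    glue = boto3.client('glue')
    glue.create_crawler(**request)
    glue.start_crawler(Name=request['Name'])


request = crawler_request(
    name='my_trader_diary_crawler',                             # hypothetical name
    role_arn='arn:aws:iam::123456789012:role/GlueCrawlerRole',  # placeholder ARN
    database='financial',
    s3_path='s3://report-parser/my_trader_diary/',
)
print(request)
```

No schedule is included in the request, which corresponds to the on-demand run chosen in the steps above.
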
Let's look at the generated table in more detail.
Clicking the name of the created table opens a page describing its metadata. At the bottom is the table schema, and its last column, date_report, was not in the original CSV file. AWS Glue creates this column automatically from the partition layout of the source data (in the S3 bucket we deliberately named the folders date_report=YYYY-MM-DD, which made it possible to use them as partitions by date).

Table partitioning

In the upper right corner of the same page there is a View partitions button; clicking it shows which partitions our table consists of.

Data analysis

With the loaded and processed data at our disposal, we can easily move on to analysis. First, let's look at Amazon Athena as the simplest and fastest way to run analytical queries. Go to the Amazon Athena service, select the database we need (financial), and write the following SQL:

    select
        d.date_report, d.account,
        d.stock_name, d.currency,
        sum(d.quantity) as quantity,
        round(sum(d.result), 2) as result
    from my_trader_diary d
    group by
        d.date_report, d.account,
        d.stock_name, d.currency
    order by
        d.account, d.stock_name,
        d.date_report;

This query returns the net financial result for each security on every reporting date. Since we loaded the same report three times under different dates, the result does not change from date to date; in real market conditions it would, of course, differ.

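The aggregation in this query can also be checked locally before the data is uploaded. Below is a pandas sketch of the same grouping; the rows are made up and the variable names `diary` and `summary` are assumptions for the example.

```python
import pandas as pd

# Made-up rows with the columns the query uses.
diary = pd.DataFrame({
    'date_report': ['2018-10-01'] * 3 + ['2018-10-08'],
    'account': ['SPB', 'SPB', 'MOEX IIS', 'SPB'],
    'stock_name': ['APPLE', 'APPLE', 'OFZ 24019', 'APPLE'],
    'currency': ['USD', 'USD', 'RUB', 'USD'],
    'quantity': [10.0, -4.0, 3.0, 6.0],
    'result': [-1.5, 35.25, 12.0, 33.75],
})

keys = ['date_report', 'account', 'stock_name', 'currency']
summary = (
    diary.groupby(keys, as_index=False)
         .agg(quantity=('quantity', 'sum'), result=('result', 'sum'))
         .round({'result': 2})
         .sort_values(['account', 'stock_name', 'date_report'])
         .reset_index(drop=True)
)
print(summary)
```

Comparing such a local summary with the Athena output is a quick way to confirm that the crawler picked up every partition.
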
But what if we want to visualize the data as flexible tables or charts? This is where Amazon QuickSight comes to the rescue: it lets you set up flexible analytics almost as quickly as writing an SQL query. Go to the Amazon QuickSight service (if you have not signed up for it yet, registration is required).

Click New analysis -> New dataset and, in the window for choosing the dataset source, click Athena.

Think of a name for the data source, for example "PNL_analysis", and click the "Create data source" button.

Next, the Choose your table window opens, where you select the database and the source table. Choose the financial database and the my_trader_diary table in it. By default the entire table is used, but with "Use custom SQL" you can customize and fine-tune the data sample you need. For this example we use the entire table and click the Edit/Preview Data button.

A new page opens where you can make additional settings and process the available data.

3r31220. Now it is necessary to add additional calculated fields to our dataset: quarter and year of the operation. An attentive reader may notice that it was easier to perform such manipulations on the side of the parser before saving the Final Report to CSV. Undoubtedly, my goal now is to demonstrate the capabilities and flexibility of setting up a BI system on the fly. Continue the creation of calculated fields by clicking on the "New field" button.
3r31220.

[Figure: Creating a new field]

To extract the year and the quarter of the operation, simple formulas are used.

[Figure: Filling in the formulas for the new field]
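
As noted above, the same two fields could also be computed on the parser side. Here is a minimal Python sketch, assuming date_oper is stored as an ISO string (YYYY-MM-DD). The helper name year_and_quarter and the sample rows are illustrative and are not taken from the actual parser:

```python
from datetime import date
from typing import Tuple


def year_and_quarter(date_oper: str) -> Tuple[int, int]:
    """Return (year, quarter) for an ISO-formatted operation date."""
    parsed = date.fromisoformat(date_oper)
    quarter = (parsed.month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, and so on
    return parsed.year, quarter


# Hypothetical sample rows; only the date_oper column name comes from the dataset.
rows = [{"date_oper": "2018-02-14"}, {"date_oper": "2018-10-01"}]
for row in rows:
    row["year"], row["quarter"] = year_and_quarter(row["date_oper"])
    print(row)
```

If the real report stores dates in another format, date.fromisoformat would have to be replaced with datetime.strptime and the matching format string.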
Once the calculated fields have been created and added to the sample, give the dataset a name, for example “my_pnl_analyze”, and click the “Save and visualize” button.

This takes us to the main Amazon QuickSight board. The first thing to do is set up a filter on the report date (remember that the same data was loaded as three snapshots). Select the key date 2018-10-01, click the Apply button, and go to the Visualize tab.

[Figure: Installing the filter]

Now we can visualize the portfolio result along any dimension, for example for each security within a trading account, broken down in turn by currency (since results in different currencies are not comparable) and by operation type. Let's start with the most powerful tool of any BI system: pivot tables. To save space and keep the display flexible, I moved currencies into a separate control (the analogue of a slicer in MS Excel).

The pivot table shows that if the investor decides to sell all FGC UES shares now, he will lock in a loss, since the dividends paid (??? rub.) do not cover his costs (??? rub. of negative exchange difference and 174 rub. of personal income tax (NDFL) on dividends). It makes sense to wait for better times on the Exchange.
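
For comparison, the same kind of pivot can be sketched in plain Python. Only stock_name and date_report are column names from the dataset. The columns currency, oper_type, and pnl, the function name, and all sample values are hypothetical and serve only to show the aggregation:

```python
from collections import defaultdict

# Hypothetical rows. Only stock_name and date_report are real column names;
# currency, oper_type, pnl and every value here are made up for illustration.
rows = [
    {"date_report": "2018-10-01", "stock_name": "AAA", "currency": "RUB",
     "oper_type": "dividend", "pnl": 100.0},
    {"date_report": "2018-10-01", "stock_name": "AAA", "currency": "RUB",
     "oper_type": "exchange_difference", "pnl": -150.0},
    {"date_report": "2018-10-01", "stock_name": "AAA", "currency": "RUB",
     "oper_type": "tax", "pnl": -13.0},
    {"date_report": "2018-10-01", "stock_name": "BBB", "currency": "USD",
     "oper_type": "dividend", "pnl": 50.0},
    {"date_report": "2018-09-01", "stock_name": "AAA", "currency": "RUB",
     "oper_type": "dividend", "pnl": 100.0},
]


def pivot_pnl(rows, report_date, currency):
    """Sum pnl per (stock_name, oper_type) for one report date and one currency."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        # The two conditions play the role of the report-date filter and the currency control.
        if row["date_report"] == report_date and row["currency"] == currency:
            table[row["stock_name"]][row["oper_type"]] += row["pnl"]
    return {stock: dict(by_type) for stock, by_type in table.items()}


pivot = pivot_pnl(rows, "2018-10-01", "RUB")
for stock, by_type in pivot.items():
    print(stock, by_type, "total:", sum(by_type.values()))
```

Filtering by a single currency before summing mirrors the point made above: results in different currencies are not comparable, so they should never end up in the same total.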

The next chart is a bar chart.

Now let's build a table that shows how much we have invested in each security, how many days it has been in the portfolio, and what the yield is over the entire holding period. To do this, add two new calculated fields: sum_investment and count_days.

The field sum_investment

The calculated field sum_investment (the amount invested) is defined as:

ifelse({stock_name} = 'OFZ 24019', {avg_open_price} * quantity * 10, {avg_open_price} * quantity)

Bonds are handled separately because their price is always quoted as a percentage of the nominal value (1000 rub. in this case), so the quoted price has to be multiplied by 1000 / 100 = 10.
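
If the sum_investment field is later moved to the parser, the calculation might look like the sketch below. The function signature, the bond_names set, and the sample numbers are assumptions. The only facts taken from the text are that a bond price is quoted as a percentage of a 1000 rub. nominal and that 'OFZ 24019' is the bond in this portfolio:

```python
def sum_investment(stock_name, avg_open_price, quantity,
                   bond_names=frozenset({"OFZ 24019"}), nominal=1000.0):
    """Amount invested in one position.

    For bonds, avg_open_price is a percentage of the nominal value,
    so it is scaled by nominal / 100 before multiplying by quantity.
    """
    if stock_name in bond_names:
        return avg_open_price * (nominal / 100.0) * quantity
    return avg_open_price * quantity


# Hypothetical positions, for illustration only.
print(sum_investment("OFZ 24019", 98.5, 10))  # bond, price in percent of nominal
print(sum_investment("AAA", 150.0, 20))       # ordinary share, price in currency units
```

Keeping the bond names in a set avoids hard-coding a single security name into the formula once more bonds appear in the portfolio.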

The field count_days

The calculated field count_days (the number of days the security has been held) is defined as the difference between the transaction date and the reporting date; in the summary table we take its maximum:

dateDiff(parseDate({date_oper}), parseDate({date_report}))
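
The same logic can be sketched in Python for the parser side. This assumes both dates are ISO strings (YYYY-MM-DD); the function names and the sample rows are hypothetical:

```python
from datetime import date


def count_days(date_oper: str, date_report: str) -> int:
    """Days between the transaction date and the reporting date."""
    return (date.fromisoformat(date_report) - date.fromisoformat(date_oper)).days


def max_days_held(rows):
    """Longest holding period per security, as in the summary table."""
    result = {}
    for row in rows:
        days = count_days(row["date_oper"], row["date_report"])
        name = row["stock_name"]
        result[name] = max(result.get(name, days), days)
    return result


# Hypothetical transactions, for illustration only.
rows = [
    {"stock_name": "AAA", "date_oper": "2018-01-15", "date_report": "2018-10-01"},
    {"stock_name": "AAA", "date_oper": "2018-06-01", "date_report": "2018-10-01"},
    {"stock_name": "BBB", "date_oper": "2018-09-21", "date_report": "2018-10-01"},
]
print(max_days_held(rows))
```

Taking the maximum means the holding period is counted from the earliest purchase of each security.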

The final table is shown in the screenshot below.

Conclusions and Outcomes

We have walked through the implementation of the report parser and seen how to analyze the data it prepares “on the fly” using Amazon services. We also touched on some business and fundamental aspects of investment portfolio analysis. That topic is almost boundless and hard to fit into a single article, so I think it makes sense to cover it in a separate publication or even a series of publications.

The broker report processing tool, along with the approaches and algorithms behind it, can be used (with appropriate modification) to process reports from other brokers. If you are thinking of adapting the code to your needs, I am ready to give some advice, so do not hesitate to ask questions; I will certainly try to answer them.

I am sure that this system will find its use and will be developed further. For example, I plan to add depository and other commissions (for example, for withdrawing funds), as well as bond redemptions and so on, to the calculation of the full PNL for the portfolio. The calculated fields on the QuickSight side were used for demonstration purposes only; all of these additional columns will be moved to Python and calculated on the parser side.

As the architect and main business customer of this solution, I see the next upgrade as follows: I really do not want to request these XML reports manually every time! There is no other option so far, but a broker API that accepts a token and a date range would be ideal for fetching weekly raw reports. Fully automatic processing on the Amazon side would follow, from triggering an ETL job in AWS Glue to getting finished results as charts and tables in Amazon QuickSight, which would automate the process end to end.
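
The Glue trigger step could be sketched with boto3 roughly as follows. start_job_run is an existing method of the boto3 Glue client, but the job name, the --report_date argument, and the FakeGlueClient stand-in (used so the sketch runs without AWS access) are assumptions made for illustration:

```python
def start_report_etl(glue_client, job_name, report_date):
    """Start one Glue job run for the given report date and return its run id."""
    response = glue_client.start_job_run(
        JobName=job_name,
        Arguments={"--report_date": report_date},  # hypothetical job argument
    )
    return response["JobRunId"]


class FakeGlueClient:
    """Stand-in for boto3.client("glue") so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = []

    def start_job_run(self, JobName, Arguments):
        self.calls.append((JobName, Arguments))
        return {"JobRunId": "fake-run-for-" + JobName}


# With real AWS access this would be: glue = boto3.client("glue")
glue = FakeGlueClient()
print(start_report_etl(glue, "broker-report-etl", "2018-10-01"))
```

Passing the client in as a parameter keeps the function easy to test and leaves credentials and region configuration outside of it.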

The full source code can be found in my repository on GitHub.

Author: weber
Publication date: 16-10-2018, 04:35
Category: Finance in IT / Programming / Analysis and design of systems