Example¶
In this section, a real-life use case is explained to demonstrate how Data Product Hub can be utilized.
Problem statement¶
John, a senior investment analyst at a finance company, aims to assess investor risk, evaluate company valuations, and identify high-potential companies for hedge fund portfolios. To achieve this, John collaborates with Max, a data analyst, to develop a comprehensive dashboard of corporate performance and hedge fund metrics, designed to optimize investment strategies and manage risk effectively. The dashboard highlights the following key areas:
- Performance by Industry and Sector: The performance of different industries and sectors will be assessed to identify trends and areas for investment opportunities.
- Top-Performing Companies: Companies will be identified based on their performance using key financial indicators.
- Revenue and Financial Impact Analysis: Revenue and financial metrics will be analyzed to determine their impact on overall portfolio performance.
- Operational Efficiency Metrics: Operational efficiency will be measured to assess financial health and identify areas for operational improvements.
- Debt and Risk Analysis: Companies will be compared based on their debt levels to manage investment risk and optimize portfolio allocation.
- Financial Growth Patterns: Financial growth patterns will be identified to uncover potential investment opportunities and refine strategies.
- Hedge Fund Metrics Overview: Hedge fund metrics will be compared to optimize investment decisions and manage fund performance.
- Regional Performance Comparison: The performance of companies across different countries will be compared to identify regional trends and opportunities.
- Investor Risk Assessment: Financial metrics will be used to assess and manage investor risk within the portfolio.
The goal is to enhance the ability to assess corporate performance, manage hedge fund metrics, and optimize investment strategies, leading to improved decision-making and portfolio performance.
Discovering a Data Product¶
To address the problem, Max uses the Data Product Hub, a graphical user interface within DataOS that allows data analysts to discover actionable Data Products. The steps below are followed to identify relevant Data Products for the use case.
-
The Data Product Hub 2.0 is opened from the DataOS user interface to begin exploring Data Products.
DataOS User Interface -
After login, the user is redirected to the Data Product Hub home page, as displayed below.
Data Product Hub Home Page -
By default, Data Product recommendations are displayed based on the use case. However, to find Data Products in the 'Corporate Finance' domain, the Filters drop-down menu is accessed. The Domain option is selected, and 'Corporate Finance' is chosen, as shown below.
-
Recommendations for Data Products within the Corporate Finance domain are displayed. The 'Corp Market Performance' Data Product is identified as relevant for evaluating stock market risks.
-
The 'Corp Performance' Data Product is also found in the recommendations. This Data Product assists in identifying high-potential companies for hedge fund portfolios, tracking key financial indicators, and monitoring operational efficiency.
-
The 'Corp Market Performance' Data Product is selected for further exploration. This opens an interface where all details of the specific Data Product are displayed, as shown below.
-
Each tab of the Data Product is examined. In the Overview tab, the lineage of the Data Product is reviewed, including Inputs (input datasets), Outputs (datasets generated from the Data Product), Access options (ways the Data Product can be consumed), and Models (lens models consuming the Data Product).
-
The Input tab is used to explore input datasets in detail, providing a description of the dataset, its tier, domain, owner, and access restrictions, along with the DataOS address and other related Data Products.
Dataset columns can be searched by name using the search bar, as shown below.
-
Information about each column is provided in a tabular format, including data types of each column, as displayed below.
-
The Output tab is explored to examine the output dataset named `market_data`, ensuring all necessary dimensions and measures are available for the use case.
-
Two dimensions, `marketid` and `companyid`, are identified along with measures such as `capitalexpenditures`, `shareholdersequity`, `marketpershare`, `equitypershare`, `dividendpershare`, `netincome`, and `net_profit_after_tax`, which will assist with the use case.
-
In the Quality tab, Service Level Objectives (SLOs) are reviewed, including adherence levels for freshness, schema, validity, completeness, uniqueness, and accuracy. Despite some SLOs having 0% adherence, the output data is determined to be complete with the correct schema.
-
In the Access Options tab, various options for consuming the Data Product are reviewed. The Tableau Cloud option is identified as useful for dashboard creation.
-
Before consuming the Data Product, Max clicks on the Explore button in the top-right corner to further examine it.
-
Clicking the Explore button opens a studio interface. Within this interface, Max explores the iris board. For this use case, the `price_to_earnings_ratio` measure and `company_id` dimension are selected. The Run Query button is clicked to examine the table, followed by clicking the Chart tab to visualize the data points.
-
Additional exploration includes:
-
Analyzing `dividend_per_share`, `total_dividend_per_share`, and `earnings_per_share` to assess profitability and dividend distribution policies.
-
Examining `total_shareholders_equity` to evaluate the financial health and stability of the company.
-
Exploring `total_market_per_share`, `marketpershare`, and `total_dividend_per_share` to gauge stock performance.
-
Investigating `total_net_income` and `total_net_profit_after_tax` to measure profitability and efficiency.
-
Analyzing `shareholdersequity`, `capitalexpenditures`, and `total_debt_hid` to assess capital management.
-
Examining the `debt_equity_ratio` and `equity_multiplier` to evaluate financial risk and stability.
-
-
The Data Product is bookmarked for daily updates. Bookmarked Data Products can be accessed later from the Favorites tab in the Data Products section.
-
Similarly, the ‘Corp Performance’ Data Product is explored. All necessary measures and dimensions required for the use case are identified.
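The ratio measures explored above can be illustrated with a minimal sketch. It uses the standard textbook definitions (price-to-earnings as share price over earnings per share, debt-to-equity as total debt over shareholders' equity, equity multiplier as total assets over shareholders' equity). How the Data Product actually defines `price_to_earnings_ratio`, `debt_equity_ratio`, and `equity_multiplier` is not stated here, so the `CompanyFinancials` class, its field names, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CompanyFinancials:
    """Hypothetical per-company figures; field names are illustrative only."""
    company_id: str
    share_price: float
    earnings_per_share: float
    total_debt: float
    total_assets: float
    shareholders_equity: float


def price_to_earnings_ratio(c: CompanyFinancials) -> float:
    # Share price divided by earnings per share.
    return c.share_price / c.earnings_per_share


def debt_equity_ratio(c: CompanyFinancials) -> float:
    # Total debt divided by shareholders' equity.
    return c.total_debt / c.shareholders_equity


def equity_multiplier(c: CompanyFinancials) -> float:
    # Total assets divided by shareholders' equity.
    return c.total_assets / c.shareholders_equity


# Made-up sample company, not taken from the Data Product.
sample = CompanyFinancials(
    company_id="C001",
    share_price=50.0,
    earnings_per_share=5.0,
    total_debt=400.0,
    total_assets=1000.0,
    shareholders_equity=800.0,
)

ratios = {
    "price_to_earnings_ratio": price_to_earnings_ratio(sample),
    "debt_equity_ratio": debt_equity_ratio(sample),
    "equity_multiplier": equity_multiplier(sample),
}
print(ratios)
```

A higher debt-to-equity ratio or equity multiplier generally indicates more leverage, which is why these measures are relevant to the risk assessment in this use case.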
Activating the Data Product via BI Sync¶
Max returns to the ‘Corp Market Performance’ Data Product to share it with Tableau.
The steps below are followed to share the Data Product with Tableau:
-
The Access Options tab is opened, where the Tableau Cloud option is found under the BI Sync section, as shown below.
-
The Add connection button is selected, which prompts for the Tableau Cloud credentials: Project Name, Server Name, Site Id, Username, and Password.
-
After the credentials are provided, the Activate button is selected. This activates the Data Product, allowing it to be consumed on Tableau Cloud for dashboard creation. A new project in Tableau Cloud is created, named ‘Corporate finance’.
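Because every field in the connection form is required, it can help to check the values before entering them. The sketch below is only an illustration: `TableauConnection` and `missing_fields` are hypothetical helpers that are not part of DataOS or Tableau, and the sample values are placeholders.

```python
from dataclasses import dataclass, fields


@dataclass
class TableauConnection:
    """The five fields requested by the Add connection form."""
    project_name: str
    server_name: str
    site_id: str
    username: str
    password: str


def missing_fields(conn: TableauConnection) -> list:
    """Return the names of fields that are empty or contain only whitespace."""
    return [f.name for f in fields(conn) if not getattr(conn, f.name).strip()]


# Placeholder values; the Site Id is deliberately left empty.
conn = TableauConnection(
    project_name="Corporate finance",
    server_name="example.online.tableau.com",  # hypothetical server name
    site_id="",
    username="max@example.com",  # hypothetical username
    password="not-a-real-password",
)

print(missing_fields(conn))
```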
Consuming the Data Product on Tableau Cloud¶
Once the required Data Products are activated, the dashboard on Tableau Cloud is created by following these steps:
-
Max logs in to Tableau Cloud using the previously provided credentials and is redirected to the Tableau Cloud home page, as shown below.
-
The Manage Projects option on the home page is selected, as shown below.
-
The Manage Projects option opens an interface where all projects, including the newly created ‘Corporate finance’ project, are listed, as shown below.
-
The ‘Corporate finance’ project is selected, displaying the data sources available for dashboard creation.
-
The menu option in the top-right corner of the data source is selected, followed by the New Workbook option, as shown below.
-
To create a new workbook, Max signs in to the data source by providing the DataOS username as the username and the DataOS API key as the password.
-
After signing in, Max is redirected to the workbook, where the dashboard can be created.
Exploring the Data Product¶
With the dashboard setup completed, the next step involves adding forecasting capabilities to predict future trends for the hedge fund portfolio. This assists in making informed investment decisions and anticipating risks. The following steps are taken:
-
The Data Product Hub is revisited, and the Explore button is selected, opening an interface to perform cross Data Product analysis. This analysis helps determine if enough data is available to build a forecast model.
-
The Corp Market Performance and Corp Performance Data Products are compared to analyze company performance across different industries and sectors. Focus is placed on metrics like net income, revenue growth, and operational efficiency to assess which companies in various sectors deliver the strongest financial results.
-
Data is filtered to focus on sectors such as ‘Financial Services’, identifying companies with the highest earnings per share and net income, narrowing down high-performing companies for hedge fund allocation.
-
The exploration is saved as a perspective, ensuring that these insights can be revisited frequently. The Save Perspective option is selected, as shown below.
-
-
To save the perspective, a name and description of the exploration are provided for future reference.
-
The saved perspective is accessed by navigating to the Perspectives tab, as shown below.
-
The perspective is searched by name within the Perspectives tab, as shown below.
-
Selecting the perspective redirects to the Explore interface, where all perspectives created on the Data Product, including the saved one, can be accessed.
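The cross Data Product analysis described above amounts to joining the two Data Products on a company identifier, filtering to a sector, and ranking by earnings per share and net income. The sketch below reproduces that logic in plain Python so it runs without dependencies. The rows, the column names (`company_id`, `sector`, `earnings_per_share`, `net_income`), and the helper functions are hypothetical stand-ins, not the actual schema of either Data Product.

```python
# Hypothetical rows standing in for the 'Corp Market Performance' Data Product.
market_rows = [
    {"company_id": "C001", "earnings_per_share": 4.2},
    {"company_id": "C002", "earnings_per_share": 6.8},
    {"company_id": "C003", "earnings_per_share": 5.1},
]

# Hypothetical rows standing in for the 'Corp Performance' Data Product.
performance_rows = [
    {"company_id": "C001", "sector": "Financial Services", "net_income": 120.0},
    {"company_id": "C002", "sector": "Technology", "net_income": 300.0},
    {"company_id": "C003", "sector": "Financial Services", "net_income": 95.0},
]


def join_on_company(market, performance):
    """Inner-join the two row lists on company_id."""
    by_id = {row["company_id"]: row for row in performance}
    return [
        {**m, **by_id[m["company_id"]]}
        for m in market
        if m["company_id"] in by_id
    ]


def top_companies(rows, sector, limit=5):
    """Keep one sector, then rank by earnings per share and net income."""
    in_sector = [r for r in rows if r["sector"] == sector]
    ranked = sorted(
        in_sector,
        key=lambda r: (r["earnings_per_share"], r["net_income"]),
        reverse=True,
    )
    return ranked[:limit]


joined = join_on_company(market_rows, performance_rows)
ranked = top_companies(joined, "Financial Services")
for row in ranked:
    print(row["company_id"], row["earnings_per_share"], row["net_income"])
```

In a notebook, the same join and filter would more commonly be written with pandas; the logic is unchanged.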
Activating the Data Product via Jupyter Notebook¶
Once the Data Product is explored, the next step is to build models. The Data Product is activated via Jupyter Notebook by following these steps:
-
On the Access Options tab of the Corp Market Performance Data Product, the Download button in the AI and ML section is selected, which downloads a `.ipynb` file.
-
The `.ipynb` file is opened using Visual Studio, as shown below.
-
The REST API template is selected, which appears as follows:
-
REST API template:
    # Import necessary libraries
    import requests
    import pandas as pd
    import json

    # API URL and API key
    api_url = "https://lucky-possum.dataos.app/lens2/api/public:company-intelligence/v2/load"
    apikey = 'api key here'

    # API payload, enter YOUR_QUERY here.
    payload = json.dumps({
        "query": {
            # YOUR_QUERY
        }
    })

    # Headers
    headers = {
        'Content-Type': 'application/json',
        'apikey': apikey
    }

    # Fetch data from API
    def fetch_data_from_api(api_url, payload, headers=None):
        response = requests.post(api_url, headers=headers, data=payload)

        if response.status_code == 200:
            data = response.json()
            df = pd.json_normalize(data['data'])  # Create DataFrame
            return df
        else:
            print(f"Error: {response.status_code}")
            return None

    # Main execution
    if __name__ == "__main__":
        data = fetch_data_from_api(api_url, payload, headers=headers)

        if data is not None:
            print("Data Frame Created:")
            print(data.head())  # Show the first few rows of the DataFrame
            print("Ready for AI/ML model building.")
        else:
            print("Failed to fetch data.")
-
-
The template requires the API URL to be provided as `api_url` and the DataOS API key as `apikey`. To retrieve these, the Data APIs section in the Data Product Hub is opened, where the Postman collection and OpenAPI specification are downloaded to explore and test the API endpoint.
-
The Postman application is opened, and the Postman collection is imported to test the API endpoint.
-
The base URL is copied and pasted in place of `{{baseUrl}}`, and the DataOS API key is provided as a bearer token. The Send button is selected, and the response confirms that the API endpoint is functioning correctly.
-
The API URL and API key are provided in the REST API template, and the code is executed. The system is now ready to build a forecasting model.
-
A request is made to Eric, the Data Product owner, to include columns related to forecasting in the 'Corp Performance' Data Product.
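The `YOUR_QUERY` placeholder in the REST API template has to be filled in before the code is executed. The sketch below shows one way to assemble the request pieces. The query shape (`measures`, `dimensions`, `limit`) and the member names such as `market_data.price_to_earnings_ratio` are assumptions made for illustration and should be checked against the downloaded OpenAPI specification. `build_load_request` is a hypothetical helper, and no request is actually sent.

```python
import json


def build_load_request(api_url, api_key, measures, dimensions, limit=100):
    """Return (url, headers, body) for a POST to the load endpoint.

    The query keys used here are assumed, not confirmed by the template.
    """
    query = {
        "measures": list(measures),
        "dimensions": list(dimensions),
        "limit": limit,
    }
    headers = {
        "Content-Type": "application/json",
        "apikey": api_key,  # same header name as in the template
    }
    body = json.dumps({"query": query})
    return api_url, headers, body


url, headers, body = build_load_request(
    api_url="https://lucky-possum.dataos.app/lens2/api/public:company-intelligence/v2/load",
    api_key="api key here",
    measures=["market_data.price_to_earnings_ratio"],  # hypothetical member name
    dimensions=["market_data.company_id"],  # hypothetical member name
    limit=10,
)
print(body)
```

The returned `body` and `headers` can be passed to the template's `fetch_data_from_api` function in place of its `payload` and `headers` arguments.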
Through a thorough exploration and utilization of the Data Product Hub within DataOS, relevant Data Products such as Corp Market Performance and Corp Performance are successfully discovered and activated. By leveraging these Data Products, key insights into corporate performance, industry trends, and operational efficiency are provided to John, the senior investment analyst, enabling data-driven decision-making for hedge fund strategies.
The integration with Tableau Cloud for dashboard creation and the use of Jupyter Notebook to build forecasting models demonstrate how DataOS supports data analysts in managing complex financial use cases. With collaboration from the Data Product owner, Eric, to include additional forecasting columns, the hedge fund strategies can be optimized for future growth and risk mitigation.