Uncover hidden connections in unstructured financial data with Amazon Bedrock and Amazon Neptune

In asset management, portfolio managers need to closely monitor companies in their investment universe to identify risks and opportunities and to guide investment decisions. Tracking direct events such as earnings reports or credit downgrades is straightforward: you can set up alerts to notify managers of news containing company names. However, detecting second- and third-order impacts arising from events at suppliers, customers, partners, or other entities in a company’s ecosystem is challenging.

For example, a supply chain disruption at a key vendor would likely hurt downstream manufacturers. Similarly, when a major client loses a top customer, the client’s supplier faces a demand risk. Such events often fail to make headlines featuring the impacted company directly, but they are still important to track.

In this post, we demonstrate an automated solution that combines knowledge graphs and generative artificial intelligence (AI) to surface such risks by cross-referencing relationship maps with real-time news. Broadly, this entails two steps. First, you build the intricate relationships between companies (customers, suppliers, directors) into a knowledge graph. Second, you use this graph database along with generative AI to detect second- and third-order impacts from news events. For instance, this solution can highlight that delays at a parts supplier may disrupt production for downstream auto manufacturers in a portfolio, even though none of them are directly referenced.

With AWS, you can deploy this solution in a serverless, scalable, and fully event-driven architecture. This post demonstrates a proof of concept built on two key AWS services well suited for graph knowledge representation and natural language processing: Amazon Neptune and Amazon Bedrock. Neptune is a fast, reliable, fully managed graph database service that makes it straightforward to build and run applications that work with highly connected datasets.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Overall, this prototype demonstrates the art of the possible with knowledge graphs and generative AI: deriving signals by connecting disparate dots. The takeaway for investment professionals is the ability to stay on top of developments closer to the signal while avoiding noise.

Build the knowledge graph

The first step in this solution is building a knowledge graph. A valuable yet often overlooked data source for knowledge graphs is company annual reports. Because official corporate publications undergo scrutiny before release, the information they contain is likely to be accurate and reliable. However, annual reports are written in an unstructured format meant for human reading rather than machine consumption. To unlock their potential, you need a way to systematically extract and structure the facts and relationships they contain.

With generative AI services like Amazon Bedrock, you can automate this process. You can take an annual report and trigger a processing pipeline that ingests the report, breaks it down into smaller chunks, and applies natural language understanding to pull out salient entities and relationships.
For example, a sentence stating that “[Company A] expanded its European electric delivery fleet with an order for 1,800 electric vans from [Company B]” would allow Amazon Bedrock to identify the following:

- [Company A] as a customer
- [Company B] as a supplier
- A supplier relationship between [Company A] and [Company B]
- Relationship details of “supplier of electric delivery vans”

Extracting such structured data from unstructured documents requires providing carefully crafted prompts to large language models (LLMs) so they can analyze text to pull out entities like companies and people, as well as relationships such as customer and supplier. The prompts contain clear instructions on what to look for and the structure in which to return the data. By repeating this process across the entire annual report, you can extract the relevant entities and relationships to construct a rich knowledge graph.

However, before committing the extracted information to the knowledge graph, you need to disambiguate the entities. For instance, the knowledge graph may already contain an entity with the same name that represents a different organization. Amazon Bedrock can reason over and compare attributes such as business focus area, industry, revenue-generating industries, and relationships to other entities to determine whether the two entities are actually distinct. This prevents unrelated companies from being merged into a single entity.

After disambiguation is complete, you can reliably add new entities and relationships to your Neptune knowledge graph, enriching it with the facts extracted from annual reports. Over time, ingesting reliable data and integrating additional reliable data sources will help build a comprehensive knowledge graph that can reveal insights through graph queries and analytics.
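The extraction step described above can be sketched in a few lines of Python. This is a minimal illustration, not the prompt or parser used in the solution: the function names `build_extraction_prompt` and `parse_relationships`, the prompt wording, and the JSON field names are all assumptions. The relationship types are the ones listed in the workflow later in this post. The call to Amazon Bedrock itself is left out, so the sketch only covers building the prompt and validating the model's reply.

```python
import json

# Relationship types named in this post's workflow description.
ALLOWED_RELATIONSHIPS = {"customer", "supplier", "partner", "competitor", "director"}


def build_extraction_prompt(main_entity: str, chunk: str) -> str:
    """Build a prompt asking the model to return relationships as a JSON array.

    The wording and the JSON keys are illustrative assumptions.
    """
    allowed = ", ".join(sorted(ALLOWED_RELATIONSHIPS))
    return (
        f"You are analyzing an annual report published by {main_entity}.\n"
        "List every company or person in the text below that has a "
        f"relationship with {main_entity}.\n"
        "Return only a JSON array. Each element must have the keys "
        '"name", "type" ("company" or "person"), '
        f'"relationship" (one of: {allowed}), and "details".\n\n'
        f"<text>\n{chunk}\n</text>"
    )


def parse_relationships(model_output: str) -> list[dict]:
    """Parse the model's reply and drop malformed or unexpected records."""
    try:
        records = json.loads(model_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(records, list):
        return []
    valid = []
    for record in records:
        if not isinstance(record, dict) or not record.get("name"):
            continue
        relationship = record.get("relationship")
        if not isinstance(relationship, str) or relationship not in ALLOWED_RELATIONSHIPS:
            continue
        valid.append(record)
    return valid
```

Validating the reply before it reaches the graph keeps records with unknown relationship types or missing names out of the disambiguation step.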
This automation, enabled by generative AI, makes it feasible to process thousands of annual reports. It unlocks an asset for knowledge graph curation that would otherwise go untapped because of the prohibitively high manual effort involved.

The following screenshot shows an example of the visual exploration that’s possible in a Neptune graph database using the Graph Explorer tool.

Process news articles

The next step of the solution is automatically enriching portfolio managers’ news feeds and highlighting articles relevant to their interests and investments. For the news feed, portfolio managers can subscribe to any third-party news provider through AWS Data Exchange or another news API of their choice.

When a news article enters the system, an ingestion pipeline is invoked to process the content. Using techniques similar to those used for annual reports, Amazon Bedrock extracts entities, attributes, and relationships from the news article. These are then disambiguated against the knowledge graph to identify the corresponding entities already in it.

The knowledge graph contains connections between companies and people. By linking article entities to existing nodes, you can identify whether any subjects are within two hops of the companies that the portfolio manager has invested in or is interested in. Finding such a connection indicates that the article may be relevant to the portfolio manager. Because the underlying data is represented in a knowledge graph, the connection can be visualized to help the portfolio manager understand why and how this context is relevant. In addition to identifying connections to the portfolio, you can also use Amazon Bedrock to perform sentiment analysis on the entities referenced.

The final output is an enriched news feed that surfaces articles likely to impact the portfolio manager’s areas of interest and investments.

Solution overview

The overall architecture of the solution looks like the following diagram.
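As an aside on the two-hop relevance check described under Process news articles, the idea can be sketched with a breadth-first search over an in-memory adjacency map. This is an illustration only: in the solution the relationships live in Neptune and would be traversed with a graph query, and the function names, the graph layout, and the company names below are hypothetical.

```python
from collections import deque


def entities_within_hops(graph, start, max_hops=2):
    """Breadth-first search returning {entity: hop_count} for every entity
    within max_hops of start (start itself is included at 0 hops)."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def relevant_holdings(graph, article_entities, portfolio, max_hops=2):
    """Map each portfolio company to the article entities within max_hops of it."""
    hits = {}
    for company in portfolio:
        reachable = entities_within_hops(graph, company, max_hops)
        matched = {e: reachable[e] for e in article_entities if e in reachable}
        if matched:
            hits[company] = matched
    return hits


# Hypothetical undirected relationship graph (each edge listed in both directions).
example_graph = {
    "AutoMaker": ["PartsSupplier"],
    "PartsSupplier": ["AutoMaker", "ChipVendor"],
    "ChipVendor": ["PartsSupplier", "RawMaterialCo"],
    "RawMaterialCo": ["ChipVendor"],
}

# An article about ChipVendor is two hops from the AutoMaker holding.
print(relevant_holdings(example_graph, ["ChipVendor"], ["AutoMaker"]))
# {'AutoMaker': {'ChipVendor': 2}}
```

Returning the hop count alongside each match gives the news feed a simple way to rank or explain why an article was surfaced.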
The workflow consists of the following steps:

1. A user uploads official reports (in PDF format) to an Amazon Simple Storage Service (Amazon S3) bucket. Use officially published reports (as opposed to news and tabloids) to minimize the inclusion of inaccurate data in your knowledge graph.
2. The S3 event notification invokes an AWS Lambda function, which sends the S3 bucket and file name to an Amazon Simple Queue Service (Amazon SQS) queue. The First-In-First-Out (FIFO) queue makes sure that the report ingestion process is performed sequentially to reduce the likelihood of introducing duplicate data into your knowledge graph.
3. An Amazon EventBridge time-based event runs every minute to start the run of an AWS Step Functions state machine asynchronously.
4. The Step Functions state machine runs through a series of tasks to process the uploaded document by extracting key information and inserting it into your knowledge graph:
   a. Receive the queue message from Amazon SQS.
   b. Download the PDF report file from Amazon S3, split it into multiple smaller text chunks (approximately 1,000 words each) for processing, and store the text chunks in Amazon DynamoDB.
   c. Use Anthropic’s Claude 3 Sonnet on Amazon Bedrock to process the first few text chunks to determine the main entity that the report is referring to, together with relevant attributes (such as industry).
   d. Retrieve the text chunks from DynamoDB and, for each text chunk, invoke a Lambda function that uses Amazon Bedrock to extract entities (such as companies or people) and their relationships (customer, supplier, partner, competitor, or director) to the main entity.
   e. Consolidate all extracted information.
   f. Filter out noise and irrelevant entities (for example, generic terms such as “consumers”) using Amazon Bedrock.
   g. Use Amazon Bedrock to perform disambiguation by reasoning over the extracted information against the list of similar entities from the knowledge graph.
…
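The chunking task in the state machine (splitting the report text into pieces of approximately 1,000 words) can be sketched as follows. The post does not say how the split is implemented, so this is an assumption: the hypothetical helper `split_into_chunks` splits on whitespace and ignores sentence or page boundaries. Extracting text from the PDF and writing the chunks to DynamoDB are out of scope here.

```python
def split_into_chunks(text: str, words_per_chunk: int = 1000) -> list[str]:
    """Split text into consecutive chunks of at most words_per_chunk words.

    Words are separated on whitespace; the last chunk may be shorter.
    """
    if words_per_chunk <= 0:
        raise ValueError("words_per_chunk must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]
```

A fixed word count keeps each chunk at a predictable size for the model. A production pipeline might prefer to break on paragraph boundaries or overlap adjacent chunks so that a relationship stated across a chunk boundary is not lost.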
