Friday, May 16, 2025
News PouroverAI

Your Next AI Startup Should Be Built on Temporal [Part 1: Document Processing]

March 28, 2024
in Front-Tech


Taking advantage of the burgeoning AI trend, many of today’s applications are built around AI tools like ChatGPT and other Large Language Models (LLMs). These applications often have complex software pipelines for collecting data and processing it with an LLM. Temporal provides an abstraction that can significantly simplify data pipelines, making them more reliable and easier to develop. In this post, you’ll discover why you should use Temporal to build applications around LLMs.

Document Processing Pipelines

Large Language Models excel at answering general-purpose questions using public information from the Internet. However, when serving as the basis of your product, LLMs must provide accurate and up-to-date information about your business. Techniques like Prompt Engineering, Embeddings, and Context Injection let you give an LLM the exact information you want it to reference and explain to your users. While context window size for most LLMs is trending upward, allowing more and more information to be provided as part of a prompt, LLMs are typically priced per token, so the larger your prompts, the more expensive each request becomes. Creating a “pipeline” that collects and prepares your data, so that each prompt contains only the information you want the LLM to reference, will make your application much less expensive and more performant.
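To make the cost argument concrete, here is a minimal sketch of the arithmetic. Every number in it is a hypothetical placeholder (the price, the token counts, and the request volume are assumptions for illustration, not figures from any provider), and `estimateDailyCostUsd` is a made-up helper name.

```typescript
// Hypothetical price: real per-token prices vary by model and provider.
const HYPOTHETICAL_PRICE_PER_MILLION_TOKENS_USD = 1;

// Estimate the daily input-token cost for a given prompt size and request volume.
function estimateDailyCostUsd(
  tokensPerRequest: number,
  requestsPerDay: number,
  pricePerMillionTokensUsd: number
): number {
  const totalTokens = tokensPerRequest * requestsPerDay;
  return (totalTokens / 1_000_000) * pricePerMillionTokensUsd;
}

// Assumed sizes: sending all documentation vs. only the relevant excerpts.
const allDocsTokens = 200_000;
const relevantExcerptTokens = 2_000;
const assumedRequestsPerDay = 10_000;

const costWithAllDocs = estimateDailyCostUsd(
  allDocsTokens,
  assumedRequestsPerDay,
  HYPOTHETICAL_PRICE_PER_MILLION_TOKENS_USD
);
const costWithExcerpts = estimateDailyCostUsd(
  relevantExcerptTokens,
  assumedRequestsPerDay,
  HYPOTHETICAL_PRICE_PER_MILLION_TOKENS_USD
);

console.log({ costWithAllDocs, costWithExcerpts });
```

Under these assumed numbers, the prompt that carries only relevant excerpts costs one hundredth as much per day, which is the saving a preparation pipeline is meant to capture.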

Document Processing Pipeline Example

The example below shows how to generate Embeddings for a set of documents. Embeddings are numerical representations of text that can be used to find which documents are most similar to a user’s query. This allows you to inject only the most relevant information into your prompts. This example uses LangChain and OpenAI’s text-embedding-ada-002 embeddings model, but many other options are available if this doesn’t suit your application’s needs.

This is just one example of what a pipeline could be used for. The techniques in this article are general purpose and are useful for applications built around LLMs, Semantic Search, Retrieval-Augmented Generation, and many other AI technologies. The code in this example collects and processes all of the markdown files from the docs/ folder of the GitHub repository for Hatchify, a Bitovi open-source project.
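To illustrate how embeddings are used to find the most similar documents, here is a small sketch that ranks documents by cosine similarity to a query. The three-dimensional vectors and file names are toy values invented for this illustration (real text-embedding-ada-002 vectors have 1,536 dimensions), and in the pipeline described here the similarity search would typically be done by the vector store rather than by hand.

```typescript
type EmbeddedDocument = { fileName: string; embedding: number[] };

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK documents most similar to the query embedding.
function mostSimilar(
  query: number[],
  docs: EmbeddedDocument[],
  topK: number
): EmbeddedDocument[] {
  return docs
    .slice() // copy so the caller's array is not reordered
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) -
        cosineSimilarity(query, x.embedding)
    )
    .slice(0, topK);
}

// Toy data: hypothetical file names and made-up vectors.
const embeddedDocs: EmbeddedDocument[] = [
  { fileName: 'installation.md', embedding: [0.9, 0.1, 0.0] },
  { fileName: 'schema.md', embedding: [0.1, 0.9, 0.2] },
  { fileName: 'deployment.md', embedding: [0.0, 0.2, 0.9] },
];
const queryEmbedding = [0.8, 0.2, 0.1];

const topMatches = mostSimilar(queryEmbedding, embeddedDocs, 2);
console.log(topMatches.map((doc) => doc.fileName));
```

Only the content of the top-ranked documents would then be injected into the prompt.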

Starting a Temporal workflow is done through a Temporal client:

Note: This is written in TypeScript, but Temporal also supports Python, Go, Java, .NET, and PHP. All of these code samples have been simplified for readability. The full code can be found on GitHub.

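    import { nanoid } from 'nanoid';

    // `client` is a connected Temporal Client (setup omitted for brevity).
    // The ID doubles as the S3 bucket name, so keep it lowercase
    // and free of underscores.
    const id = `index-workflow-${nanoid()}`.toLowerCase().replaceAll('_', '');

    const handle = await client.workflow.start(documentsProcessingWorkflow, {
      taskQueue: 'documents-processing-queue',
      args: [{
        id,
        repository: {
          url: 'https://github.com/bitovi/hatchify.git',
          branch: 'main',
          path: 'docs',
          fileExtensions: ['md']
        }
      }],
      workflowId: id
    });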

The code above starts the Temporal Workflow for the pipeline, executing the code in the example below. The Workflow code creates an S3 bucket for temporary storage, collects all of the documents, processes the documents and stores the embeddings data as vectors in Postgres, and then deletes the temporary S3 bucket:

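    export async function documentsProcessingWorkflow(
      input: DocumentsProcessingWorkflowInput
    ): Promise<DocumentsProcessingWorkflowOutput> {
      // The Workflow ID doubles as the name of the temporary S3 bucket.
      const { id } = input;

      await createS3Bucket({ bucket: id });
      // Collect the repository's documents into a zip file.
      const { zipFileName } = await collectDocuments({ ... });
      // Generate embeddings and store them as vectors in Postgres.
      const { collection } = await processDocuments({ ... });
      // Clean up the temporary storage.
      await deleteS3Object({ bucket: id, key: zipFileName });
      await deleteS3Bucket({ bucket: id });

      return { collection };
    }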

Each of these functions is a Temporal Activity. Temporal adds some functionality to Activities, which we’ll explain in the next section, but they are written as completely normal TypeScript functions. Here is the collectDocuments Activity:

Note: This is an embedded Gist to ensure the code displays in its entirety.
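One small piece of work a collection step like this involves is deciding which repository files to keep. The sketch below is an illustration only and is not the code from the Gist: `selectFiles` and `RepositoryFilter` are hypothetical names, the file paths are invented, and the filter simply mirrors the `path` and `fileExtensions` arguments passed when the Workflow was started.

```typescript
// Hypothetical shape mirroring the `path` and `fileExtensions` Workflow arguments.
type RepositoryFilter = { path: string; fileExtensions: string[] };

// Keep only files under the given folder whose extension is in the allowed list.
function selectFiles(filePaths: string[], filter: RepositoryFilter): string[] {
  // Normalize the folder to end with a slash so 'docs' does not match 'docs-old/'.
  const lastChar = filter.path.charAt(filter.path.length - 1);
  const prefix = lastChar === '/' ? filter.path : `${filter.path}/`;

  return filePaths.filter((filePath) => {
    const isInFolder = filePath.slice(0, prefix.length) === prefix;
    const dotIndex = filePath.lastIndexOf('.');
    const extension =
      dotIndex === -1 ? '' : filePath.slice(dotIndex + 1).toLowerCase();
    return isInFolder && filter.fileExtensions.indexOf(extension) !== -1;
  });
}

// Invented repository listing for demonstration.
const repositoryFiles = [
  'README.md',
  'docs/guides/getting-started.md',
  'docs/images/logo.png',
  'docs-old/legacy.md',
  'packages/core/README.md',
];

const selectedFiles = selectFiles(repositoryFiles, {
  path: 'docs',
  fileExtensions: ['md'],
});
console.log(selectedFiles);
```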

Similarly, the processDocuments Activity is another normal TypeScript function:

    // Simplified: s3Bucket, zipFileName, temporaryDirectory, filteredFileList,
    // workflowId, config, and tableName are defined in the full code.
    export async function processDocuments(
      input: ProcessDocumentsInput
    ): Promise<ProcessDocumentsOutput> {
      // Download the zip archive from S3 and extract it locally.
      const response = await getS3Object({ bucket: s3Bucket, key: zipFileName });
      fs.writeFileSync(zipFileName, await response.Body.transformToByteArray());
      await extractZip(zipFileName, { dir: path.resolve(temporaryDirectory) });

      const embeddingsModel = new OpenAIEmbeddings({
        openAIApiKey: OPENAI_API_KEY,
        batchSize: 512,
        modelName: 'text-embedding-ada-002'
      });
      const pgvectorStore = await PGVectorStore.initialize(embeddingsModel, config);

      // Use for...of rather than forEach so that each insert is awaited
      // before the database connection is closed.
      for (const { fileName, fileContent } of filteredFileList) {
        await pgvectorStore.addDocuments([{
          pageContent: fileContent,
          metadata: { fileName, workflowId }
        }]);
      }

      await pgvectorStore.end();
      return { tableName };
    }

And there you have it: all of the code required for this pipeline to collect and process the documents from this GitHub repository and store them as embeddings within a Postgres database.

Benefits of Using Temporal for Document Processing

Temporal is an abstraction that delivers Durable Execution: a Workflow’s progress is persisted as it runs, so after a crash or outage it resumes where it left off rather than failing unintentionally. A Workflow can also run for as long as needed, without your code having to store the state of execution outside of the function.

  1. Your Pipeline Can’t Fail
  2. Your Pipeline Can Run Forever
  3. Your Pipeline is Scalable
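Part of what makes a pipeline resilient is that Temporal retries failed Activities according to a retry policy instead of letting one transient error end the run. The sketch below is plain TypeScript, not the Temporal SDK: it only imitates the idea of retrying with exponential backoff. The field names are loosely modeled on Temporal's retry policy options, while `runWithRetries`, the policy values, and the simulated failure are assumptions made for illustration. Delays are summed rather than actually waited on so the example runs instantly.

```typescript
type RetryPolicySketch = {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumIntervalMs: number;
  maximumAttempts: number;
};

// Delay to wait after failed attempt number `attempt` (1-based), capped at the maximum.
function retryDelayMs(policy: RetryPolicySketch, attempt: number): number {
  const delay =
    policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, attempt - 1);
  return Math.min(delay, policy.maximumIntervalMs);
}

// Run `operation` until it succeeds or the attempts are exhausted.
function runWithRetries<T>(
  operation: (attempt: number) => T,
  policy: RetryPolicySketch
): { result: T; attempts: number; totalDelayMs: number } {
  let totalDelayMs = 0;
  for (let attempt = 1; ; attempt++) {
    try {
      return { result: operation(attempt), attempts: attempt, totalDelayMs };
    } catch (error) {
      if (attempt >= policy.maximumAttempts) throw error;
      // A real system would sleep here; this sketch only records the delay.
      totalDelayMs += retryDelayMs(policy, attempt);
    }
  }
}

const examplePolicy: RetryPolicySketch = {
  initialIntervalMs: 1000,
  backoffCoefficient: 2,
  maximumIntervalMs: 10000,
  maximumAttempts: 5,
};

// Simulated flaky step: fails twice (for example, a rate limit), then succeeds.
const flakyStep = (attempt: number): string => {
  if (attempt < 3) {
    throw new Error(`Simulated transient failure on attempt ${attempt}`);
  }
  return 'embeddings stored';
};

const outcome = runWithRetries(flakyStep, examplePolicy);
console.log(outcome);
```

With Temporal, this kind of retry loop does not have to be written into each Activity; it is configured once and applied by the platform.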

With demand for AI applications surging, it’s crucial to streamline data pipelines, especially those built around Large Language Models (LLMs). Temporal simplifies these pipelines and makes them more reliable. Its Durable Execution ensures that workflows persist even when something goes wrong, and because a Workflow can run for as long as needed, you can process large datasets without sacrificing scalability. Using Temporal for your document processing pipelines helps keep AI-powered apps accurate, efficient, and affordable.

Need help executing your Temporal vision? We can help! Our friendly team of Temporal Consulting experts would be happy to walk you through any step of your orchestration. Schedule a free consultation to get started.



Copyright © 2023 PouroverAI News.