We've just released the initial version of our open-source vector embedding pipeline. It's designed to embed large volumes of data. While embedding a few documents for Q&A is straightforward, consistently ingesting gigabytes of unstructured data is a whole different ballgame.
By using our API, you can embed raw data and store vectors in your vector database, sidestepping the complexities of cloud infrastructure.
Now, in true YC spirit, we're launching before everything is polished. Our GitHub repo (https://github.com/dgarnitz/vectorflow) is a work in progress, and we're eager to get it in front of the community early on.
Check out our GitHub repo and let us know what you think. Your feedback, suggestions, and critiques will be highly appreciated as we continue to refine and develop the project.
VectorFlow (open-sourced on GitHub) is a new tool for ingesting data into vector databases such as Weaviate! There is quite an interesting end-to-end stack emerging at the ingestion layer:
- Retrieving data from miscellaneous sources such as Slack, Salesforce, GitHub, Google Drive, Notion, ...
- Chunking the text (maybe with the use of visual document layout parsers like what Unstructured is imagining)
- Potentially extracting metadata (say, the "age" of an NBA player, as in the Evaporate-Code+ research)
- Sending this data off to embedding model inference and unpacking that can of worms, from inference acceleration to load balancing
- Finally, importing the vectors themselves into Weaviate!
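To make these stages concrete, here is a minimal Python sketch of the chunk, tag, embed, and import steps listed above. It is not VectorFlow's API or the Weaviate client: `chunk_text`, `toy_embed`, `ingest`, and the `upload` callback are all hypothetical names, and the hash-based embedding is only a stand-in for a real model.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)
    vector: list = field(default_factory=list)


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text, dim=8):
    """Placeholder for a real embedding model (assumption, dim <= 32):
    hashes the text into `dim` floats in [0, 1]."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def batched(items, batch_size):
    """Yield consecutive slices of at most `batch_size` items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def ingest(source, text, upload, batch_size=2):
    """Chunk, tag with metadata, embed, and hand batches to `upload`.

    `upload` is a hypothetical callback standing in for a vector
    database import call; it receives a list of Chunk objects.
    """
    chunks = [
        Chunk(text=piece, metadata={"source": source, "position": i})
        for i, piece in enumerate(chunk_text(text))
    ]
    for chunk in chunks:
        chunk.vector = toy_embed(chunk.text)
    for batch in batched(chunks, batch_size):
        upload(batch)
    return len(chunks)
```

For a quick local run, a plain list can play the role of the database: `store = []` followed by `ingest("notion", some_text, store.extend)`. In a real deployment the embedding call and the upload are the slow, failure-prone parts, which is why batching is kept as its own step here.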
I learned so much from this conversation. I really hope you enjoy listening, and please check out VectorFlow!
I'm surely not the only one who has a whole collection of domain names. As much as I dislike squatters when registering them, I probably happen to be one sometimes.
Comment with a list of domains that you could rent out to others to validate their ideas, with an option to buy them outright. Or swap domains with others (I'm tired of having the same portfolio).
Mine are:
hedgelet.com - (Finance / Hedging / Trading)
fed.pet - (Pet food / supplies / keep your pet happy)
propertytales.com - (Trustpilot / G2 for property movers / renters)