If you want to check out the Query API response example, here's a link: https://docs.runcaptain.com/api-reference/query/collection-v...
The most similar product I've seen is Vertex File Search. It's hosted inside GCP, which can fit nicely into existing cloud deployments. Captain indexes from more sources (R2, for example) and, anecdotally, provides faster indexing.
How do you compare to kapa.ai? I've tried them, and their search quality is really impressive.
Onyx, Sana, and Glean are closer to application-layer enterprise AI products. Their internal knowledge assistants can search across SaaS tools, but the interface is more graphical and seats are purchased as end-user software.
Captain sits in between: it's an API-first retrieval system that fully manages file workloads. It adds search capabilities to existing AI agents, but the agents themselves are managed by developers, outside of Captain.
Kore.ai, however, is more of an agent platform. Its focus is building and orchestrating agent workflows (which can include document retrieval, but that isn't the main focus).
How do you handle more structured data like CSV/XLSX/JSON? It would be cool if links (e.g. YouTube, podcasts, arbitrary websites) could be auto-processed into markdown, à la https://github.com/steipete/summarize (which can pull the full text in addition to summarizing).
Love the auto-process markdown idea, we'll add it to our roadmap :D
The problem Captain really addresses shows up when production pipelines need to run continuously over large file corpora with fast incremental indexing and reliable latency. The maintenance required in these situations is often significant.
Captain focuses specifically on making sure the retrieval layer runs smoothly, so folks don't have to scale and maintain that infrastructure themselves.
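For a sense of what incremental indexing involves, here is a minimal sketch, assuming a hypothetical in-memory index keyed by file path, that reprocesses only files whose content hash has changed since the last sync. `IncrementalIndex` and `sync` are illustrative names, not Captain's API, and the actual chunk/embed work is only marked in comments.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying a file's content."""
    return hashlib.sha256(data).hexdigest()


class IncrementalIndex:
    """Hypothetical index that only reprocesses new, changed, or deleted files."""

    def __init__(self) -> None:
        self.hashes: dict[str, str] = {}  # path -> hash of last indexed content

    def sync(self, corpus: dict[str, bytes]) -> dict[str, list[str]]:
        """Compare the current corpus with the previous sync and report what changed."""
        added, updated, unchanged = [], [], []
        for path, data in corpus.items():
            digest = content_hash(data)
            previous = self.hashes.get(path)
            if previous is None:
                added.append(path)
            elif previous != digest:
                updated.append(path)
            else:
                unchanged.append(path)
            # A real pipeline would (re)chunk and embed only added/updated files here.
            self.hashes[path] = digest
        deleted = [p for p in self.hashes if p not in corpus]
        for path in deleted:
            del self.hashes[path]  # a real pipeline would also drop this file's chunks
        return {"added": sorted(added), "updated": sorted(updated),
                "unchanged": sorted(unchanged), "deleted": sorted(deleted)}


index = IncrementalIndex()
index.sync({"a.pdf": b"v1", "b.pdf": b"v1"})
report = index.sync({"a.pdf": b"v2", "c.pdf": b"new"})
print(report)
```

The second sync touches only `a.pdf` (changed), `c.pdf` (new), and `b.pdf` (removed); at corpus scale, skipping unchanged files is where the time savings come from.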
I don't dispute the core idea, and I think you're on the right track, but the pitch isn't compelling. People looking for these kinds of AI solutions tend to favor simplicity, and ~80% is fine: the overall perceived productivity improvement is 5-10x, with error bars so wide that maximizing the gain just isn't worth it right now.
You might be a few months (or years) early. Alternatively, target people who have maxed out because they can't retrieve from their second brain effectively. Most folks I've talked to are just trying to keep up; optimization and efficiency aren't on their radar.
2. It seems to try to emit citations, but it doesn't emit proper links and instead just writes [filename].
> one of the most common pieces of advice Y Combinator gives to startups [153_do_things_that_dont_scale.pdf].
Yeah, good catch on the demo. In a production deployment, the citations would be hyperlinked to object storage. Captain is just the index, so the real files would live wherever they were indexed from.
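One way to get from bare `[filename]` citations to hyperlinks is a post-processing pass, sketched below under the assumption that every cited filename lives under a single known object-storage prefix. The base URL is hypothetical, and this is not how Captain itself produces citations; in practice the link would more likely come from per-result source metadata.

```python
import re
from urllib.parse import quote

# Matches bracketed citations like [153_do_things_that_dont_scale.pdf] that are
# not already followed by "(" (i.e. not already a markdown link).
CITATION = re.compile(r"\[([\w.-]+\.[A-Za-z0-9]+)\](?!\()")


def link_citations(answer: str, base_url: str) -> str:
    """Turn bare [filename] citations into markdown links under base_url."""
    def to_link(match: re.Match) -> str:
        name = match.group(1)
        return f"[{name}]({base_url.rstrip('/')}/{quote(name)})"
    return CITATION.sub(to_link, answer)


text = ("one of the most common pieces of advice Y Combinator gives to startups "
        "[153_do_things_that_dont_scale.pdf].")
# Hypothetical bucket URL; substitute wherever the files were actually indexed from.
print(link_citations(text, "https://files.example.com/corpus/"))
```

Because the pattern skips citations already followed by `(`, running the pass twice leaves existing links untouched.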
If you know what Captain is, this is not an issue. I closed the browser tab at first, thinking, "What the hell is this? I don't give a damn about shipping forecasts."
Can you expand on that?
:O
For larger enterprises that require governance and additional compliance, we've been relying on trusted partners to help establish a connection to Captain.
Look at Tobi vibe-coding QMD: he's not a full-time engineer, he vibed that up, and now it's the de facto RAG engine for OpenClaw.
I spent the last two days building this exact thing for our internal use.
I managed to get a full RAG pipeline integrated and running over all of our company documents in less than two days of work.
It handles chunking, embedding, and querying, is connected to S3 and Google Drive, and runs on our own hardware (scaling out to AWS if needed).
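The chunk, embed, and query loop described above can be sketched in a few dozen lines, under loud assumptions: term-count vectors with cosine similarity stand in for a real embedding model, everything is held in memory, and the `s3://` and `gdrive://` source labels are hypothetical placeholders for the S3 and Google Drive connectors. `ToyIndex` is an illustrative name, not any particular library.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    assert size > overlap >= 0
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Stand-in 'embedding': lowercase term counts (a real pipeline calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """Hypothetical in-memory index of (source, chunk text, vector) entries."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str, Counter]] = []

    def add(self, source: str, text: str) -> None:
        for piece in chunk(text):
            self.entries.append((source, piece, embed(piece)))

    def query(self, question: str, k: int = 3) -> list[tuple[str, str]]:
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[2]), reverse=True)
        return [(source, piece) for source, piece, _ in ranked[:k]]


index = ToyIndex()
index.add("s3://bucket/handbook.txt", "Vacation policy: employees get 25 days of paid leave per year.")
index.add("gdrive://eng/oncall.md", "On-call rotation: engineers hand off the pager every Monday.")
print(index.query("how many vacation days do employees get", k=1))
```

Keeping the source label on every chunk is what lets answers cite the original file, which is the same concern raised about the demo's citations.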
I like how clean and compressed the info is.
I also appreciate the transparent pricing, but I don't have a good sense of the scale of costs. It would help to give some ballpark examples for each plan; I'm not sure exactly what I could get out of a given plan. My best guess, after trying hard to figure it out, was that with about 1,000 pages of new or updated content per month, I would pay $295/month, with unlimited queries on top. Is that roughly correct?
Advanced and Basic do make a difference, though. Advanced is for documents with complex graphics or charts; Basic is sufficient for most document workloads.