Joshua Levine

Bio

My name is Joshua Levine. I'm a prompt engineer. AI systems are only as capable as the conversations they facilitate. The gap between what a system can do and what people actually use it for is an experience problem, not a capability problem. The second a user feels like the AI doesn't understand what they meant, not just what they said, the system stops being a tool and starts being an obstacle.

Before I was Head of Applied AI at Huzi, I built custom GPTs at the restaurant where I worked so wine servers could surface pairing recommendations mid-service. When I moved into real estate, I built research tools, objection-handling systems, and neighborhood analysis prompts. If the technology is there and nobody's using it well, I'll build the version that people actually want to interact with.

At Huzi, I reported directly to the CEO. I designed the prompt architecture for a configurable AI assistant platform: a modular system where platform behavior, assistant personality, coaching methodology, and task instructions each lived in their own layer with a single source of truth, so you could update one without breaking the others. I built and deployed AI coaching systems for enterprise clients in real estate, title, and financial services, turning their existing coaching methodologies into interactive conversational experiences.
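A minimal sketch of how that kind of layering can work. The layer names and contents here are illustrative, not Huzi's actual prompts; the point is that each layer is its own source of truth and the final prompt is assembled in a fixed order.

```python
# Hypothetical sketch of a layered prompt architecture: each layer lives in
# its own document, and the system prompt is assembled by concatenation, so
# updating one layer never touches the others.

LAYERS = ["platform", "personality", "methodology", "task"]

def assemble_system_prompt(layers: dict[str, str]) -> str:
    """Join the layers in a fixed order, failing loudly if one is missing."""
    missing = [name for name in LAYERS if name not in layers]
    if missing:
        raise ValueError(f"missing layers: {missing}")
    return "\n\n".join(layers[name].strip() for name in LAYERS)

prompt = assemble_system_prompt({
    "platform": "You are an assistant on the Huzi platform...",
    "personality": "Warm, direct, coach-like voice...",
    "methodology": "Follow the client's coaching framework...",
    "task": "Today's session: objection handling drills...",
})
```

With this shape, swapping a client's coaching methodology is a one-layer change; the platform behavior and assistant personality stay untouched.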

Since Huzi, I've been shipping. I built a multi-agent publishing system that produces localized ebooks across ten languages. I designed and built Aloha, a personal AI assistant with persistent memory, a self-authoring identity architecture, and autonomous daily operations. The landscape in this field shifts constantly, and I care more about building a durable understanding of how these systems work than about nailing any particular setup. The only thing that ages well is the ability to evaluate what's actually happening and respond to it.

2024

Project 1

A short description of the first project goes here.


2026

Multi-Agent Publishing System

The Question: Can a system of agents approximate real user research well enough to produce ten genuinely localized ebooks, not ten translations of one?

I published a 7-day digital detox ebook on Amazon KDP in ten languages. Not the same ebook translated ten times: ten different ebooks, each shaped by simulated user research conducted in the native language of that market.

The Approach: A publishing house built from first principles

I started by asking who would actually have eyes on a wellness publication at a real publishing house, then built each of those roles as an agent: a mental health professional, a compliance reviewer, an editor with final say, and a research lead who decides who to interview next based on what the feedback revealed. The system found its shape through the same questions a real publisher would ask, just answered by agents instead of employees.

The system runs in loops. Each loop starts with consumer personas being interviewed about the manuscript, then the manuscript goes through the review pipeline, then the research lead evaluates the editor's changes and selects new interview subjects for the next round. Multiple loops ran in each of the ten languages before publication.

The interviews run as two separate agents. One is the interviewer, one is the consumer, each taking turns through separate API calls. The consumer has no access to the interviewer's reasoning. I tested this against the single-agent alternative, where one model writes both sides, and the difference was meaningful: same headline conclusions, but the two-agent version surfaced things the scripted version could not, because there was genuine discovery happening instead of a monologue formatted as a conversation.
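The mechanic can be sketched as two message histories that never merge. Here `call_model` is a placeholder for the Anthropic API call and returns canned text so the sketch runs without credentials; the structural point is that the consumer's history contains only questions and its own answers.

```python
# Two-agent interview: interviewer and consumer alternate through separate
# model calls, each with its own private message history.

INTERVIEWER_SYSTEM = "You are a user researcher interviewing a consumer about a manuscript."

def consumer_system(persona):
    # The consumer prompt contains only who the person is -- never the
    # interviewer's goals or reasoning.
    return f"You are {persona}. Answer the interviewer as yourself."

def call_model(system, messages):
    # Placeholder for a real Anthropic API call; returns a canned reply.
    return f"reply after {len(messages)} prior turns"

def interview(persona, turns=3):
    interviewer_msgs = []  # the interviewer's private history
    consumer_msgs = []     # the consumer sees only questions and its own answers
    transcript = []
    for _ in range(turns):
        question = call_model(INTERVIEWER_SYSTEM, interviewer_msgs)
        interviewer_msgs.append({"role": "assistant", "content": question})
        consumer_msgs.append({"role": "user", "content": question})

        answer = call_model(consumer_system(persona), consumer_msgs)
        consumer_msgs.append({"role": "assistant", "content": answer})
        interviewer_msgs.append({"role": "user", "content": answer})
        transcript.append((question, answer))
    return transcript

transcript = interview("a night-shift nurse in Oslo")
```

Because each side only ever sees the other's output, the interviewer can genuinely discover something in turn four that it didn't plan in turn one.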

The Insight: Models preload opinions into personas, and you have to design against it

The persona design was where the most interesting problems lived. The default behavior when you ask a model to create a consumer persona is that it preloads opinions: "you are a German grandmother who is skeptical of self-help and prefers practical advice." This seeds the answers before the interview even runs. A persona told to be skeptical will perform skepticism regardless of whether the product is actually convincing. So the personas in this system contain only who the person is and what their life is like: a Norwegian grandmother, a Tokyo office worker with ADHD, a night-shift nurse. No opinions, no red flags, no instructions about how to react. Their responses to the book emerge from their situation, not from prompting.
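The contrast is easiest to see as two toy persona definitions (both illustrative, not the system's actual persona files):

```python
# The default failure mode: opinions preloaded into the persona, which
# scripts the interview before it starts.
preloaded = {
    "identity": "German grandmother",
    "attitude": "skeptical of self-help",      # seeds the answers
    "preference": "prefers practical advice",  # scripts the reaction
}

# What this system uses: situation only. Reactions to the book have to
# emerge from the person's life during the interview.
situation_only = {
    "identity": "night-shift nurse",
    "life": "works 23:00-07:00, sleeps during the day; her phone is her "
            "alarm, her banking app, and her only line to her kids' school",
}
```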

One advantage this has over a traditional publishing house is that time isn't a constraint. A real consumer interview can only assess first impressions, not whether the program actually worked. In this system, the interview includes a break where two weeks pass, and the interviewer checks back in as if the consumer has lived through the program. The framing matters: "welcome back, did you actually buy it?" produces experience. "Imagine you completed the program" produces speculation.
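A sketch of the time-skip turn, injected into the consumer's message history mid-interview. The exact wording is assumed for illustration; what matters is the past-tense, lived-experience framing.

```python
# The time skip is just a user turn appended to the consumer's history,
# phrased as elapsed time rather than as a hypothetical.
TIME_SKIP = (
    "Two weeks have passed since this conversation. Welcome back. "
    "Did you actually buy the book, and did you finish the program?"
)

def inject_time_skip(consumer_msgs):
    # Past tense ("did you") pushes the consumer to report experience;
    # "imagine you completed it" would only produce speculation.
    return consumer_msgs + [{"role": "user", "content": TIME_SKIP}]
```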

The Result: Ten ebooks, one autonomous system

The system didn't produce ten translations. It produced ten different books. The Norwegian edition dropped Day 3 entirely and moved its content to an appendix, because no Norwegian consumer persona used it. The same product has a different number of days in different languages.

A Swedish farmer in Jämtland told the interviewer "det låter som något för stadsfolk" (this sounds like something for city people), which reframed an entire assumption baked into the English version: that your phone is a vice. In rural Sweden, your phone is a lifeline. That produced a whole new section.

The Japanese version added content for people in one-room Tokyo apartments, where "put your phone in another room" is physically impossible. The solution (putting the charger by the genkan, the entryway shoe area, so the phone is across the room without needing a separate bedroom) came directly from a consumer interview and has no equivalent in the English book.

The Italian persona writer rejected the editor's suggestion to interview a young woman with a diagnosed anxiety disorder on publishing-ethics grounds and replaced her with a factory worker from Bari. That replacement surfaced a Facebook-versus-Instagram demographic gap specific to the Italian market that the system hadn't seen: two insights from one judgment call made by an agent, not a human.

I built this in January 2026, before today's agent orchestration frameworks were on my bench: no swarm mode, no TeammateTool, no off-the-shelf multi-agent coordination. The whole system, roughly fifteen agents across ten languages, was hand-rolled through the Anthropic API. Once it was set up, the entire pipeline ran autonomously: you press go, and it runs multiple passes of interviews, reviews, edits, and research-lead evaluation without intervention.

2025

Project 3

A short description of the third project goes here.
