Paprika — Harris Su

About the Project

Paprika is an autonomous agent capable of operating a 3D hamburger kitchen entirely through vision-based planning. While most game characters follow rigid scripts, Paprika “sees” the kitchen, interprets orders, and navigates complex culinary workflows in real time.

Built With & Powered By

Paprika lives within the RestaurantGame3DUnity environment created by KaganAyten. We used this codebase for its robust 3D assets and core hamburger-making logic. He built the cooking environment for a single human player; we turned it into a human-agent playground by adding more UI and scripts for LLM function calling.

The World

github.com KaganAyten — RestaurantGame3DUnity Simple Hamburger making game like Overcooked in Unity3D

The Intelligence (My Contribution):

github.com vanillaSky00 — paprika Your Reliable Hamburger Workers

I transformed this manual “Overcooked-style” game into an AI research environment by implementing:

Vision-Based Planning: Replacing game-state data with visual input processing.
Autonomous Navigation: Enabling the agent to move between stations dynamically.
Reliable Decision Making: A custom logic layer that reads orders and prioritizes tasks based on the current kitchen state.

Motivation

youtube.com Alex Rampell — Software is Eating Labor a16z General Partner on how AI will reshape medicine, law, and accounting.

Back in 2025, a16z General Partner Alex Rampell talked about software eating labor markets — medicine, law, accounting. AI was going to change everything. But the more I thought about it, the more I felt like the conversation was missing something: reliability.

Not “it worked three times in a row” reliable. Actually reliable — the kind where you stop thinking about whether it’ll fail. So this project became a sandbox for exploring that question: how do you design an agent that humans can genuinely co-exist with? What does it even mean to build something trustworthy?

Beyond co-existence, my friend Liang-Chun Li and I wanted to explore a new kind of HCI (Human-Computer Interaction): what it actually takes to get an LLM agent to co-work with a human in a way that mimics a real-world kitchen, where the human acts as the chef, fiddles with the food and other items in real time, and the agent reacts accordingly. We know LLMs are inherently slow, so the question is whether there is an approach for agentic systems that creates the illusion of real time while staying reliable: aligned with the chef and the rules of the kitchen, and actually completing the work.

Architecture

voyager.minedojo.org Voyager: An Open-Ended Embodied Agent with Large Language Models The first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world.

We built a small local agent system that receives data from a Unity game over WebSocket. The agent workflow is inspired by Voyager — four stages: curriculum, skill, act, and critic. It looks like a multi-agent setup on paper, but it’s really a single agent moving through different state nodes.
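A minimal sketch of that single-agent loop, assuming each stage is a function that updates one shared state object and returns the name of the next stage. The stage bodies here are hypothetical stubs (the real nodes would call the LLM and send actions to Unity over the WebSocket), and the critic-to-skill retry edge is our assumption about how a failed attempt closes the loop, not a description of the actual code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class Stage(Enum):
    CURRICULUM = "curriculum"  # propose the next task from the observation
    SKILL = "skill"            # pick or write a skill for that task
    ACT = "act"                # execute the skill in the game
    CRITIC = "critic"          # judge whether the task succeeded


@dataclass
class AgentState:
    observation: dict
    task: Optional[str] = None
    skill: Optional[str] = None
    success: bool = False
    trace: List[str] = field(default_factory=list)


# Each node mutates the shared state and returns the next stage.
Handler = Callable[[AgentState], Stage]


def run_agent(state: AgentState, nodes: Dict[Stage, Handler], max_steps: int) -> AgentState:
    """Drive ONE agent through the stage graph; the 'agents' are just nodes."""
    stage = Stage.CURRICULUM
    for _ in range(max_steps):
        state.trace.append(stage.value)
        stage = nodes[stage](state)
    return state


# Hypothetical stub nodes; real ones would prompt the LLM or talk to Unity.
def curriculum(s: AgentState) -> Stage:
    s.task = "chop lettuce"
    return Stage.SKILL


def skill(s: AgentState) -> Stage:
    s.skill = f"skill_for({s.task})"
    return Stage.ACT


def act(s: AgentState) -> Stage:
    s.success = s.observation.get("knife_free", False)
    return Stage.CRITIC


def critic(s: AgentState) -> Stage:
    # Success: ask the curriculum for a new task. Failure: retry from SKILL.
    return Stage.CURRICULUM if s.success else Stage.SKILL


NODES = {Stage.CURRICULUM: curriculum, Stage.SKILL: skill,
         Stage.ACT: act, Stage.CRITIC: critic}

final = run_agent(AgentState(observation={"knife_free": False}), NODES, max_steps=6)
print(final.trace)
```

Keeping every stage on one state object is what makes this a single agent: there is no message passing between separate models, only a transition table.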

Problems We’re Working On

1. The agent ignores full tables

say-can.github.io Do As I Can, Not As I Say: Grounding Language in Robotic Affordances Robotics at Google

Even when our prompts explicitly say which tables are full, the agent still tries to place ingredients on them.

We’re working on this with context engineering. The SayCan paper gave us the idea: it uses RL-trained value functions to score which skills are actually feasible in the current state, which made us realize we need to prune raw perception before it reaches the model. So instead of dumping the full table state:

pt_1: full
pt_2: full
pt_3: empty
pt_4: empty

We send something like:

You can place on pt_3 or pt_4 since they are empty.
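A minimal sketch of that pruning step, assuming the game reports each table as `"full"` or `"empty"` keyed by its id. The function name `describe_free_tables` is hypothetical; the point is that the filtering happens in code, so the model only ever sees the tables it is allowed to use.

```python
from typing import Dict


def describe_free_tables(table_state: Dict[str, str]) -> str:
    """Turn raw table occupancy into one actionable sentence for the prompt."""
    free = [name for name, status in table_state.items() if status == "empty"]
    if not free:
        return "There is no empty table right now, so do not place anything."
    if len(free) == 1:
        return f"You can place on {free[0]} since it is empty."
    # Join as "a, b or c" to match the phrasing used in the prompt.
    listed = ", ".join(free[:-1]) + f" or {free[-1]}"
    return f"You can place on {listed} since they are empty."


raw = {"pt_1": "full", "pt_2": "full", "pt_3": "empty", "pt_4": "empty"}
print(describe_free_tables(raw))
```

The same filtered list could also be used to reject a placement on a full table before it is sent to Unity, so the prompt is a hint rather than the only safeguard.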

2. Ingredient imbalance

Even when we tell the agent to check ingredient counts, it still grabs the wrong things. Given something like:

bread: 1
lettuce: 0
cheese: 1
meat: 3
tomato: 0
onion: 0

It should go for lettuce. It doesn’t. So we’re thinking about reframing — instead of raw counts, tell it directly:

lettuce is the ingredient you've processed least

Or go further and give it an explicit priority list, so it can plan ahead:

priority list for now:
lettuce
tomato
onion
bread
cheese
meat

The idea is to let it generate a longer sequence of planned actions rather than making greedy one-step decisions.
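A minimal sketch of computing that priority list in code instead of asking the model to infer it, assuming "least processed first" is the ranking and that ties fall back to a fixed order. `DEFAULT_ORDER` and both function names are hypothetical; the tie-break order is simply copied from the example list. With the counts above it reproduces that list.

```python
from typing import Dict, List

# Hypothetical tie-break order, copied from the example priority list.
DEFAULT_ORDER = ["lettuce", "tomato", "onion", "bread", "cheese", "meat"]


def priority_list(counts: Dict[str, int]) -> List[str]:
    """Least-processed ingredient first; ties keep DEFAULT_ORDER."""
    # sorted() is stable, so ingredients with equal counts stay in DEFAULT_ORDER.
    # Ingredients missing from counts are treated as 0 (never processed).
    return sorted(DEFAULT_ORDER, key=lambda name: counts.get(name, 0))


def render_hint(counts: Dict[str, int]) -> str:
    """Build the prompt text: a one-line headline plus the full ordered plan."""
    order = priority_list(counts)
    lines = [f"{order[0]} is the ingredient you've processed least", "",
             "priority list for now:", *order]
    return "\n".join(lines)


counts = {"bread": 1, "lettuce": 0, "cheese": 1, "meat": 3, "tomato": 0, "onion": 0}
print(render_hint(counts))
```

Because the whole ordering is handed over at once, the agent can plan several fetches ahead instead of recomputing a single greedy choice each step.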

3. Moving rule-based logic out of prompts

Some things don’t belong in a prompt. For example: if all ingredients — lettuce, tomato, onion, bread, cheese, meat — have at least one, make a hamburger. That’s just a function:

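REQUIRED = ("lettuce", "tomato", "onion", "bread", "cheese", "meat")

def try_make_hamburger(ingredients, other_emergency):
    # Plain rule check in code, not a prompt instruction; a missing ingredient counts as 0
    can_make_hamburger = all(ingredients.get(name, 0) >= 1 for name in REQUIRED)
    if can_make_hamburger and not other_emergency:
        # TODO: design this so emergencies can interrupt and resume
        make_hamburger()  # the game action, defined elsewhere
        return True
    return False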

The harder design question is how to handle interrupts — if an emergency happens mid-assembly, can it pause, deal with it, and come back? That’s what we’re still figuring out.
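One possible shape for interrupt-and-resume, sketched under the assumption that assembly can be split into discrete steps and that emergencies are only checked between steps. `Task`, `run_with_interrupts`, and the step names are hypothetical; the key idea is that progress lives in a saved cursor, so handling an emergency does not lose the partially built burger.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    """A multi-step job whose progress survives interruptions."""
    name: str
    steps: List[str]
    cursor: int = 0  # index of the next step to run

    @property
    def done(self) -> bool:
        return self.cursor >= len(self.steps)


def run_with_interrupts(task: Task, poll_emergency: Callable[[], Optional[Task]],
                        log: List[str]) -> None:
    """Run task step by step; between steps, handle any emergency, then resume."""
    while not task.done:
        emergency = poll_emergency()
        if emergency is not None:
            log.append(f"pause {task.name} at step {task.cursor}")
            # In this sketch an emergency runs to completion without being interrupted.
            run_with_interrupts(emergency, lambda: None, log)
            log.append(f"resume {task.name}")
        log.append(task.steps[task.cursor])
        task.cursor += 1


burger = Task("hamburger", ["place bread", "add meat", "add cheese", "close bun"])
queue = [None, None, Task("fire", ["grab extinguisher", "spray stove"])]
log: List[str] = []
run_with_interrupts(burger, lambda: queue.pop(0) if queue else None, log)
print(log)
```

Each step is treated as atomic here. A fuller version would probably re-check the burger's preconditions on resume, since the human may have moved things while the agent was away.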

4. Function calling schema drift between Unity and Python

Right now both sides have to stay in sync on the schema. It’s annoying to maintain. We’re thinking about borrowing the “discover API” pattern — one side owns the schema, the other discovers it at runtime. Build the protocol first, let the schema live in one place.
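A minimal sketch of the Python side of that pattern, assuming Unity owns the schema and sends it as a JSON message when the connection opens. The message shape (`type`, `tools`, `params`), the class name `ToolRegistry`, and the tool names `move_to` and `place` are all hypothetical; the point is that Python validates calls against whatever it discovered instead of keeping its own copy.

```python
import json
from typing import Any, Dict

# Hypothetical message the Unity side could send on connect; Python
# hard-codes none of these tool names or parameters.
DISCOVERY_REPLY = json.dumps({
    "type": "tools",
    "tools": [
        {"name": "move_to", "params": {"station": "string"}},
        {"name": "place", "params": {"item": "string", "table": "string"}},
    ],
})


class ToolRegistry:
    """Python-side view of a schema that Unity owns and publishes at runtime."""

    def __init__(self, discovery_message: str) -> None:
        payload = json.loads(discovery_message)
        if payload.get("type") != "tools":
            raise ValueError("expected a tools discovery message")
        self.tools: Dict[str, Dict[str, str]] = {
            tool["name"]: tool["params"] for tool in payload["tools"]
        }

    def build_call(self, name: str, **args: Any) -> str:
        """Check an LLM-proposed call against the discovered schema, then encode it."""
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        expected = set(self.tools[name])
        if set(args) != expected:
            raise ValueError(f"{name} expects {sorted(expected)}, got {sorted(args)}")
        return json.dumps({"type": "call", "name": name, "args": args})


registry = ToolRegistry(DISCOVERY_REPLY)
print(registry.build_call("place", item="lettuce", table="pt_3"))
```

The same discovered data could also be converted into the function-calling definitions handed to the LLM, so adding a tool in Unity would not require a matching Python edit.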

What’s Next

Let customers talk to the agent directly
Have the agent interact with customers (not just the kitchen)
Support different food orders, not just burgers
Add a cashier agent