Live artifact / research publication

Understand decoding before you tune blind.

SampleLens helps you see what actually changes when you adjust temperature, top_p, min_p, DRY, prompt structure, or runtime defaults. It combines an interactive decoding explorer with controlled prompt sweeps so you can build better intuition, compare outcomes, and choose more reliable presets for real AI workflows.

Learn the knobs: Explore temperature, top_p, min_p, and DRY in one live artifact.
See real studies: Prompt-sweep protocols are live, with results pages coming next.
Use it for: Ollama, llama.cpp, local-model stacks, and any workflow that needs better defaults.

Why SampleLens

What readers actually come here for.

People rarely want another abstract explainer about sampling. They want to know which settings to try, what tradeoffs they create, and how those tradeoffs show up in real outputs. SampleLens is built to answer exactly that.

Pick better presets

The goal is to help you choose settings with more confidence, whether you are tuning for ideation, extraction, critique, ranking, or structured outputs.

See the settings in context

The useful question is not just what a knob does in theory. It is how that knob behaves on a real task with a real prompt, model, runtime, and rubric.

Trust the evidence

The article makes the point. The artifact or sweep page shows the outputs. If you cannot inspect the evidence, the research is not finished.

Artifact 001

Temperature, Top-P, Min-P, and DRY side by side.

Artifact 001 is a browser-side explorer for the decoding pipeline used by runtimes such as Ollama, llama.cpp, and LM Studio. It is the fastest way to see how each filter changes the candidate set before you commit to a preset or change a production workflow.

Move the controls to see how Temperature, Top-P, Min-P, and DRY reshape the token distribution. Use the raw, after-filter, and diff views to understand which tokens survive, which ones disappear, and why that changes the final output.

Browser-side only

Parameters

Temperature 1.00
Scales logit sharpness. Lower values make the distribution peak more sharply. Higher values make weaker tokens competitive.
Top-P 1.00
Keeps the smallest token set whose cumulative probability reaches the threshold. This trims the long tail.
Min-P 0.00
Removes tokens below min_p × p(best token). Unlike Top-P, the floor scales with model confidence.
DRY Multiplier 0.00
Penalizes tokens that would continue repeated sequences, reducing loops without punishing every repeated word.
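The Min-P rule above is easiest to see with numbers. A minimal sketch with an invented four-token distribution (the tokens and probabilities are illustrative, not from the artifact):

```python
# With min_p = 0.1, the cutoff scales with the best token's probability.
probs = {"the": 0.62, "a": 0.21, "this": 0.09, "zebra": 0.002}

min_p = 0.1
floor = min_p * max(probs.values())  # 0.1 * 0.62 = 0.062
survivors = {tok: p for tok, p in probs.items() if p >= floor}

print(sorted(survivors))  # ['a', 'the', 'this'] -- only the 0.002 tail token is dropped
```

If the model were less confident (say a best token at 0.25), the floor would drop to 0.025 and more of the tail would survive, which is exactly the confidence-relative behavior described above.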

Token distribution

15 tokens survive the current filter stack

Filter pipeline

Artifact 001 uses a simulated token distribution so the mechanics stay visible and instant. The next layer is real prompt sweeps with outputs, rubrics, and practical takeaways you can carry back into a live stack.

See benchmark families

Temperature

Applied before filtering. It changes the shape of the entire distribution. Lower values sharpen. Higher values flatten.

Top-P

Mass-based tail trimming. It keeps enough tokens to cover the requested cumulative probability and drops the rest.

Min-P

A confidence-relative floor. The threshold rises when the model is confident and relaxes when the distribution is uncertain.

DRY

Targeted repetition control. It penalizes repeated continuations instead of globally muting every token seen in context.
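The four stages above can be sketched as a small pipeline. This is a simplified illustration, not the artifact's actual code: the logits are invented, DRY is omitted because it depends on context history, and real runtimes differ in the order they apply these filters.

```python
import math

def apply_temperature(logits, temperature):
    # Scale logits, then softmax. Lower T sharpens, higher T flattens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def top_p_filter(probs, top_p):
    # Keep the smallest set of tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return keep

def min_p_filter(probs, min_p):
    # Keep tokens at or above min_p * p(best token).
    floor = min_p * max(probs)
    return {i for i, p in enumerate(probs) if p >= floor}

# Invented logits for five candidate tokens.
logits = [4.0, 3.0, 2.0, 0.5, -1.0]
probs = apply_temperature(logits, 1.0)
survivors = top_p_filter(probs, 0.9) & min_p_filter(probs, 0.05)
```

With these made-up numbers the two filters agree, but raising temperature flattens the distribution and lets more indices through both gates, which is the interaction the artifact's diff view makes visible.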

Benchmarks

Research that turns settings into decisions.

The site starts with decoding mechanics, then moves into studies where settings have obvious downstream consequences. The first tracks focus on recurring tasks where better defaults save time and reduce drift.

Protocol 001

Published now

Business ideas under invariants

The first protocol is live. It defines a constrained ideation task, the invariants, the scoring rubric, and the sweep plan before any results are published.

Archive

Live now

Research archive

The archive collects live artifacts, published protocols, upcoming studies, and the practical rules that keep the research honest and useful.

Track 03

Structured extraction

Compare settings for schema fidelity, verbosity control, and refusal behavior when extracting data from messy source text into a fixed shape.

Track 04

Critique and ranking

Study when lower-variance settings help models disqualify weak options cleanly and when some controlled stochasticity improves coverage of hidden weaknesses.

Method

Every study should answer the same practical questions.

Good research should leave you with better defaults, not just a nicer theory. Every SampleLens study is meant to be narrow enough to trust, concrete enough to repeat, and useful enough to change how you set up the next run.

01

Question

One concrete question, not a broad essay. Example: which decoding range preserves constraint adherence without collapsing idea quality?

02

Setup

Prompt family, invariants, model, runtime, seed policy, and date. If the setup drifts, the claim drifts.

03

Sweep

A clear parameter grid with enough variation to show behavior change instead of cherry-picking one "good" answer.

04

Artifact

Outputs, controls, and lightweight scoring in a format the reader can inspect, compare, and cite.

05

Takeaway

A plain conclusion, an explicit default, and a note about what not to overclaim from the evidence.

FAQ

Questions a new reader should get answered quickly.

The point of the landing page is simple: explain who this is for, what the artifact does, what kind of research is coming, and why the work should be trusted.

Who is SampleLens for?

SampleLens is for people who tune language models in practice: AI builders, local-model users, and curious researchers who want clearer intuition about decoding settings and better defaults.

Does Artifact 001 use a live language model?

No. It uses a simulated token distribution so the filter math stays visible and instant in the browser.

The mechanics match real inference pipelines closely enough to explain how temperature, top_p, min_p, and dry_multiplier interact in runtimes such as llama.cpp and Ollama.

What kind of research will SampleLens publish?

The focus is on recurring AI tasks where settings materially change the result: constrained ideation, structured extraction, critique, ranking, and copy generation.

Each study will publish a fixed setup, a small parameter sweep, the outputs, lightweight scoring, and a plain takeaway about what worked and what drifted.

What comes after this first artifact?

The next step is controlled prompt-sweep studies with fixed tasks, explicit constraints, parameter grids, outputs, and practical scoring. Over time that should grow into a usable archive of model behavior notes rather than a pile of generic AI posts.

When should I use Min-P instead of Top-P?

You can use both, but they solve adjacent versions of the same problem. top_p trims the low-probability tail by cumulative mass. min_p drops tokens beneath a threshold relative to the best token, which makes it react better to changes in model confidence.

A common modern setup is to leave top_p=1.0 and use a mild min_p floor instead. The artifact above makes the tradeoff visible before you hard-code a preset.
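The difference in confidence sensitivity can be sketched with two invented distributions, one peaked and one flat (the numbers are illustrative only):

```python
def top_p_keep(probs, top_p):
    # Mass-based: keep tokens until cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for p in sorted(probs, reverse=True):
        kept.append(p)
        cum += p
        if cum >= top_p:
            break
    return kept

def min_p_keep(probs, min_p):
    # Confidence-relative: floor scales with the best token's probability.
    floor = min_p * max(probs)
    return [p for p in probs if p >= floor]

confident = [0.80, 0.10, 0.05, 0.03, 0.02]
flat      = [0.25, 0.22, 0.20, 0.18, 0.15]

# top_p=0.8 keeps 1 confident token but 4 flat ones (mass fills up fast).
# min_p=0.1 keeps 2 confident tokens (floor 0.08) and all 5 flat ones
# (floor 0.025), relaxing automatically when the model is uncertain.
```

The takeaway matches the answer above: min_p adapts its cutoff to how peaked the distribution is, while top_p always counts mass, which is why a mild min_p floor with top_p left at 1.0 is a common starting point.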