Pick better presets
The goal is to help you choose settings with more confidence, whether you are tuning for ideation, extraction, critique, ranking, or structured outputs.
Live artifact / research publication
SampleLens helps you see what actually changes when you adjust temperature, top_p, min_p, DRY, prompt structure, or runtime defaults. It combines an interactive decoding explorer with controlled prompt sweeps so you can build better intuition, compare outcomes, and choose more reliable presets for real AI workflows.
Why SampleLens
People rarely want another abstract explainer about sampling. They want to know which settings to try, what tradeoffs they create, and how those tradeoffs show up in real outputs. SampleLens is built to answer exactly that.
The useful question is not just what a knob does in theory. It is how that knob behaves on a real task with a real prompt, model, runtime, and rubric.
The article makes the point. The artifact or sweep page shows the outputs. If you cannot inspect the evidence, the research is not finished.
Artifact 001
Artifact 001 is a browser-side explorer for the decoding pipeline used by runtimes such as Ollama, llama.cpp, and LM Studio. It is the fastest way to see how each filter changes the candidate set before you commit to a preset or change a production workflow.
Move the controls to see how Temperature, Top-P, Min-P, and DRY reshape the token distribution. Use the raw, after-filter, and diff views to understand which tokens survive, which ones disappear, and why that changes the final output.
Browser-side only
Artifact 001 uses a simulated token distribution so the mechanics stay visible and instant. The next layer is real prompt sweeps with outputs, rubrics, and practical takeaways you can carry back into a live stack.
Temperature: Applied before filtering. It changes the shape of the entire distribution. Lower values sharpen. Higher values flatten.
Top-P: Mass-based tail trimming. It keeps enough tokens to cover the requested cumulative probability and drops the rest.
Min-P: A confidence-relative floor. The threshold rises when the model is confident and relaxes when the distribution is uncertain.
DRY: Targeted repetition control. It penalizes repeated continuations instead of globally muting every token seen in context.
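The filter mechanics the explorer visualizes can be sketched in a few lines. This is an illustrative Python sketch over a toy five-token distribution, not SampleLens code: temperature rescales logits before filtering, top_p trims by cumulative mass, and min_p applies a floor relative to the strongest token.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature rescales logits before filtering: <1 sharpens, >1 flattens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p=0.9):
    # Keep the smallest set of tokens whose cumulative mass covers top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept

def min_p_filter(probs, min_p=0.05):
    # Drop tokens below min_p * max(probs): the floor tracks model confidence.
    floor = min_p * max(probs)
    return {i for i, p in enumerate(probs) if p >= floor}

# Toy 5-token distribution.
logits = [4.0, 3.0, 2.0, 0.5, -1.0]
probs = softmax(logits, temperature=0.8)
print(top_p_filter(probs, 0.9))   # tokens surviving the mass cutoff
print(min_p_filter(probs, 0.05))  # tokens surviving the relative floor
```

On this distribution the two filters already disagree: top_p=0.9 keeps only the top two tokens, while min_p=0.05 also keeps the third, which is exactly the kind of difference the diff view surfaces.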
Benchmarks
The site starts with decoding mechanics, then moves into studies where settings have obvious downstream consequences. The first tracks focus on recurring tasks where better defaults save time and reduce drift.
Protocol 001
Published now
The first protocol is live. It defines a constrained ideation task, the invariants, the scoring rubric, and the sweep plan before any results are published.
Archive
Live now
The archive collects live artifacts, published protocols, upcoming studies, and the practical rules that keep the research honest and useful.
Track 03
Compare settings for schema fidelity, verbosity control, and refusal behavior when extracting data from messy source text into a fixed shape.
Track 04
Study when lower-variance settings help models disqualify weak options cleanly and when some controlled stochasticity improves coverage of hidden weaknesses.
Method
Good research should leave you with better defaults, not just a nicer theory. Every SampleLens study is meant to be narrow enough to trust, concrete enough to repeat, and useful enough to change how you set up the next run.
One concrete question, not a broad essay. Example: which decoding range preserves constraint adherence without collapsing idea quality?
Prompt family, invariants, model, runtime, seed policy, and date. If the setup drifts, the claim drifts.
A clear parameter grid with enough variation to show behavior change instead of cherry-picking one "good" answer.
Outputs, controls, and lightweight scoring in a format the reader can inspect, compare, and cite.
A plain conclusion, an explicit default, and a note about what not to overclaim from the evidence.
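The sweep step can be sketched as a small grid expansion. The axes and values below are illustrative assumptions, not any published protocol's grid:

```python
import itertools

# Hypothetical sweep grid; axes and values are illustrative only.
grid = {
    "temperature": [0.4, 0.8, 1.2],
    "min_p": [0.0, 0.05, 0.1],
    "seed": [1, 2, 3],
}

def expand(grid):
    # Expand the grid into one run-config dict per parameter combination.
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

runs = list(expand(grid))
print(len(runs))  # 3 * 3 * 3 = 27 runs
```

Enumerating every combination up front is what makes the "no cherry-picking" rule checkable: each of the 27 configs gets an output and a score, whether or not it flatters the hypothesis.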
FAQ
The point of the landing page is simple: explain who this is for, what the artifact does, what kind of research is coming, and why the work should be trusted.
SampleLens is for people who tune language models in practice: AI builders, local-model users, and curious researchers who want clearer intuition about decoding settings and better defaults.
It does not run a real model. It uses a simulated token distribution so the filter math stays visible and instant in the browser.
The mechanics match real inference pipelines closely enough to explain how temperature, top_p, min_p, and dry_multiplier interact in runtimes such as llama.cpp and Ollama.
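To carry a preset into a live runtime, the same knobs map onto request options. A minimal sketch against a local Ollama server is shown below; the option names follow Ollama's documented parameters, but the model name and values are placeholders, and DRY-style options are a llama.cpp feature that may not be exposed in every runtime version.

```shell
# Assumes a local Ollama server on the default port 11434.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "List three uncommon uses for a paperclip.",
  "options": {
    "temperature": 0.8,
    "top_p": 1.0,
    "min_p": 0.05
  }
}'
```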
The focus is on recurring AI tasks where settings materially change the result: constrained ideation, structured extraction, critique, ranking, and copy generation.
Each study will publish a fixed setup, a small parameter sweep, the outputs, lightweight scoring, and a plain takeaway about what worked and what drifted.
The next step is controlled prompt-sweep studies with fixed tasks, explicit constraints, parameter grids, outputs, and practical scoring. Over time that should grow into a usable archive of model behavior notes rather than a pile of generic AI posts.
You can use top_p and min_p together, but they solve adjacent versions of the same problem. top_p trims the low-probability tail by cumulative mass. min_p drops tokens beneath a threshold relative to the best token, which makes it react better to changes in model confidence.
A common modern setup is to leave top_p=1.0 and use a mild min_p floor instead. The artifact above makes the tradeoff visible before you hard-code a preset.
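That confidence-tracking difference is easy to demonstrate on two toy distributions, one peaked and one flat. This Python sketch uses made-up probabilities purely for illustration:

```python
def min_p_keep(probs, min_p):
    # Relative floor: the threshold scales with the top token's probability.
    floor = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= floor]

def top_p_keep(probs, top_p):
    # Absolute cumulative-mass cutoff, independent of peak confidence.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return sorted(kept)

confident = [0.90, 0.06, 0.02, 0.01, 0.01]  # model is sure
uncertain = [0.25, 0.22, 0.20, 0.18, 0.15]  # several plausible tokens

# One min_p value adapts to both regimes:
print(min_p_keep(confident, 0.10))  # only the dominant token survives
print(min_p_keep(uncertain, 0.10))  # all five plausible tokens survive
# A top_p tight enough to clean the confident case truncates the flat one:
print(top_p_keep(confident, 0.60))
print(top_p_keep(uncertain, 0.60))
```

With min_p=0.10 the floor collapses from 0.09 on the peaked distribution to 0.025 on the flat one, so the same setting keeps one token in the first case and all five in the second; a fixed top_p cannot adapt that way, which is the intuition behind leaving top_p=1.0 and using a mild min_p floor.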