buffered output #478

michbsd · 2025-03-28T11:50:45Z

Hi,

Not sure if this is a bug or just me not understanding the configuration options (so bear with me).

ugrep 7.3.0 amd64-portbld-freebsd14.1 +avx512; -P:pcre2jit; -z:zlib,bzip2,lzma,lz4,zstd,brotli,bzip3,7z,tar/pax/cpio/zip

I often need to grep a stream on stdin, e.g. from varnishlog or just a simple tail, and it seems I get my match but not the remainder of the line, for example:

 varnishlog -q 'ReqURL ~ crap'  | grep --line-buffered Age
    77: -   RespHeader     Age: 0
   168: -   RespHeader     Age: 0
   259: -   RespHeader     Age
^C

As you can see, the last line is not printed in full until a new line is matched.

I've tried with and without --line-buffered - no change.


genivia-inc · 2025-03-28T19:32:48Z

I had been thinking about improving this to get rid of the artifact, but have not yet found time to work on it. A good thing now is that ugrep "understands" the difference between a line-based regex and a multi-line regex, which is internally used for certain speed optimizations.

The line will be completed, but not until the next match or EOF, whichever comes first. This is an artifact of ugrep's built-in multi-line matching capability. Multi-line matching can only work properly if the engine can safely flush the current line and move to the next without skipping a potential match that spans from the current line to the next line and beyond (which may not have been read yet). For example, the pattern birth[ \t\n]+date allows spaces, tabs and newlines between the two words. So with the two input lines my birth date is ... his birth birth\ndate is ... the first line cannot be fully displayed until the next line with date arrives in the buffer. The funny thing is that the regex engine doesn't "know" anything about the output, so it goes off to find the next match when there is no date in the next line, as when the input is my birth date is ... his birth birth\nmonth is ... Therefore, the first line is still not completed after my birth date is ... until the next match is found later.
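To make the difference concrete, here is a minimal Python sketch (not ugrep code) that runs the pattern from the comment above in two ways: once per line, as a line-oriented matcher would, and once over the whole buffer. The sample input and both helper names are made up for illustration.

```python
import re

# The pattern from the comment above: the two words may be separated by
# spaces, tabs, or newlines.
PATTERN = re.compile(r"birth[ \t\n]+date")

def match_line_by_line(text):
    """Line-oriented matching: each line is searched on its own."""
    return [m.group(0) for line in text.splitlines() for m in PATTERN.finditer(line)]

def match_whole_buffer(text):
    """Multi-line matching: the whole buffer is searched at once."""
    return [m.group(0) for m in PATTERN.finditer(text)]

# Illustrative input: the second match straddles the line break.
text = "my birth date is ... his birth\ndate is ...\n"
print(match_line_by_line(text))   # ['birth date']
print(match_whole_buffer(text))   # ['birth date', 'birth\ndate']
```

The second match only exists once the following line has been read, which is why a multi-line engine cannot know that the first line is finished until more input arrives.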

Note that by contrast, GNU grep is a line-oriented matcher, so it consumes line-by-line to find matching lines (in essence).

I don't want to restrict ugrep's --line-buffered to matching single lines only, without multi-line matching support. That would leave users confused as to why multi-line matching suddenly stops working when --line-buffered is specified.

genivia-inc · 2025-03-28T19:47:14Z

I should add that option -u (--ungroup) does flush lines immediately with --line-buffered. But a line with multiple matches will be displayed once for each match (which is the purpose of -u).

genivia-inc · 2025-04-01T15:41:31Z

I'm not satisfied either. This should and can be improved.

At least when a non-matching line arrives in the buffer right after a line with a match that is displayed, then the complete matching line should be displayed, i.e. not dangle unfinished until another matching line arrives (much) later (as is the case right now.)

This is what I have in mind, hope it is acceptable:

- without colors enabled, the matching line will be displayed immediately and completely (no delay)
- with colors enabled, the matching line with a color-highlighted match will be displayed immediately, but is completed when the next line arrives in the buffer (one line delay)
- with option -u, matching lines are displayed immediately like before, with a separate line for each match (no delay)
- multi-line pattern matching with patterns containing newlines affects the first point above, in that the matching line can't be completely displayed and will behave as in the second point (one line delay)
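The four points can be condensed into a small decision function. This is only a sketch of the proposal as worded above, not ugrep's implementation: the function name and parameters are hypothetical, and it assumes that -u takes precedence over the color and multi-line cases.

```python
def completion_delay(color, ungroup, multiline_pattern):
    """Return how many lines of delay pass before a matching line is fully
    displayed, following the four proposed points.

    Hypothetical helper for illustration; assumes -u overrides the other cases.
    """
    if ungroup:
        return 0  # -u: each match is output separately, right away
    if color or multiline_pattern:
        return 1  # completed when the next line arrives in the buffer
    return 0      # no color, line-based pattern: displayed completely at once

print(completion_delay(color=False, ungroup=False, multiline_pattern=False))  # 0
print(completion_delay(color=True, ungroup=False, multiline_pattern=False))   # 1
print(completion_delay(color=False, ungroup=False, multiline_pattern=True))   # 1
```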

Note the first point: this does not delay piping through ugrep such as tail -f file.log | ug pattern | more because ug output to a pipe is not color-highlighted.
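As an illustration of why a pipeline falls under the first point: a program can tell whether its output goes to a terminal or to a pipe. How ugrep makes that decision is not stated in this thread; the sketch below assumes the common isatty check, and use_color is a hypothetical name.

```python
import os

def use_color(fd):
    """Color only when the output descriptor is a terminal (assumed policy)."""
    return os.isatty(fd)

# A pipe stands in for the "| more" stage of the pipeline.
read_end, write_end = os.pipe()
print(use_color(write_end))  # False: a pipe is not a terminal
os.close(read_end)
os.close(write_end)
```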

Option --line-buffered has no effect when reading standard input from character devices and pipes. Ugrep detects standard input from a character device or pipe and applies the strategy described above, which it already implements, though without the proposed improvements.

genivia-inc · 2025-04-03T13:39:56Z

Quick update.

The dev implementation is done, except for refactoring the code, performing additional verification and testing, and addressing one case I am not happy with yet (it does not perfectly follow the proposed points).

Again, we don't need --line-buffered (or need to improve it) for this use case. It is only needed, and implicitly enabled, by the TUI -Q and --pager to avoid output delays in the TUI and pager.

Ugrep is also already smart enough to immediately show matches and flush output when matches are made on standard input from a character device or a pipe, e.g. to follow input with tail -f file.log | ug pattern. This is done with non-blocking IO and handlers. This is a lot faster than reading input line-by-line or, worse, byte-by-byte, which would be extremely slow. I'm improving this part of the code to make sure matching lines are shown without unnecessary delay.

genivia-inc · 2025-04-08T20:49:03Z

Still not 100% happy with the ugrep update I'm working on to release. The stdin pipe to ugrep behavior should be the same as GNU grep. That's not yet the case, because ugrep may quit under certain circumstances (e.g. max matches were found), whereas GNU grep keeps draining stdin until EOF (if that ever comes). So let's do the same.
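The intended behavior can be sketched as follows. This is a simplified illustration of draining input after the match limit is reached, not ugrep code; the function name, the substring test, and the sample input are assumptions.

```python
import io

def grep_until_eof(stream, needle, max_matches):
    """Collect up to max_matches matching lines, but keep reading to EOF."""
    matches = []
    lines_read = 0
    for line in stream:
        lines_read += 1
        if len(matches) < max_matches and needle in line:
            matches.append(line.rstrip("\n"))
        # Once the limit is reached, the remaining input is still read and
        # discarded instead of the reader quitting early.
    return matches, lines_read

stream = io.StringIO("a match\nno\nmatch b\nmatch c\n")
print(grep_until_eof(stream, "match", 2))  # (['a match', 'match b'], 4)
```

All four lines are consumed even though only two matches are reported, so the sender never sees its pipe closed early.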

genivia-inc · 2025-04-12T19:19:23Z

Implementation and testing are complete. Results and performance look good.

Input-following, with tail -f file | ugrep pattern for example, is efficient because it uses non-blocking reads. So we don't read byte-by-byte or line-by-line, but rather use a non-blocking read to read as much as possible until the sender waits. Ugrep detects that the sender is waiting by checking for EAGAIN and then displays the results so far. This gives the illusion that ugrep follows the input byte-by-byte, but it is much faster using non-blocking reads. Because I'm using non-blocking reads and we now want to display results as early as possible, I had to refactor the SIMD acceleration implementation of the regex engine to make it a little less greedy for input.
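The read-until-EAGAIN idea can be sketched in a few lines of Python. This is a minimal illustration on a Unix pipe, not ugrep's C++ code; read_available and the chunk size are made up for the example. In Python, EAGAIN from a non-blocking read surfaces as BlockingIOError.

```python
import os

def read_available(fd, chunk_size=65536):
    """Read everything currently available from a non-blocking descriptor.

    Returns (data, eof). Stops when the read would block (EAGAIN) or at EOF.
    """
    chunks = []
    while True:
        try:
            chunk = os.read(fd, chunk_size)
        except BlockingIOError:      # EAGAIN/EWOULDBLOCK: the sender is waiting
            return b"".join(chunks), False
        if not chunk:                # empty read: the sender closed the pipe
            return b"".join(chunks), True
        chunks.append(chunk)

read_end, write_end = os.pipe()
os.set_blocking(read_end, False)

os.write(write_end, b"first line\nsecond ")
print(read_available(read_end))   # (b'first line\nsecond ', False)

os.write(write_end, b"line\n")
os.close(write_end)
print(read_available(read_end))   # (b'line\n', True)
os.close(read_end)
```

Each call returns as soon as the sender has nothing more to offer, which is the moment to search and display what has arrived. A real reader would wait for readiness (for example with select or poll) between calls rather than spin.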

All matching lines are displayed immediately when color is not used. With color, the matching line is completed when the next line arrives in the buffer, as I've described above. The reason is that the ugrep regex engine is not line-based like GNU grep, but is inherently multi-line matching. It would be possible to exactly replicate GNU grep, but this requires separate code just for following the input line-by-line. I think that is overkill, but if necessary it can be done.

Will release an update soon.

genivia-inc added the labels question (a question that has or needs further clarification) and enhancement (new feature or request) on Mar 29, 2025
