bash - Extract specific information from a log file using a script (Linux or Windows or Python)
I am interested in learning how to extract the following information:
- count the occurrences of keywords,
- get the timestamps of specific occurrences of keywords (note that the timestamps are from the same day; usually within a couple of hours of each other),
- get the elapsed time of specific log entries
from a text log file (log.txt) via a script (Linux bash or Windows batch or Python). The information should be written to a text file (results.txt) or printed on the terminal.
Basically, the other log entries (i.e. blah blah) are ignored.
For example, in the following text log file, each line starts with a timestamp, followed by a space, a dash (-), one or more spaces, and then the keywords:
11:59:35.875 - action - write(34) start
11:59:35.875 - blah blah
11:59:35.875 - blah blah
11:59:35.877 - blah blah
11:59:35.897 - keyword_1
11:59:35.975 - action - write(34) end
11:59:36.992 - keyword_1
11:59:36.999 - keyword_1
11:59:37.535 - blah blah
11:59:37.545 - action_a - state: type 2
11:59:37.575 - blah blah
11:59:37.577 - blah blah
11:59:37.845 - keyword_2
11:59:37.945 - action_b result
11:59:37.950 - blah blah
11:59:38.075 - action - write(22) start
11:59:38.075 - blah blah
11:59:38.085 - blah blah
11:59:38.097 - keyword_2
11:59:39.975 - action - write(22) end
Firstly, I want to count the occurrences of each of keyword_1, keyword_2 (e.g. 2, 2, respectively).
Secondly, I want to be able to print the timestamps of each keyword occurrence, e.g. 11:59:35.897 for the first occurrence of keyword_1.
Finally, I want to find the elapsed time between 2 log entries:
- those that start with - action - write(#) start and end with - action - write(#) end, where # is an integer number, e.g. 11:59:35.975 - 11:59:35.875 = 100 ms for the first write(34)
- those that start with action_a ... and end with action_b ..., e.g. 11:59:37.945 - 11:59:37.545 = 400 ms for the first action_a .. action_b.
I have tried find /c "keyword_1" log.txt >results.txt (Windows batch) to count the occurrences, but I cannot extract the respective timestamps. For the other requirements I have no idea how to start, as I have no experience with such actions. I tried adapting answers to other questions to my needs, with no success.
Any code fragment, example, or link to related resources would be appreciated.
@echo off
setlocal
set "sourcedir=u:\sourcedir"
set "filename1=%sourcedir%\q40441783.txt"

:: occurrence count of keyword_1, keyword_2, keyword_3
for %%k in (keyword_1 keyword_2 keyword_3) do (
 for /f "delims=" %%c in ('type "%filename1%"^|find /c "%%k"') do echo %%k : %%c times
)

:: timestamp display of keyword_1, keyword_2, keyword_3
for %%k in (keyword_1 keyword_2 keyword_3) do (
 type "%filename1%"|find "%%k"
)

:: remove variables starting $ or #
for %%b in ($ #) do for /f "delims==" %%a in ('set %%b 2^>nul') do set "%%a="

:: action - write(#)
for /f "usebackq tokens=1* delims=- " %%a in ("%filename1%") do (
 rem only interested in "pattern - write(#) start/end"
 for /f "tokens=1-3* delims=-()" %%A in ("%%b") do (
  rem %%A is action, %%B is "write" %%C is # %%D is " start"/" end"
  if "%%B"==" write" if "%%D"==" start" set "$%%C %%A$=%%a"
  if "%%B"==" write" if "%%D"==" end" set "#%%C %%A#=%%a"
  call :formatch&call :report "write(%%C)"
 )
)
:: show any unmatched start/end entries
set $ 2>nul
set # 2>nul

:: remove variables starting $ or #
for %%b in ($ #) do for /f "delims==" %%a in ('set %%b 2^>nul') do set "%%a="

for /f "usebackq tokens=1* delims=- " %%a in ("%filename1%") do (
 rem only interested in "action_a/action_b elapsed time"
 for /f "tokens=1* delims=- " %%A in ("%%b") do (
  rem %%A is action, %%B is remainder of line
  if "%%A"=="action_a" set "$1$=%%a"&set "_actiona=%%b"
  if "%%A"=="action_b" set "#1#=%%a"
  call :formatch
  call :report "%%_actiona%% %%b"
 )
)
set $ 2>nul
set # 2>nul

goto :eof

:: see whether $something$ and #something# exist and report if so
:formatch
set "elapsed="
for /f "tokens=1,2 delims=$=" %%m in ('set $ 2^>nul') do (
 if defined #%%m# (
  call :elapsed %%n %%#%%m#%%
  set "#%%m#="
  set "$%%m$="
 )
)
goto :eof

:report
if defined elapsed echo %elapsed% %~1
goto :eof

:: %2 - %1, both in hh:mm:ss.ttt format
:elapsed
for /f "tokens=1-4 delims=:." %%w in ("%2") do (set /a hh=2%%w&set /a mm=2%%x&set /a ss=2%%y&set /a ttt=2%%z)
for /f "tokens=1-4 delims=:." %%w in ("%1") do (set /a hh-=1%%w&set /a mm-=1%%x&set /a ss-=1%%y&set /a ttt-=1%%z)
:: compensate for "negatives"
if %ttt% lss 1000 set /a ttt+=1000&set /a ss-=1
if %ss% lss 100 set /a ss+=60&set /a mm-=1
if %mm% lss 100 set /a mm+=60&set /a hh-=1
if %hh% lss 100 set /a hh+=24
set "elapsed=%hh:~-2%:%mm:~-2%:%ss:~-2%.%ttt:~-3%"
goto :eof
You would need to change the setting of sourcedir to suit your circumstances.
I used a file named q40441783.txt containing your data for my testing.
Interesting exercise.
The first 2 steps are obvious. I included keyword_3 to ensure that the correct report is produced for "not found". Note that you state 2 occurrences of keyword_1. Actually, in your posted data there are 3.
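Since the question also allows Python, here is a minimal sketch of the same first two steps. It is not a translation of the batch code above: it assumes every line has the layout "HH:MM:SS.mmm - message" seen in the sample, and it matches a keyword only when it is the whole message, whereas find matches substrings. The names keyword_timestamps and sample_log are made up for this illustration.

```python
import re

# Assumed layout, taken from the sample log: "HH:MM:SS.mmm - message".
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3})\s+-\s+(.*)$")


def keyword_timestamps(lines, keywords):
    """Map each keyword to the timestamps of the lines whose message is exactly that keyword."""
    found = {keyword: [] for keyword in keywords}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a log line in the assumed layout
        stamp, message = match.groups()
        message = message.strip()
        if message in found:
            found[message].append(stamp)
    return found


# A few lines copied from the sample log in the question.
sample_log = [
    "11:59:35.875 - blah blah",
    "11:59:35.897 - keyword_1",
    "11:59:36.992 - keyword_1",
    "11:59:36.999 - keyword_1",
    "11:59:37.845 - keyword_2",
    "11:59:38.097 - keyword_2",
]

for keyword, stamps in keyword_timestamps(sample_log, ["keyword_1", "keyword_2", "keyword_3"]).items():
    print(f"{keyword} : {len(stamps)} times", *stamps)
```

To run it against a real file, pass an open file object instead of the list, e.g. the result of open("log.txt").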
The next step requires some explanation. The first thing is to ensure that there are no variables starting with # or $.
Next, analyse each line, splitting first on the first - or space, then processing the part beyond the first delimiter-sequence by tokenising on -(), with the tokens as described in the rem statement. Set the variable $...$ or #...# to the time in %%a. The ... here is the unique part of the log entry: the number and the action. Then check whether there are both $...$ and #...# for the same ... and, if so, clear the $...$ and #...# variables, calculate the elapsed time, reconstruct the line and report.
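The start/end matching described above can be sketched in Python as well. This is only an illustration under assumptions: the regular expression encodes the "action - write(#) start/end" layout from the sample log, a dictionary keyed by the write number stands in for the $...$ variables, and pair_writes and sample_log are hypothetical names.

```python
import re

# Assumed layout: "<timestamp> - action - write(<number>) start|end".
WRITE_RE = re.compile(r"^(\S+)\s+-\s+action\s+-\s+write\((\d+)\)\s+(start|end)$")


def pair_writes(lines):
    """Return (number, start_stamp, end_stamp) for every matched write(#) start/end pair."""
    pending = {}  # write number -> timestamp of its still-unmatched "start"
    pairs = []
    for line in lines:
        match = WRITE_RE.match(line.strip())
        if match is None:
            continue  # some other log entry
        stamp, number, kind = match.groups()
        if kind == "start":
            pending[number] = stamp
        elif number in pending:
            # Matching "end" found: report the pair and clear the pending start.
            pairs.append((number, pending.pop(number), stamp))
    return pairs


# A few lines copied from the sample log in the question.
sample_log = [
    "11:59:35.875 - action - write(34) start",
    "11:59:35.897 - keyword_1",
    "11:59:35.975 - action - write(34) end",
    "11:59:38.075 - action - write(22) start",
    "11:59:39.975 - action - write(22) end",
]

for number, start, end in pair_writes(sample_log):
    print(f"write({number}): {start} -> {end}")
```

Keying on the number lets interleaved writes with different numbers be matched independently; a second start for the same number before its end simply overwrites the first.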
The elapsed-time calculation prepends 2 to the start of each of the variables to ensure that they don't start with 0 and hence potentially get treated as octal. We pull the same trick when subtracting the start time, using a prepended 1 to produce a result that should be 3 digits (4 for ms). If fewer digits are detected, we need to add the appropriate number and deduct 1 from the next-higher time element.
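To make the arithmetic concrete, here is the same borrow logic mirrored in Python. The prefixes are kept purely to show the trick: Python's int("08") is already decimal, so they are not needed there. The function name elapsed is reused from the batch label for readability; the argument order (start, end) is a choice made for this sketch.

```python
def elapsed(start, end):
    """Return end - start as HH:MM:SS.mmm; both arguments are in HH:MM:SS.mmm format."""
    sh, sm, ss, st = start.replace(".", ":").split(":")
    eh, em, es, et = end.replace(".", ":").split(":")

    # Prefix the end fields with 2 and the start fields with 1, so each
    # difference is 100 + (end - start), or 1000 + (end - start) for milliseconds.
    hh = int("2" + eh) - int("1" + sh)
    mm = int("2" + em) - int("1" + sm)
    sec = int("2" + es) - int("1" + ss)
    ms = int("2" + et) - int("1" + st)

    # A result with too few digits means the field went negative:
    # add one full unit and borrow 1 from the next-higher field.
    if ms < 1000:
        ms += 1000
        sec -= 1
    if sec < 100:
        sec += 60
        mm -= 1
    if mm < 100:
        mm += 60
        hh -= 1
    if hh < 100:
        hh += 24  # the interval crossed midnight

    # Drop the leading digit again by keeping only the rightmost digits.
    return f"{str(hh)[-2:]}:{str(mm)[-2:]}:{str(sec)[-2:]}.{str(ms)[-3:]}"


print(elapsed("11:59:35.875", "11:59:35.975"))  # write(34) from the sample log
print(elapsed("11:59:59.900", "12:00:00.100"))  # borrows through every field
```

For write(34), the millisecond field is 2975 - 1875 = 1100, whose last three digits give the 100 ms difference.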
The processing for action_a/_b timing is the same, but it records the start/end times in $1$/#1#. Since there is no indication of the nature of the strings action_a and action_b, we're forced to assume that the appropriate events don't overlap.
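The single-slot idea can also be sketched in Python. As above, this is an illustration rather than a port: it assumes the timestamp and message are separated by " - ", that action_a and action_b appear at the start of the message, and that the events never overlap, so one pending start is enough. action_a_to_b and sample_log are hypothetical names, and the elapsed time is computed with datetime instead of the digit trick.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%H:%M:%S.%f"  # assumed timestamp layout, e.g. 11:59:37.545


def action_a_to_b(lines):
    """Return (start_message, end_message, elapsed_ms) for each action_a ... action_b pair."""
    pending = None  # (timestamp, message) of the last unmatched action_a
    results = []
    for line in lines:
        stamp, separator, message = line.strip().partition(" - ")
        if not separator:
            continue  # no " - " separator, so not a log line in the assumed layout
        message = message.strip()
        if message.startswith("action_a"):
            pending = (stamp, message)
        elif message.startswith("action_b") and pending is not None:
            began = datetime.strptime(pending[0], STAMP_FORMAT)
            ended = datetime.strptime(stamp, STAMP_FORMAT)
            elapsed_ms = (ended - began) // timedelta(milliseconds=1)
            results.append((pending[1], message, elapsed_ms))
            pending = None  # the slot is free for the next action_a
    return results


# A few lines copied from the sample log in the question.
sample_log = [
    "11:59:37.545 - action_a - state: type 2",
    "11:59:37.575 - blah blah",
    "11:59:37.845 - keyword_2",
    "11:59:37.945 - action_b result",
]

for start_message, end_message, elapsed_ms in action_a_to_b(sample_log):
    print(f"{elapsed_ms} ms: {start_message} ... {end_message}")
```

Unlike the batch routine, this sketch does not handle an interval that crosses midnight; the question states that the timestamps fall within the same day.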