bash - Extract specific information from a log file using a script (Linux or Windows or Python) -


I am interested in learning how to extract information, i.e.

  • count the occurrences of keywords,
  • get the timestamps of specific occurrences of keywords (note that the timestamps are all from the same day; usually within a couple of hours of each other),
  • get the elapsed time of specific log entries

from a text log file (log.txt) via a script (Linux bash, Windows batch, or Python). The information should be written to a text file (results.txt) or printed on the terminal.

Basically, all other log entries (i.e. blah blah) should be ignored.

For example, in the following text log file, each line starts with a timestamp followed by a space, a dash (-), one or more spaces, and then the keywords:

11:59:35.875 - action - write(34) start

11:59:35.875 - blah blah

11:59:35.875 - blah blah

11:59:35.877 - blah blah

11:59:35.897 - keyword_1

11:59:35.975 - action - write(34) end

11:59:36.992 - keyword_1

11:59:36.999 - keyword_1

11:59:37.535 - blah blah

11:59:37.545 - action_a - state: type 2

11:59:37.575 - blah blah

11:59:37.577 - blah blah

11:59:37.845 - keyword_2

11:59:37.945 - action_b result

11:59:37.950 - blah blah

11:59:38.075 - action - write(22) start

11:59:38.075 - blah blah

11:59:38.085 - blah blah

11:59:38.097 - keyword_2

11:59:39.975 - action - write(22) end

Firstly, I want to count the occurrences of each of keyword_1 and keyword_2 (e.g. 2 and 2, respectively).

Secondly, I want to be able to print the timestamp of each keyword occurrence, e.g. 11:59:35.897 for the first occurrence of keyword_1.

Finally, I want to find the elapsed time between two log entries:

  1. those that start with - action - write(#) start and end with - action - write(#) end, where # is an integer number, e.g. 11:59:35.975 - 11:59:35.875 = 100 ms for the first write(34)
  2. those that start with - action_a ... and end with - action_b ..., e.g. 11:59:37.945 - 11:59:37.545 = 400 ms for the first action_a ... action_b.

I have tried find /c "keyword_1" log.txt >results.txt (Windows batch) to count the occurrences, but I cannot extract the respective timestamps. For the other requirements I have no idea how to start, as I have no experience with such actions. I have tried adapting answers to similar questions to my needs, with no success.

Any code fragment, example, or link to related resources would be appreciated.

@echo off
setlocal
set "sourcedir=u:\sourcedir"
set "filename1=%sourcedir%\q40441783.txt"
:: occurrence count for keyword_1,keyword_2,keyword_3
for %%k in (keyword_1 keyword_2 keyword_3) do (
 for /f "delims=" %%c in ('type "%filename1%"^|find /c "%%k"') do echo %%k : %%c times
)
:: timestamp display for keyword_1,keyword_2,keyword_3
for %%k in (keyword_1 keyword_2 keyword_3) do (
 type "%filename1%"|find "%%k"
)
:: remove variables starting $ or #
for %%b in ($ #) do for /f "delims==" %%a in ('set %%b 2^>nul') do set "%%a="
:: for action - write(#)
for /f "usebackqtokens=1*delims=- " %%a in ("%filename1%") do (
 rem interested in "pattern - write(#) start/end"
 rem the inner loop uses upper-case metavariables; for-variables are case-sensitive
 for /f "tokens=1-3*delims=-()" %%A in ("%%b") do (
  rem %%A is action, %%B is "write", %%C is #, %%D is " start"/" end"
  if "%%B"==" write" if "%%D"==" start" set "$%%C %%A$=%%a"
  if "%%B"==" write" if "%%D"==" end" set "#%%C %%A#=%%a"
  call :formatch&call :report "write(%%C)"
 )
)
set $ 2>nul
set # 2>nul
:: remove variables starting $ or #
for %%b in ($ #) do for /f "delims==" %%a in ('set %%b 2^>nul') do set "%%a="

for /f "usebackqtokens=1*delims=- " %%a in ("%filename1%") do (
 rem interested in "action_a/action_b elapsed time"
 for /f "tokens=1*delims=- " %%A in ("%%b") do (
  rem %%A is action, %%B is remainder of line
  if "%%A"=="action_a" set "$1$=%%a"&set "_actiona=%%B"
  if "%%A"=="action_b" set "#1#=%%a"
  call :formatch
  call :report "%%_actiona%% %%B"
 )
)
set $ 2>nul
set # 2>nul

goto :eof

:: see whether $something$ and #something# exist, and report if so
:formatch
set "elapsed="
for /f "tokens=1,2delims=$=" %%m in ('set $ 2^>nul') do (
 if defined #%%m# (
  call :elapsed %%n %%#%%m#%%
  set "#%%m#="
  set "$%%m$="
 )
)
goto :eof

:report
if defined elapsed echo %elapsed% %~1
goto :eof

:: %2 - %1, both in hh:mm:ss.ttt format
:elapsed
for /f "tokens=1-4delims=:." %%w in ("%2") do (set /a hh=2%%w&set /a mm=2%%x&set /a ss=2%%y&set /a ttt=2%%z)
for /f "tokens=1-4delims=:." %%w in ("%1") do (set /a hh-=1%%w&set /a mm-=1%%x&set /a ss-=1%%y&set /a ttt-=1%%z)
:: compensate for "negatives"
if %ttt% lss 1000 set/a ttt+=1000&set/a ss-=1
if %ss% lss 100 set/a ss+=60&set/a mm-=1
if %mm% lss 100 set/a mm+=60&set/a hh-=1
if %hh% lss 100 set/a hh+=24
set "elapsed=%hh:~-2%:%mm:~-2%:%ss:~-2%.%ttt:~-3%"
goto :eof

You would need to change the setting of sourcedir to suit your circumstances.
I used a file named q40441783.txt containing your data for my testing.

An interesting exercise.

The first two steps are obvious. I included keyword_3 to ensure that the correct report is produced for "not found". Note that you state 2 occurrences of keyword_1. Actually, in your posted data there are 3.

The next step requires some explanation. The first thing is to ensure that there are no variables starting with # or $.

Next, analyse each line, splitting it first on the first - or space, then processing the part beyond the first delimiter sequence by tokenising on -() into the tokens described in the rem statement. Set a variable $...$ or #...# to the time in %%a, where ... is the unique part of the log entry: the number and the action. Then check whether both $...$ and #...# exist for the same ... and, if so, clear the $...$ and #...# variables, calculate the elapsed time, reconstruct the line, and report.

The elapsed-time calculation prepends 2 to the start of each of the variables to ensure that they don't start with 0 and potentially get treated as octal. We pull the same trick when subtracting the start time, using a prepended 1, to produce a result that should be 3 digits (4 for ms). If fewer digits are detected, we need to add the appropriate number and deduct 1 from the next-higher time element.

The processing for the action_a/_b timing is the same, but it records the start/end times in $1$/#1#, since there is no indication of the nature of the strings action_a and action_b, and we're forced to assume that the appropriate events don't overlap.
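
Since the question also allows Python, the first two steps could be sketched roughly as follows. This is only an illustration and not part of the batch solution: the names LINE_RE, SAMPLE_LOG and keyword_timestamps are made up here, the sample is a subset of the posted log, and the match is a plain substring test (as with find), so keyword_1 would also match an entry containing keyword_10.

```python
import re

# A few lines copied from the log in the question (assumed layout:
# timestamp, space, dash, one or more spaces, entry text).
SAMPLE_LOG = """\
11:59:35.875 - action - write(34) start
11:59:35.875 - blah blah
11:59:35.897 - keyword_1
11:59:36.992 - keyword_1
11:59:36.999 - keyword_1
11:59:37.845 - keyword_2
11:59:38.097 - keyword_2
"""

# Hypothetical pattern: group 1 is the timestamp, group 2 the entry text.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3})\s+-\s+(.*)$")


def keyword_timestamps(lines, keywords):
    """Return {keyword: [timestamps of the lines containing it]}."""
    found = {keyword: [] for keyword in keywords}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a log entry in the expected layout
        stamp, text = match.groups()
        for keyword in keywords:
            if keyword in text:  # substring test, like find "keyword"
                found[keyword].append(stamp)
    return found


result = keyword_timestamps(
    SAMPLE_LOG.splitlines(), ["keyword_1", "keyword_2", "keyword_3"]
)
for keyword, stamps in result.items():
    print(f"{keyword} : {len(stamps)} times", *stamps)
```

To read the real file instead of the sample, the lines could come from open("log.txt"), and the print calls could be given a file argument to write results.txt.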
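
The same pairing idea can be sketched in Python, using a dictionary keyed on the write number in place of the $...$ and #...# environment variables. This is an illustration under assumptions, not a translation of the batch code: WRITE_RE and write_durations are hypothetical names, and the elapsed time is reported in whole milliseconds.

```python
import re
from datetime import datetime, timedelta

# Lines copied from the log in the question.
SAMPLE_LOG = """\
11:59:35.875 - action - write(34) start
11:59:35.897 - keyword_1
11:59:35.975 - action - write(34) end
11:59:38.075 - action - write(22) start
11:59:38.097 - keyword_2
11:59:39.975 - action - write(22) end
"""

# Hypothetical pattern: timestamp, write number, and "start" or "end".
WRITE_RE = re.compile(
    r"^(\S+)\s+-\s+action\s+-\s+write\((\d+)\)\s+(start|end)\s*$"
)


def write_durations(lines):
    """Return [(write number, elapsed milliseconds)] for each start/end pair."""
    pending = {}  # write number -> start time, waiting for its "end" entry
    results = []
    for line in lines:
        match = WRITE_RE.match(line.strip())
        if not match:
            continue
        stamp, number, phase = match.groups()
        when = datetime.strptime(stamp, "%H:%M:%S.%f")
        if phase == "start":
            pending[number] = when
        elif number in pending:
            # Clear the stored start so the number can be reused later.
            delta = when - pending.pop(number)
            results.append((number, delta // timedelta(milliseconds=1)))
    return results


durations = write_durations(SAMPLE_LOG.splitlines())
for number, millis in durations:
    print(f"write({number}) took {millis} ms")
```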
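
For comparison, a Python sketch of the same subtraction needs no prefix trick, because int() reads "08" as decimal. The helper names to_millis and elapsed are assumptions for illustration, the timestamps are assumed to carry exactly three millisecond digits, and the modulo step plays the role of the batch code's final "add 24 hours" compensation.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def to_millis(stamp):
    """Convert hh:mm:ss.ttt to milliseconds since midnight."""
    hms, millis = stamp.split(".")
    hh, mm, ss = (int(part) for part in hms.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * 1000 + int(millis)


def elapsed(start, end):
    """Return end - start as hh:mm:ss.ttt, wrapping once past midnight."""
    diff = (to_millis(end) - to_millis(start)) % MS_PER_DAY
    seconds, ttt = divmod(diff, 1000)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}.{ttt:03d}"


print(elapsed("11:59:35.875", "11:59:35.975"))  # 00:00:00.100
print(elapsed("11:59:37.545", "11:59:37.945"))  # 00:00:00.400
print(elapsed("23:59:59.900", "00:00:00.100"))  # 00:00:00.200
```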
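
A Python sketch of this last step, under the same non-overlap assumption, keeps a single "open" action_a entry instead of the $1$/#1# pair. ENTRY_RE and action_a_to_b are hypothetical names, and the way the remainder of each line is trimmed is a guess at what a useful report would contain.

```python
import re
from datetime import datetime, timedelta

# Lines copied from the log in the question.
SAMPLE_LOG = """\
11:59:37.535 - blah blah
11:59:37.545 - action_a - state: type 2
11:59:37.575 - blah blah
11:59:37.845 - keyword_2
11:59:37.945 - action_b result
11:59:37.950 - blah blah
"""

# Hypothetical pattern: timestamp, first word of the entry, remainder.
ENTRY_RE = re.compile(r"^(\S+)\s+-\s+(\S+)\s*(.*)$")


def action_a_to_b(lines):
    """Return [(action_a remainder, action_b remainder, elapsed ms)]."""
    open_a = None  # (start time, remainder) of the latest action_a entry
    results = []
    for line in lines:
        match = ENTRY_RE.match(line.strip())
        if not match:
            continue
        stamp, first_word, rest = match.groups()
        rest = rest.lstrip("- ")  # drop a leading "- " separator, if any
        when = datetime.strptime(stamp, "%H:%M:%S.%f")
        if first_word == "action_a":
            open_a = (when, rest)
        elif first_word == "action_b" and open_a is not None:
            started, a_rest = open_a
            millis = (when - started) // timedelta(milliseconds=1)
            results.append((a_rest, rest, millis))
            open_a = None  # assumes action_a/action_b events do not overlap
    return results


pairs = action_a_to_b(SAMPLE_LOG.splitlines())
for a_rest, b_rest, millis in pairs:
    print(f"{millis} ms: action_a {a_rest} -> action_b {b_rest}")
```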

