C# - Processing huge UTF-8 files by splitting them into multiple files
I am developing an importer program in C# that imports large UTF-8 text files (characters have different byte lengths). Loading a 20 GB file into RAM is not a suitable solution, even if it is possible. It is better to split the file into multiple smaller files and process those. My problem now is how to split the file for processing. My current solution reads the file line by line and starts a new output file once the number of lines reaches a suitable count. I think reading the file line by line just to split it is not a fast solution; the splitting time is high. Is there an algorithm for splitting large UTF-8 files into multiple files that is faster and does not read line by line?
My suggestions for the problem are below. They were thought out with separation of concerns in mind: splitting the file and processing the file can be separated for better maintenance.
- Read the file as binary rather than as text.
- Do not read line by line; splitting does not require decoding the file into lines.
- Use Seek to jump to the approximate split position. Refer to the link.
- If the split files must contain only complete lines, search forward from the seek position for the next end-of-line character and split the file there.
- Once the files are split, process them individually.
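The steps above can be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: the class name `Utf8FileSplitter`, the method names, the `part_00000.txt` naming scheme, and the buffer sizes are all assumptions made for the example. It relies on the fact that in UTF-8 the line feed byte `0x0A` never occurs inside a multi-byte character, so scanning raw bytes for it cannot cut a character in half. It also assumes lines end in `\n` (or `\r\n`).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class Utf8FileSplitter
{
    // Hypothetical helper: writes parts of roughly targetPartBytes each into outputDir.
    // Every part ends just after a '\n' byte (or at end of file), so lines stay whole.
    public static List<string> Split(string inputPath, string outputDir, long targetPartBytes)
    {
        if (targetPartBytes <= 0) throw new ArgumentOutOfRangeException(nameof(targetPartBytes));
        Directory.CreateDirectory(outputDir);
        var parts = new List<string>();
        var buffer = new byte[1 << 20]; // 1 MiB copy buffer (arbitrary choice)

        using (var input = new FileStream(inputPath, FileMode.Open, FileAccess.Read))
        {
            long start = 0;
            while (start < input.Length)
            {
                // Jump ahead by the target size, then extend to the next line end.
                long end = FindPartEnd(input, start + targetPartBytes);
                string partPath = Path.Combine(outputDir, "part_" + parts.Count.ToString("D5") + ".txt");
                CopyRange(input, start, end, partPath, buffer);
                parts.Add(partPath);
                start = end;
            }
        }
        return parts;
    }

    // Seeks to 'from' and scans forward for the next line feed (0x0A).
    // Returns the offset just past it, or the file length if none is found.
    private static long FindPartEnd(FileStream input, long from)
    {
        if (from >= input.Length) return input.Length;
        input.Seek(from, SeekOrigin.Begin);
        var scan = new byte[64 * 1024];
        long pos = from;
        int read;
        while ((read = input.Read(scan, 0, scan.Length)) > 0)
        {
            int idx = Array.IndexOf(scan, (byte)0x0A, 0, read);
            if (idx >= 0) return pos + idx + 1;
            pos += read;
        }
        return input.Length;
    }

    // Copies the byte range [start, end) of the input into a new file, without decoding it.
    private static void CopyRange(FileStream input, long start, long end, string partPath, byte[] buffer)
    {
        input.Seek(start, SeekOrigin.Begin);
        using (var output = new FileStream(partPath, FileMode.Create, FileAccess.Write))
        {
            long remaining = end - start;
            while (remaining > 0)
            {
                int read = input.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
                if (read == 0) break;
                output.Write(buffer, 0, read);
                remaining -= read;
            }
        }
    }
}
```

A call such as `Utf8FileSplitter.Split(path, outDir, 512L * 1024 * 1024)` would aim for parts of about 512 MB. The sketch still copies every byte once, which is unavoidable if separate files are required, but it never decodes text or allocates a string per line. If the input starts with a byte order mark, only the first part will contain it.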