C# - Processing huge UTF-8 files by splitting them into multiple files


I am developing an importer program in C# that imports large UTF-8 text files (where characters vary in byte length). Loading a 20 GB file into RAM is not a suitable solution, if it is possible at all. It is better to split the file into multiple smaller files and process those. My problem now is splitting the file for processing. One solution is to read the file line by line and start a new output file whenever the line count reaches a suitable number. I don't think reading the file line by line is a fast way to split it; the splitting time is high. Is there a faster algorithm for splitting large UTF-8 files into multiple files without reading them line by line?

My suggestions for your problem are below. They keep separation of concerns in mind: splitting the file and processing the file can be separated for better maintenance.

  1. Read the file in binary mode rather than as text.
  2. Do not read line by line; you don't need to read the whole file in order to split it.
  3. Use Seek to jump to the approximate position of each split. Refer to the link.
  4. If you need the split files to contain only complete lines, then after seeking to a position, search for the next end-of-line character and split the file there.
  5. Once the files are split, process them individually.
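
The steps above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: the class name `Utf8FileSplitter`, the method names, the `part_0000.txt` naming scheme, and the buffer sizes are all assumptions made for the example, and it assumes lines end with `\n` (which also covers `\r\n`). Scanning raw bytes for `\n` is safe in UTF-8 because the byte 0x0A never occurs inside a multi-byte sequence.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: splits a file into chunks of roughly targetChunkBytes,
// always cutting just after a '\n' so every chunk holds complete lines.
static class Utf8FileSplitter
{
    // Returns the byte offset where each chunk starts, followed by the file length.
    public static List<long> FindSplitOffsets(string path, long targetChunkBytes)
    {
        if (targetChunkBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(targetChunkBytes));

        var offsets = new List<long> { 0 };
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read);
        long length = fs.Length;
        var buffer = new byte[64 * 1024];
        long start = 0;

        while (start + targetChunkBytes < length)
        {
            // Jump ahead instead of reading everything in between.
            fs.Seek(start + targetChunkBytes, SeekOrigin.Begin);

            // Scan forward in binary for the next end-of-line byte.
            long next = -1;
            int read;
            while (next < 0 && (read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                int idx = Array.IndexOf(buffer, (byte)'\n', 0, read);
                if (idx >= 0)
                    next = fs.Position - read + idx + 1; // first byte after '\n'
            }

            if (next < 0 || next >= length) break; // no further line boundary
            offsets.Add(next);
            start = next;
        }

        offsets.Add(length);
        return offsets;
    }

    // Copies each [offsets[i], offsets[i + 1]) byte range into its own file.
    public static void Split(string path, string outputDir, long targetChunkBytes)
    {
        List<long> offsets = FindSplitOffsets(path, targetChunkBytes);
        Directory.CreateDirectory(outputDir);

        using var input = new FileStream(path, FileMode.Open, FileAccess.Read);
        var buffer = new byte[1024 * 1024];

        for (int i = 0; i + 1 < offsets.Count; i++)
        {
            input.Seek(offsets[i], SeekOrigin.Begin);
            long remaining = offsets[i + 1] - offsets[i];
            string partPath = Path.Combine(outputDir, $"part_{i:D4}.txt");

            using var output = new FileStream(partPath, FileMode.Create, FileAccess.Write);
            while (remaining > 0)
            {
                int read = input.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
                if (read == 0) break; // unexpected end of file
                output.Write(buffer, 0, read);
                remaining -= read;
            }
        }
    }
}
```

A call such as `Utf8FileSplitter.Split("input.txt", "parts", 256L * 1024 * 1024);` (file name, folder, and chunk size are placeholders) would produce chunks of about 256 MB each. Finding the offsets only reads a small window after each seek; the copy itself still reads every byte once, but in large binary blocks rather than decoded lines. If the input starts with a byte order mark, only the first chunk will carry it.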
