Delete Lines from Huge Files
18 Aug 2018 Linux tips editor vim
With huge files, tasks as simple as removing the first or last few lines can become hard. If you try to open a huge file with vim, chances are the system will get stuck struggling to load it into memory. This post describes several alternative approaches to deleting lines from huge files.
Solution 1) Vim Ex Mode
vim +'$,$-1/.*/d' +'w sol2.txt' +'q!' toedit.txt
On the command line, you can give vim a series of commands to execute on the input file.
Note: these must be Ex commands, not normal-mode shortcuts; for example, ‘dd’ to delete a line would not be recognized.
cat toedit.txt | vim -E +'$,$-1/.*/d' +%p -cq! /dev/stdin > sol1.txt
To avoid screen flashes when editing a stream non-interactively, start Vim in Ex mode by adding -e (Ex mode) or -E (improved Ex mode) to the command-line arguments.
+'cmd' / -c {cmd}: invokes an Ex command.
There might be advantages to using Ex mode, but both commands above remove the last line of a huge file efficiently.
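To see how a chain of Ex commands behaves before pointing it at a huge file, it can help to try it on a tiny one first. The sketch below is only an illustration: the file names sample.txt and trimmed.txt are made up, it uses the plain address $d (delete the last line) rather than the address from the commands above, and it assumes vim is installed and accepts -es (Ex mode plus silent mode).

```shell
# Work in a scratch directory with a five-line sample file (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir"
seq 1 5 > sample.txt

# -e starts Ex mode and -s keeps it silent, so nothing is drawn on screen.
# Each +'cmd' is one Ex command, run in order after the file is loaded:
#   $d            delete the last line
#   w trimmed.txt write the buffer to a new file
#   q!            quit without saving the original
vim -es +'$d' +'w trimmed.txt' +'q!' sample.txt

cat trimmed.txt   # lines 1 through 4
```

Because the result is written to a new file and vim quits with q!, sample.txt itself is left untouched.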
Solution 2) Stream Editor
sed -i '$ d' toedit.txt
This command removes the last line as well.
The -i argument means the edit is made “in place”.
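Since an in-place edit overwrites the input, a cautious variant is to ask sed to keep a backup. The following is a minimal sketch on a made-up five-line file; the .bak suffix is an arbitrary choice, and the -i.bak spelling is assumed to be accepted by the installed sed (it is by GNU sed).

```shell
# Work in a scratch directory with a five-line sample file (hypothetical name).
workdir=$(mktemp -d)
cd "$workdir"
seq 1 5 > sample.txt

# Delete the last line in place. The suffix after -i tells sed to keep the
# original content as sample.txt.bak, so the edit can be undone.
sed -i.bak '$d' sample.txt

cat sample.txt       # lines 1 through 4
cat sample.txt.bak   # the original five lines
```

Once the result looks right, the backup file can simply be removed.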
sed '${/pattern/d;}' toedit.txt > sol2.txt
sed is commonly used to filter text, so it supports more complicated operations. The example above deletes the last line only when it matches the pattern.
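One way to delete the last line only when it matches a pattern is to nest the pattern inside a block that is restricted to the last line. The sketch below tries this on two small made-up files; the marker text EOF and all file names are assumptions for illustration.

```shell
# Two sample files: one ends with a marker line, the other does not.
workdir=$(mktemp -d)
cd "$workdir"
printf 'alpha\nbeta\nEOF\n' > with_marker.txt
printf 'alpha\nbeta\ngamma\n' > without_marker.txt

# '$' selects the last line; the braces apply /EOF/d to that line only,
# so the last line is deleted only when it matches the pattern.
sed '${/EOF/d;}' with_marker.txt > out1.txt      # trailing EOF line is dropped
sed '${/EOF/d;}' without_marker.txt > out2.txt   # content is left unchanged

cat out1.txt
cat out2.txt
```

A marker that appears earlier in the file is not affected, because the block only runs on the last line.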
Solution 3) Head And Tail
head -n -1 toedit.txt | tail -n +2 > sol3.txt
If the lines to remove reside at the beginning or the end of the file, using head and tail is the simplest approach.
head with a negative value outputs the file except for the last several lines.
tail with a value prefixed with ‘+’ outputs the file starting from that line number, counted from the beginning.
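The two counts are easy to get off by one, so here is a small sketch on a made-up ten-line file that drops the first two and the last three lines. It assumes GNU coreutils, since the negative count for head is a GNU extension; the file names are hypothetical.

```shell
# Work in a scratch directory with a ten-line sample file (hypothetical name).
workdir=$(mktemp -d)
cd "$workdir"
seq 1 10 > sample.txt

# head -n -3 prints everything except the last 3 lines (lines 1 to 7).
# tail -n +3 then prints from line 3 onward, which drops the first 2 lines.
head -n -3 sample.txt | tail -n +3 > trimmed.txt

cat trimmed.txt   # lines 3 through 7
```

Note that to skip the first N lines, tail needs +(N+1), because the number names the first line to keep.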
Credits
- How can you use vim as a stream editor? - Thanks to kenorb’s answer regarding vim Ex mode.
- GNU sed - Stream Editor.
- Remove the last line from a file in Bash - Thanks to thkala’s answer, which led me to solution 2.
- Remove First n Lines of a Large Text File - Thanks to Binyamin and steeldriver for the simple approach with head and tail.