I'm trying to restore/rescue a database from the pieces that I have:
I have all the recent files in PGDATA/base (/var/lib/postgresql/9.6/main/base/), but I do not have the complete /var/lib/postgresql/9.6/main/ directory.
I have all the files from an old (and not very different) backup dump that I restored into a new installation of PostgreSQL 9.6.
I have a lot of files rescued from the hard drive (with ddrescue), but I got thousands of files without a name (they sit in the lost+found directory with a "#" followed by a number instead), so, for instance: the pg_class file, or the pg_clog directory with its 0000 file.

Edit:
I probably have the contents of pg_xlog, but I don't have the names of the files. I have 5 files of 16777216 bytes each:
#288294 (date 2019-04-01)
#288287 (date 2019-05-14)
#288293 (date 2019-07-02)
#261307 (date 2019-11-27)
#270185 (date 2020-01-28)
Also, my old dump is from 2019-04-23, so could the first one correspond to that same point in time?
So my next step is going to be to try to read those files with pg_xlogdump, and/or to rename them with the file names the system would use (beginning with 00000001000000000000000A, ordered by date, matching the naming I saw the system apply) and put them into the pg_xlog directory of the new installation. Could that work? Also, I realized that the last one has the date of the day the hard drive crashed, so I do have the last one.
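For instance, a minimal sketch of that check (the segment name is just the guessed one above, and the path to the rescued file is illustrative):

# Copy one rescued 16 MB file to a guessed WAL segment name and see whether
# pg_xlogdump can parse it; it infers the start position from the file name,
# so the name has to be plausible.
mkdir -p /tmp/wal_check
cp '/lost+found/#288294' /tmp/wal_check/00000001000000000000000A
pg_xlogdump /tmp/wal_check/00000001000000000000000A | head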
The PGDATA/base directory I rescued from the (damaged) hard drive contains directories 1, 12406, 12407 and 37972 with a lot of files inside. I checked with pg_filedump -fi that my up-to-date data is stored in the files under directory 37972. The same (but older) data is stored in the files under directory PGDATA/base/16387 in the restored dump.
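For instance, a check along these lines (file 38246 is one of the files mentioned further below; the grep string would be any value known to have been added after the old dump) can confirm which directory holds the newer data:

cd /var/lib/postgresql/9.6/main/base/37972
# -f prints formatted block dumps, -i adds interpreted item (tuple) details;
# grep for a value more recent than the old dump.
pg_filedump -fi 38246 | grep "some recent value"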
I tried to copy the files directly from one to the other, mixing the updated data over the old database, but it doesn't work. After solving permission errors I can get into the "Frankenstein" database this way:
postgres@host:~$ postgres --single -P -D /var/lib/postgresql/9.6/main/ dbname
And when I try to do something, like a reindex, I get this error:
PostgreSQL stand-alone backend 9.6.16
backend> reindex system dbname;
ERROR: could not access status of transaction 136889
DETAIL: Could not read from file "pg_subtrans/0002" at offset 16384: Success.
CONTEXT: while checking uniqueness of tuple (1,7) in relation "pg_toast_2619"
STATEMENT: reindex system dbname;
Certainly the pg_subtrans/0002 file is part of the "Frankenstein" and not the good one (because I haven't found the good one yet, at least not under that name), so I tried two things: first copying another file that seems similar, and then generating the file with 8192 zeroes using dd. In both cases I get the same error (and if the file doesn't exist I get DETAIL: Could not open file "pg_subtrans/0002": No such file or directory.). Anyway, I have no idea what should be in that file. Do you think I could get that data from another file? Or could I find the missing file using some tool? Also, pg_filedump shows me empty content for the other file in that directory, pg_subtrans/0000.
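For what it's worth, a sketch of recreating that segment large enough for the failing read, assuming an all-zero file (meaning "no parent subtransaction" for every entry) is acceptable here; the error mentions offset 16384 and pg_subtrans pages are 8192 bytes, so the file needs to cover at least three pages:

cd /var/lib/postgresql/9.6/main
# Three zeroed 8192-byte pages so the read at offset 16384 can succeed;
# ownership and permissions must match the rest of the cluster.
dd if=/dev/zero of=pg_subtrans/0002 bs=8192 count=3
chown postgres:postgres pg_subtrans/0002
chmod 600 pg_subtrans/0002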
Extra note: I found this useful blog post that talks about restoring from just rescued files using pg_filedump, the pg_class file, reindex system and other tools, but it is hard for me to understand how to adapt it to my concrete and simpler problem (I think my problem is simpler because I have a dump): https://www.commandprompt.com/blog/recovering_a_lost-and-found_database/
Finally we restored the database completely, based on the PGDATA/base/37972 directory, in 4 parts:

Checking and "grepping" with pg_filedump -fi which file corresponds to each table. To make the "grepping" easier we made a script.
#!/bin/bash
# Run pg_filedump -fi on every file in the current directory and grep
# its output for the string passed as the first argument.
for filename in ./*; do
    echo "$filename"
    pg_filedump -fi "$filename" | grep "$1"
done
NOTE: Only useful with small strings.
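For example, run from inside the rescued base directory (the script name and location are just illustrative):

cd /var/lib/postgresql/9.6/main/base/37972
bash /path/to/grep_filedump.sh "some known string"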
Executing the great tool pg_filedump -D. -D is a new option (available from postgresql-filedump version ≥ 10) that decodes tuples using a given comma-separated list of types.
As we know the types because we made the database, we "just" need to give the comma-separated list of types for each table. I wrote "just" because in some cases it can be a little complicated. One of our tables needs this kind of command:
pg_filedump -D text,text,text,text,text,text,text,text,timestamp,text,text,text,text,int,text,text,int,text,int,text,text,text,text,text,text,text,text,text,text,int,int,int,int,int,int,int,int,text,int,int 38246 | grep COPY > restored_table1.txt
From the pg_filedump -D manual:
Supported types:
bigint
bigserial
bool
char
charN -- char(n)
date
float
float4
float8
int
json
macaddr
name
oid
real
serial
smallint
smallserial
text
time
timestamp
timetz
uuid
varchar
varcharN -- varchar(n)
xid
xml
~ -- ignores all attributes left in a tuple
All those text entries were for us columns of type character varying(255), but varcharN didn't work for us, so after other tests we finally changed it to text.
The timestamp entry was for us a column of type timestamp with time zone, but timetz didn't work for us, so after other tests we finally changed it to timestamp and opted to lose the time zone data.
These changes worked perfectly for this table.
Other tables were much easier:
pg_filedump -D int,date,int,text 38183 | grep COPY > restored_table2.txt
As we get just "raw" data, we have to re-format it to CSV. So we made a Python program to convert the pg_filedump -D output to CSV.
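The Python program is not shown here, but as a rough sketch of the same idea in shell, assuming the decoded rows come out on lines prefixed with "COPY:" and separated by tabs (which is why the grep COPY above works):

# Strip the "COPY: " prefix, turn tabs into the "|" delimiter, and prepend a
# header row, since the COPY ... CSV HEADER command below expects one.
{
  echo 'id_comentari|id_observacio|text|data|id_usuari|text_old'
  sed 's/^COPY: //' restored_table2.txt | tr '\t' '|'
} > table2.csv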
We inserted each CSV into PostgreSQL (after creating each empty table again):
COPY scheme."table2"(id_comentari,id_observacio,text,data,id_usuari,text_old)
FROM '<path>/table2.csv' DELIMITER '|' CSV HEADER;
I hope this will help other people :)