I am a beginner C++ programmer.
I wrote a simple program that creates a char array (the size is the user's choice) and reads whatever data was previously in that memory. Often I can find something that makes sense (I always find the alphabet?), but most of it is just strange characters. I made it write the output into a binary file.
However, how do I:
Recognize the different chunks of data
Recognize what chunks are what file format (i.e. what chunk is an image, audio, text, etc.)
My code:
    // main.cpp
    #include <iostream>
    #include <string>
    #include <fstream>
    using namespace std;

    int main() {
        int memory_size = 4000;
        string data = "";
        cout << "How many bytes do you want to retrieve? (-1 to exit)\n";
        cin >> memory_size;
        if (!cin || memory_size <= 0)
            return 0;  // -1 (or invalid input) exits, as the prompt promises
        string y_n;
        cout << "Would you like to write output into a file? (Y/N)\n";
        cin >> y_n;
        bool inFile = (y_n == "Y" || y_n == "y");
        // Variable-length array: a compiler extension (GCC/Clang), not standard C++.
        // It is deliberately left uninitialized; reading it is undefined behavior.
        char memory_chunk[memory_size];
        for (int i = 0; i < memory_size; i++) {
            cout << memory_chunk[i];
            data += memory_chunk[i];  // append the char itself; char + "" is pointer arithmetic
        }
        if (inFile) {
            ofstream file("output.binary", ios::out | ios::binary);
            file.write(memory_chunk, sizeof memory_chunk);
            file.close();
        }
        cin >> data;  // wait for input so the console stays open
        return 0;
    }
Example of the retrieved data: (This is A LOT smaller than what it usually can retrieve)
dû( L) àýtú( ¯1Œw ÐýDú( @ú( Lú( dû( ¼û( L) º
‰v8û( 7Œw û( ú( 0ý( k7Œwdû( @ 5 À ü( ¨›w ó˜wÞ¯ › Ø› 0ý( Hû( À › `› À Dû( LŒw › @› `› › lû( ÷Œw › › ˜› › û( 3YŒw › ~Œw › €› › à› Dü( › €› Dü( ßWŒwXŒwDÞ¯ › › €› ˆ› À › ¦› › !› : À › `› À ü( › ˆ› V €›
Œw ˆ› ¬û( Äÿ( ‘Q‡w€ôçþÿÿÿXŒwµTŒw ‚› xü( È6‹w › À×F fÍñt"ãŠvEA @ÒF ¸ü(
þÿÿÿ@ÒF Ã~“v Øü( O¯‰vØÞ¯øü( œ›‰v › ˆý( ‡ÌE @ÒF
8|“v ý( ‰v@M“v,ý( wî‰v hý( ¬_‘v8|“v˜_‘vݧY‘ ÀwF
<ý( Äÿ( e‹vàçþÿÿÿ˜_‘v"A
8|“v@ÒF ÀwF ïÀE ÕF ”› ÓºA ”› ÕF lF €F F 2 àýàý( ð @
Some file formats start with magic numbers (fixed byte sequences at the beginning of the file) that help to identify them, though this is not always the case. Wikipedia lists some here: http://en.wikipedia.org/wiki/List_of_file_signatures. The Unix command 'file' tries to guess file formats based on magic numbers in the data. Its source code is most likely available somewhere (in the Apple Darwin sources, if nowhere else).
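As a rough illustration of the magic-number idea, here is a minimal sketch that scans a buffer for a handful of well-known signatures and reports the offset of each hit. The names Signature, Match and find_signatures are made up for this example, and the table is a tiny hand-picked subset of the Wikipedia list, nowhere near what 'file' knows about. Keep in mind that a hit in leftover memory may be a coincidence or a fragment rather than a complete file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type: the leading bytes of a format plus a label for it.
struct Signature {
    std::string magic;  // stored with an explicit length, so it may contain '\0'
    std::string name;
};

// Hypothetical result type: where a signature was found and which one it was.
struct Match {
    std::size_t offset;
    std::string name;
};

// A small subset of well-known file signatures (assumption: enough for a demo).
static const std::vector<Signature>& signatures() {
    static const std::vector<Signature> table = {
        {std::string("\x89PNG\r\n\x1a\n", 8), "PNG image"},
        {std::string("\xFF\xD8\xFF", 3), "JPEG image"},
        {std::string("GIF87a", 6), "GIF image"},
        {std::string("GIF89a", 6), "GIF image"},
        {std::string("%PDF-", 5), "PDF document"},
        {std::string("PK\x03\x04", 4), "ZIP archive"},
    };
    return table;
}

// Check every offset in the buffer against every signature in the table.
std::vector<Match> find_signatures(const std::string& buf) {
    std::vector<Match> hits;
    for (std::size_t pos = 0; pos < buf.size(); ++pos) {
        for (const Signature& sig : signatures()) {
            // compare() clamps the substring to the end of the buffer,
            // so a signature cut off at the end simply does not match.
            if (buf.compare(pos, sig.magic.size(), sig.magic) == 0) {
                hits.push_back({pos, sig.name});
            }
        }
    }
    return hits;
}
```

To try it on the dump from the question, read output.binary into a std::string (opened with ios::binary) and pass it to find_signatures. Each reported offset is only a candidate start of a chunk; the signature does not say where the chunk ends, so that has to come from the format itself.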