EDIT: The solution to this problem was supplied by Ulrich Eckhardt in the comments below. Also: this problem had an entirely different cause and solution from the ones described in possible duplicates. Again, see Ulrich Eckhardt's comment for details.
With the help of the experts here, I managed to put together a program that writes the contents of the Windows clipboard to a text file in a specified code page. It now seems to work perfectly, except that the line breaks in the text file are three bytes - 0d 0d 0a - instead of 0d 0a - and this causes problems (additional lines) when I import the text into a word processor.
Is there an easy way to replace 0d 0d 0a with 0d 0a in the text stream, or is there something I should be doing differently in my code? I haven't found anything like this elsewhere. Here is the code:
#include <stdafx.h>
#include <windows.h>
#include <iostream>
#include <fstream>
#include <codecvt> // for wstring_convert
#include <locale> // for codecvt_byname
using namespace std;
void BailOut(const char *msg) // const: callers pass string literals
{
fprintf(stderr, "Exiting: %s\n", msg);
exit(1);
}
string ExePath()
{
char buffer[MAX_PATH];
GetModuleFileNameA(NULL, buffer, MAX_PATH);
string::size_type pos = string(buffer).find_last_of("\\/");
return string(buffer).substr(0, pos);
}
// get output code page from command-line argument; use 1252 by default
int main(int argc, char *argv[])
{
string codepage = ".1252";
if (argc > 1) {
string cpnum = argv[1];
codepage = "." + cpnum;
}
// HANDLE clip;
string clip_text = "";
// exit if clipboard not available
if (!OpenClipboard(NULL))
{ BailOut("Can't open clipboard"); }
if (IsClipboardFormatAvailable(CF_TEXT)) {
HGLOBAL hglb = GetClipboardData(CF_TEXT);
if (hglb != NULL) {
LPSTR lptstr = (LPSTR)GlobalLock(hglb);
if (lptstr != NULL) {
// copy the text through lptstr, the pointer returned by GlobalLock
// (hglb is only a handle and must not be dereferenced directly):
clip_text = lptstr;
// release the lock after you're done:
GlobalUnlock(hglb);
}
}
}
CloseClipboard();
// create conversion routines
typedef std::codecvt_byname<wchar_t, char, std::mbstate_t> codecvt;
std::wstring_convert<codecvt> cp1252(new codecvt(".1252"));
std::wstring_convert<codecvt> outpage(new codecvt(codepage));
std::string OutFile = ExePath() + "\\#clip.txt"; // output file name
ofstream OutStream; // open an output stream
OutStream.open(OutFile, ios::out | ios::trunc);
// make sure file is successfully opened
if (!OutStream) {
cout << "Error opening file " << OutFile << " for writing.\n";
return 1;
}
// convert to DOS/Win codepage number in "outpage"
OutStream << outpage.to_bytes(cp1252.from_bytes(clip_text)).c_str();
//OutStream << endl;
OutStream.close(); // close output stream
return 0;
}
The comments here are on the right track, but let me provide more context and point out a lingering problem.
There are various line-terminator/separator conventions. Many Unix-derived systems use a line feed character at the end of every line. In ASCII, that's '\x0A'. Other systems, like Windows and many networking protocols, use a carriage return followed by a line feed between lines. In ASCII, that's '\x0D' '\x0A'. (There are other schemes as well, but they are much rarer.)
The C and C++ input/output libraries for reading and writing text can hide these conventions from you, so that you can write code one way and have it do the "right thing" on whatever the underlying platform is.
The programming convention is to use '\n', which is almost certainly equivalent to a line feed if your underlying platform uses ASCII or Unicode (though not necessarily on an EBCDIC platform). When writing to a file in text mode, the library will intercept the '\n' and write whatever your platform's convention requires. For example, if you're on a Linux machine, it'll output a line feed (and since '\n' has the same value as a line feed, this is basically a no-op). On Windows, the library will intercept the '\n' and output a carriage return and a line feed. The input side of things does the opposite.
When you get text from the clipboard on Windows, you don't really know which convention it uses. Since it's Windows, you'd probably expect CR+LF, but plenty of programs that put text on the clipboard don't follow the Windows convention.
In your case, it seems the text from the clipboard does indeed have both a carriage return and a line feed between lines. When you then output that in text mode, the i/o library outputs the carriage return, and then it sees the line feed (which it thinks is a '\n'), and so it outputs another carriage return followed by a line feed. That's why you see a doubling of the carriage returns.
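To make the doubling concrete, here is a minimal sketch that imitates the translation in plain code. The function name `simulate_text_mode_write` is made up for illustration; it is not part of any library, and it only models the output-side '\n' expansion described above.

```cpp
#include <string>

// Hypothetical helper: mimics what a Windows text-mode stream does on output,
// expanding every '\n' to "\r\n" and passing all other bytes through unchanged.
std::string simulate_text_mode_write(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        if (c == '\n')
            out += '\r';   // the library inserts a carriage return...
        out += c;          // ...and then writes the original character
    }
    return out;
}
```

Feeding it clipboard-style text such as "a\r\nb" gives "a\r\r\nb", which is the 0d 0d 0a sequence from the question, while "a\nb" comes out as the expected "a\r\nb".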
Switching the output to binary mode tells the library "don't convert '\n'." So, that solves your immediate problem.
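As a rough sketch of what that looks like, the helper below opens the file with `std::ios::binary` and writes the bytes untouched. The name `write_bytes` is hypothetical; in the question's code the equivalent change would presumably be adding `ios::binary` to the flags passed to `OutStream.open`.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes the bytes of `text` to `path` exactly as given.
// std::ios::binary turns off the '\n' -> "\r\n" translation on Windows.
bool write_bytes(const std::string& path, const std::string& text)
{
    std::ofstream out(path, std::ios::out | std::ios::trunc | std::ios::binary);
    if (!out)
        return false;                       // could not open the file
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
    return static_cast<bool>(out);          // false if the write failed
}
```

With this, text that already contains CR+LF pairs lands in the file with the same bytes it had on the clipboard.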
But there's still the problem that the clipboard text might sometimes have just line feeds between (or at the ends of) lines. If you output that in binary mode, you won't get the carriage returns, and the file technically won't be in the format your platform wants. Some programs will cope with this, but others, e.g., older versions of Notepad, will not.
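One way to cover both cases, sketched here under the assumption that the only conventions you care about are CR+LF and bare LF, is to normalize the text yourself before writing it in binary mode. The name `normalize_to_crlf` is made up for this example, and a lone carriage return is deliberately left alone.

```cpp
#include <string>

// Hypothetical helper: makes sure every '\n' is preceded by exactly one '\r'.
// Existing "\r\n" pairs are kept as they are; a bare '\n' becomes "\r\n".
// Assumption: a lone '\r' is not treated as a line break and passes through.
std::string normalize_to_crlf(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        // Add the missing carriage return only when the previous byte is not one.
        if (c == '\n' && (out.empty() || out.back() != '\r'))
            out += '\r';
        out += c;
    }
    return out;
}
```

The function is idempotent, so running it on text that is already CR+LF changes nothing. The result should then be written through a stream opened in binary mode, since a text-mode stream would double the carriage returns again.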