How should I add a field to the metadata of Langchain's Documents?
For example, using the CharacterTextSplitter gives a list of Documents:
const splitter = new CharacterTextSplitter({
  separator: " ",
  chunkSize: 7,
  chunkOverlap: 3,
});
// createDocuments returns a Promise, so await the result
const documents = await splitter.createDocuments([text]);
A document will have the following structure:
{
  "pageContent": "blablabla",
  "metadata": {
    "name": "my-file.pdf",
    "type": "application/pdf",
    "size": 12012,
    "lastModified": 1688375715518,
    "loc": { "lines": { "from": 1, "to": 3 } }
  }
}
I want to add a field to the metadata of each Document.
The recommended text splitter documentation doesn't currently show how to do this, but the second argument of createDocuments accepts an array of objects (one per input text) whose properties are assigned to the metadata of every document produced from that text.
const myMetaData = { url: "https://www.google.com" };
// chunkHeader is a string defined elsewhere; it is prepended to each chunk
const documents = await splitter.createDocuments([text], [myMetaData], {
  chunkHeader,
  appendChunkOverlapHeader: true,
});
After this, documents will contain an array in which each element is an object with pageContent and metadata properties. Under metadata, the properties from myMetaData above will also appear. pageContent will also have the text of chunkHeader prepended.
{
  pageContent: <chunkHeader plus the chunk>,
  metadata: <all properties of myMetaData plus loc (text line numbers of chunk)>
}
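If the documents have already been created, another option is to merge the extra field in afterwards. The following is only a minimal sketch: the Doc interface is a local stand-in for the shape shown above (not LangChain's own Document class), and withMetadata is a hypothetical helper name, not part of the library.

```typescript
// Local stand-in for the document shape shown above (an assumption for this
// sketch; this is not LangChain's Document class).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: returns new documents whose metadata has the extra
// fields merged in. Existing keys such as `loc` are kept; keys in `extra`
// win if a name collides. The input array is not mutated.
function withMetadata(docs: Doc[], extra: Record<string, unknown>): Doc[] {
  return docs.map((doc) => ({
    ...doc,
    metadata: { ...doc.metadata, ...extra },
  }));
}

const docs: Doc[] = [
  {
    pageContent: "blablabla",
    metadata: { name: "my-file.pdf", loc: { lines: { from: 1, to: 3 } } },
  },
];

const enriched = withMetadata(docs, { url: "https://www.google.com" });
console.log(enriched[0].metadata);
```

Because the helper copies each object rather than assigning to doc.metadata directly, the original array stays unchanged, which can be easier to reason about if the same documents are reused elsewhere.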