How to treat punctuation as part of a token in CountVectorizer?

I am using scikit-learn's CountVectorizer to convert my strings into vectors. However, by default CountVectorizer only selects tokens of two or more characters, and it ignores punctuation, treating it as a separator. I want single characters to count as tokens and punctuation to be kept as part of a token. For example:

aaa 1 2.75 zzz
aaa 2 3.75 www

I want a matrix of

1 1 1 0 1 1 0 
1 0 1 1 0 0 1

Is there a simple way to achieve this goal?

Solution

You can use a custom tokenizer as in this example:

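import re

from sklearn.feature_extraction.text import CountVectorizer

new_docs = ["aaa 1 2.75 zzz", "aaa 2 3.75 www"]

def my_tokenizer(text):
    # Split on whitespace only, so "2.75" and single characters stay whole tokens.
    return re.split(r"\s+", text)


# Pass the tokenizer by keyword; the documents go to fit_transform, not the constructor.
cv = CountVectorizer(tokenizer=my_tokenizer)
count_vector = cv.fit_transform(new_docs)
print(cv.vocabulary_)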

Example output:

{'aaa': 4, '1': 0, '2.75': 2, 'zzz': 6, '2': 1, '3.75': 3, 'www': 5}
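To see which count matrix goes with this vocabulary, here is a rough plain-Python sketch of the same whitespace tokenization. It is not scikit-learn code: the names whitespace_tokens, tokenized, vocabulary and matrix are made up for illustration, and it assumes CountVectorizer's default lowercasing and its alphabetically sorted column order, which is consistent with the indices in the output above.

```python
import re

docs = ["aaa 1 2.75 zzz", "aaa 2 3.75 www"]

def whitespace_tokens(text):
    # Lowercase, split on runs of whitespace, and drop empty strings
    # that leading or trailing whitespace would otherwise produce.
    return [tok for tok in re.split(r"\s+", text.lower()) if tok]

tokenized = [whitespace_tokens(doc) for doc in docs]

# Assign column indices in sorted token order (assumed to mirror vocabulary_).
all_tokens = sorted({tok for doc in tokenized for tok in doc})
vocabulary = {tok: idx for idx, tok in enumerate(all_tokens)}

# One row per document, one column per vocabulary token.
matrix = [[doc.count(tok) for tok in all_tokens] for doc in tokenized]

print(vocabulary)
for row in matrix:
    print(row)
```

Under these assumptions the columns are 1, 2, 2.75, 3.75, aaa, www, zzz, so the rows come out as [1, 0, 1, 0, 1, 0, 1] and [0, 1, 0, 1, 1, 1, 0]. With scikit-learn itself, count_vector.toarray() should give the same rows.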

See more CountVectorizer usage examples here.
