python, github-api, pygithub

GitHub file search by name: results from the API or PyGithub differ from the web UI


I have a large Git repository containing multiple .NET solutions (the example repository has only one). I want to get the paths of all Directory.Packages.props files in that specific repository. I am trying the code below:

from github import Github

token = '<my token>'

# Create a GitHub instance using the token
g = Github(token)

# Search for the file
result = g.search_code('Xcaciv/Xcaciv.Command', qualifiers={'path': '**/Directory.Packages.props'}) 

# Print the number of matching files
print(result.totalCount)

I have tried several combinations of parameters for search_code() without success, and I feel like I am missing something simple. I have tried passing 'repo:Xcaciv/Xcaciv.Command' as the first parameter. I have tried the plain query string 'repo:Xcaciv/Xcaciv.Command path:**/Directory.Packages.props', also without success. I have also tried calling the REST API directly with the same query.

Every attempt returns HTTP 200 with a total count of 0 (no results).

I appreciate any guidance.
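The first call in the question probably never sends the intended query. As far as I can tell from PyGithub's source, search_code(query, sort, order, highlight, **qualifiers) takes qualifiers as keyword arguments and appends each one to the query as 'name:value'. There is no qualifiers= parameter, so the dict ends up as a single qualifier literally named 'qualifiers', and 'Xcaciv/Xcaciv.Command' is sent as free text rather than as a repo: qualifier. The helper build_search_query below is hypothetical: it only mimics that joining offline so the resulting q string can be inspected, and is not part of PyGithub.

```python
def build_search_query(query: str = "", **qualifiers: object) -> str:
    """Join free text and keyword qualifiers into one search string.

    Hypothetical helper that mirrors how PyGithub's search_code() appears
    to build the `q` parameter, so the final query can be checked offline.
    """
    chunks = []
    if query:
        chunks.append(query)
    # Each keyword argument becomes a "name:value" qualifier, in call order.
    for name, value in qualifiers.items():
        chunks.append(f"{name}:{value}")
    if not chunks:
        raise ValueError("need at least one search term or qualifier")
    return " ".join(chunks)


# The original call: the repo is plain text and the dict is stringified
# under a bogus qualifier named "qualifiers".
print(build_search_query("Xcaciv/Xcaciv.Command",
                         qualifiers={"path": "**/Directory.Packages.props"}))
# → Xcaciv/Xcaciv.Command qualifiers:{'path': '**/Directory.Packages.props'}

# Qualifiers passed as keyword arguments instead.
print(build_search_query(repo="Xcaciv/Xcaciv.Command",
                         filename="Directory.Packages.props"))
# → repo:Xcaciv/Xcaciv.Command filename:Directory.Packages.props
```

With PyGithub itself, g.search_code('', repo='Xcaciv/Xcaciv.Command', filename='Directory.Packages.props') should then be equivalent to passing the whole string 'repo:Xcaciv/Xcaciv.Command filename:Directory.Packages.props' as the query. Even with correctly passed qualifiers, a path:**/... glob would still not match through the API, because glob patterns in path: belong to the newer web code search.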


Solution

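  • You can use the 'filename:' qualifier instead of a 'path:' glob.

    To search across all repositories the token can access (not only your own):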

    from github import Github, RateLimitExceededException
    import time
    
    
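    token = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'  # your personal access token
    
    g = Github(token)
    
    def search_files(query):
        try:
            result = g.search_code(query)
            # search_code() is lazy: reading totalCount sends the first request,
            # so a rate-limit error is raised here and caught below.
            _ = result.totalCount
            return result
        except RateLimitExceededException:
            print("Rate limit exceeded, waiting for reset...")
            # Code search uses its own "search" quota, separate from "core".
            search_rate_limit = g.get_rate_limit().search
            reset_timestamp = search_rate_limit.reset.timestamp()
            # Never sleep a negative amount if the reset time has already passed.
            wait_time = max(reset_timestamp - time.time(), 0)
            time.sleep(wait_time + 1)
            return search_files(query)
    
    
    # Without a repo: qualifier, this searches every repository the token can see.
    query = 'filename:Directory.Packages.props'
    result = search_files(query)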
    
    
    print(f'Total files found: {result.totalCount}')
    for file in result:
        print(f"File path: {file.path}")
        print(f"Repository: {file.repository.full_name}")
        print(f"URL: {file.html_url}")
        print("-" * 40)
    
    

    Because the GitHub API is rate limited, we catch the rate-limit error and wait until the limit resets before continuing.

    For a single repository:

    from github import Github, RateLimitExceededException
    import time
    
    token = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'  # your personal access token
    
    
    # timeout (seconds) keeps a slow request from hanging the script.
    g = Github(token, timeout=10)
    
    def search_files(query):
        try:
            print(f"Searching with query: {query}")
            result = g.search_code(query)
            # search_code() is lazy: force the first request here so that
            # errors are raised inside this try block.
            _ = result.totalCount
            print("Search completed.")
            return result
        except RateLimitExceededException:
            print("Rate limit exceeded, waiting for reset...")
            # Code search uses its own "search" quota, separate from "core".
            search_rate_limit = g.get_rate_limit().search
            reset_timestamp = search_rate_limit.reset.timestamp()
            wait_time = max(reset_timestamp - time.time(), 0)
            print(f"Waiting for {wait_time + 1:.0f} seconds for rate limit reset...")
            time.sleep(wait_time + 1)
            return search_files(query)
        except Exception as e:
            print(f"An error occurred: {e}")
            return None
    
    # Specify the repository name (owner/repo)
    repo_name = 'Xcaciv/Xcaciv.Command'
    
    # filename: matches the file name in any directory; repo: limits the search to one repository.
    query = f'filename:Directory.Packages.props repo:{repo_name}'
    
    result = search_files(query)
    
    # result is None only when an error occurred; an empty search still has totalCount == 0.
    if result is not None:
        print(f'Total files found: {result.totalCount}')
        for file in result:
            print(f"File path: {file.path}")
            print(f"Repository: {file.repository.full_name}")
            print(f"URL: {file.html_url}")
            print("-" * 40)
    else:
        print("The search failed; see the error above.")
    
    
    rate_limit = g.get_rate_limit()
    print(f"Search rate limit remaining: {rate_limit.search.remaining}")
    
    

    Note: With the GitHub search API, the 'filename:' qualifier matches files with a specific name regardless of which directory they are in. The REST API runs on GitHub's legacy code search, which is why its results can differ from the web UI: the web UI uses the newer code search, where 'path:' accepts glob patterns such as 'path:**/Directory.Packages.props'. The legacy engine does not understand those patterns. It does support 'path:', but only as a directory location, so to restrict the search to files under a 'config' directory you would combine the two qualifiers, for example 'filename:Directory.Packages.props path:config'. In this case, 'filename:' alone is enough.
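The retry logic in both scripts boils down to "sleep until the reset time, plus one second". Below is a minimal sketch of that calculation as a pure, testable function. It assumes reset is a datetime like the one PyGithub exposes on a rate-limit bucket (timezone-aware UTC in recent versions, naive UTC in older ones); the name seconds_until_reset and the margin parameter are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def seconds_until_reset(reset: datetime, now: Optional[datetime] = None,
                        margin: float = 1.0) -> float:
    """Seconds to sleep before a rate-limit bucket resets (never negative)."""
    # Treat naive datetimes as UTC; calling .timestamp() on a naive value
    # would interpret it in the machine's local time zone instead.
    if reset.tzinfo is None:
        reset = reset.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    elif now.tzinfo is None:
        now = now.replace(tzinfo=timezone.utc)
    # If the reset time has already passed, only wait for the safety margin.
    return max((reset - now).total_seconds(), 0.0) + margin


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(seconds_until_reset(datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc), now))  # → 31.0
print(seconds_until_reset(datetime(2024, 1, 1, 11, 59, 0), now))  # → 1.0
```

Inside the except RateLimitExceededException branch, this could be used as time.sleep(seconds_until_reset(g.get_rate_limit().search.reset)), assuming your PyGithub version exposes the search bucket as .search on the object returned by get_rate_limit().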
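Since the REST API does not evaluate '**' path globs, one workaround is to search with 'filename:' (plus 'repo:') and then narrow the returned paths on the client side. The sketch below assumes you have collected file.path from the search results into a list of strings; the helper filter_paths, its directory argument, and the sample paths are hypothetical illustrations.

```python
import posixpath
from typing import Iterable, List, Optional


def filter_paths(paths: Iterable[str], filename: str,
                 directory: Optional[str] = None) -> List[str]:
    """Keep repository paths whose last component is exactly `filename`.

    If `directory` is given, also require the file to sit somewhere under
    that directory (at any depth): directory="src" keeps
    "src/a/Directory.Packages.props" but drops "tests/Directory.Packages.props".
    """
    d = directory.strip("/") if directory else ""
    prefix = d + "/" if d else ""
    matches = []
    for path in paths:
        # GitHub reports repository paths with forward slashes.
        if posixpath.basename(path) != filename:
            continue
        if prefix and not path.startswith(prefix):
            continue
        matches.append(path)
    return sorted(matches)


# Hypothetical sample of paths returned by a filename: search.
found = [
    "Directory.Packages.props",
    "src/Directory.Packages.props",
    "src/Tools/Directory.Packages.props",
    "tests/Directory.Packages.props.bak",
]
print(filter_paths(found, "Directory.Packages.props"))
print(filter_paths(found, "Directory.Packages.props", directory="src"))
```

Filtering locally like this costs no extra API requests, which matters because code search has a tight rate limit.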