Creating CLI tool in Python - part 3
In the previous post, I created the first command for the assistant, called blog, and added a title argument to it. The title argument is validated, and most of the business logic is covered with unit tests.
In this post, I'm going to create a new argument called image and validate it. The business logic will be covered with unit tests.
The second goal is to do a bit of refactoring and make the CLI tool more user-friendly by switching from arguments to options.
Overview
When I started to implement the image option, I realized that I needed to change the business logic and add a project folder option, also with validation. After this change, the whole flow looks as follows. To start the application, the user types the following command on the command line:
    assistant blog --title "Title of the blog" --image "https://unsplash.com/photos/9FvZfRKKfH8"
As seen in the gif above, the project is starting to look like a real tool.
The main change is that the application now uses click options. In my opinion, this makes interaction with the tool more human-readable.
Project structure
I made some minor changes to the structure: for better readability, I moved the blog command logic into a separate module, and I introduced new modules called file_handler and web_scraper.
    assistant/
    |-- README.md
    |-- assistant.py
    |-- blog_command.py
    |-- file_handler.py
    |-- install-dev.sh
    |-- logger.py
    |-- setup.py
    |-- str_helper.py
    |-- test_str_helper.py
    |-- test_validator.py
    |-- test_web_scraper.py
    |-- validator.py
    |-- web_scraper.py
assistant.py
In the previous post, I implemented the blog command and the title argument. At that point, passing the title as a click argument seemed fine, but when I added an image argument, I realized that interacting with the application did not feel right. So I decided to switch to click options.
Here is an example of the command before the change:
    assistant blog "Title of the blog" "https://unsplash.com/photos/9FvZfRKKfH8"
And this is after:
    assistant blog --title "Title of the blog" --image "https://unsplash.com/photos/9FvZfRKKfH8"
The same command can be written in a short version:
    assistant blog -t "Title of the blog" -i "https://unsplash.com/photos/9FvZfRKKfH8"
So let's check what has changed in the assistant.py file.
At the import level, there are some changes. I moved the logger and validator imports into the blog_command module and now import that module instead. I also need str_helper to check whether the user has passed a project path.
    import click
    import os
    import blog_command
    from str_helper import is_null_or_whitespace
The program entry point is still the cli() function; no changes were made here.
    @click.group()
    @click.option(
        '-v',
        '--verbose',
        is_flag=True,
        help='Will print verbose messages about processes.'
    )
    @pass_config
    def cli(config, verbose):
        config.verbose = verbose
The blog command's business logic has now moved into the blog_command module.
Another thing to notice is that I have created two additional parameters: image and project-path.
The project-path option is optional; if it is not provided, the current working directory is used. For that, I imported the os module and use its getcwd() function.
    @cli.command()
    @click.option(
        '-t',
        '--title',
        required=True,
        type=str,
        help='The title of blog post.'
    )
    @click.option(
        '-i',
        '--image',
        required=True,
        type=str,
        help='The Unsplash image url.'
    )
    @click.option(
        '-p',
        '--project-path',
        required=False,
        type=click.Path(),
        help='The full path to project folder. Default: current working directory.',
        default=os.getcwd()
    )
    @pass_config
    def blog(config, title, image, project_path):
        """Use this command to start a new blog post."""
        blog_command.handle(config, title, image, project_path)
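The pass_config decorator comes from the previous post and is not shown here, so the following is a minimal, self-contained sketch of how such a decorator and the new options can fit together. The Config class, the echo line inside blog, and the run helper are assumptions made for illustration, not the project's actual code; click.make_pass_decorator and click.testing.CliRunner are part of click itself.

```python
import click
from click.testing import CliRunner


class Config:
    """Assumed container for state shared between the group and its commands."""

    def __init__(self):
        self.verbose = False


# Builds a decorator that passes a Config instance to the decorated callback,
# creating one on first use (ensure=True).
pass_config = click.make_pass_decorator(Config, ensure=True)


@click.group()
@click.option('-v', '--verbose', is_flag=True, help='Print verbose messages.')
@pass_config
def cli(config, verbose):
    config.verbose = verbose


@cli.command()
@click.option('-t', '--title', required=True, type=str, help='The title of the blog post.')
@click.option('-i', '--image', required=True, type=str, help='The Unsplash image url.')
@pass_config
def blog(config, title, image):
    """Illustrative stand-in for the real blog command: only echoes its input."""
    click.echo('verbose=%s title=%s image=%s' % (config.verbose, title, image))


def run(args):
    """Invoke the CLI in-process and return (exit_code, output)."""
    result = CliRunner().invoke(cli, args)
    return result.exit_code, result.output
```

Calling run(['-v', 'blog', '-t', 'Hello', '-i', 'https://unsplash.com/photos/9FvZfRKKfH8']) exercises the short options without installing the tool. Leaving out --image makes click stop with a usage error and a non-zero exit code, which is the behaviour required=True is there to provide.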
blog_command.py
As shown below, the blog command is getting quite heavy, mainly because I added some extra informational logging. The main idea of this command is to validate first, then get the image data and download the image. What annoys me right now is that the user gets no feedback on how much of the image has been downloaded. For now that's fine, but I will come up with something once the "happy path" is implemented.
    from slugify import slugify
    from os import path
    import logger
    import validator
    import web_scraper
    import file_handler

    def handle(config, title, img_url, project_path):
        try:
            logger.info(config.verbose, 'Starting project path validation.')
            path_validation_result = validator.validate_project_path(project_path)
            logger.success(path_validation_result)

            logger.info(config.verbose, 'Starting title validation.')
            title_validation_result = validator.validate_tile(title)
            logger.success(title_validation_result)

            logger.info(config.verbose, 'Starting image url validation.')
            img_validation_result = validator.validate_img(img_url)
            logger.success(img_validation_result)

            logger.info(config.verbose, 'Requesting image data.')
            file_name = '.'.join((slugify(title), 'jpg'))
            image = web_scraper.get_image_author(img_url, file_name)
            logger.info(config.verbose, 'Image url: %s' % image.url)
            logger.info(config.verbose, 'Image file name: %s' % image.file_name)
            logger.info(config.verbose, 'Image author: %s' % image.author_name)
            logger.info(config.verbose, 'Image author profile: %s' % image.author_profile)
            logger.success('Successfully acquired image data.')

            logger.info(config.verbose, 'Starting image download.')
            full_file_path = path.join(file_handler.find_sub_folder(project_path, '/src/images'), file_name)
            web_scraper.download_img(img_url, full_file_path)
            logger.success('Image "%s" downloaded successfully to "%s".' % (img_url, full_file_path))

        except ValueError as er:
            # Raised by the validator functions.
            logger.error('Validation Error: {}'.format(er))
        except Exception as ex:
            # Anything else, for example a failed request or download.
            logger.error(format(ex))
file_handler.py
The file_handler module is new to the project. It will contain all the file- and directory-related logic.
For now, there is only one search function; maybe I will add more later.
All it does is collect the subdirectories of a parent directory and then find a specific one among them.
This is needed because I have to make sure that the project folder contains an "images" folder. Since my blog project's directory structure is nested, I decided it's more comfortable if it is enough for the user to be inside the project folder, without having to navigate to any particular subdirectory.
    import os

    def find_sub_folder(parent_path, sub_path):
        """Find a sub directory from parent folder.

        Returns:
            If the folder contains subdirectory then the full path to a subdirectory is returned.
            Else None is returned.
        """

        folders = []

        # r=root, d=directories, f = files
        for r, d, f in os.walk(parent_path):
            for folder in d:
                folders.append(os.path.join(r, folder))

        # Keep only the paths that contain the requested fragment.
        result = [x for x in folders if sub_path in x]

        if not result:
            return None

        return result[0]
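To see how this search behaves, here is a small sketch that runs an equivalent function against a throwaway directory tree. The demo function, the blog/src/images layout, and the early-return rewrite of find_sub_folder are assumptions for illustration. The fragment is built with os.path.join rather than written as '/src/images', because os.walk yields paths with the platform's separator, so a forward-slash literal would not match on Windows.

```python
import os
import tempfile


def find_sub_folder(parent_path, sub_path):
    """Return the first directory under parent_path whose path contains sub_path, else None."""
    for root, dirs, _files in os.walk(parent_path):
        for folder in dirs:
            full_path = os.path.join(root, folder)
            if sub_path in full_path:
                return full_path
    return None


def demo():
    """Create a nested project layout and search it from the project root."""
    with tempfile.TemporaryDirectory() as project:
        # Hypothetical layout: <project>/blog/src/images
        os.makedirs(os.path.join(project, 'blog', 'src', 'images'))

        fragment = os.path.join('src', 'images')
        found = find_sub_folder(project, fragment)
        missing = find_sub_folder(project, os.path.join('src', 'videos'))

        return found.endswith(fragment), missing
```

Because the match is a plain substring test, the search succeeds from any folder above src/images, which is what lets the user stay in the project root.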
validator.py
This module got an update: I added two validation functions, one for the image and the other for project-path. The validate_img function is straightforward. First, I had to make sure the value is not empty; if it is, an exception is raised. Then I had to make sure it's a valid URL, and for this I used a regex. I did not want to add Unsplash to the regex, because later there may be another image provider, or I may add images from some other place. So, to support only Unsplash images for now, I added an if check.
    import re
    from str_helper import is_null_or_whitespace

    def validate_img(img):
        """Validate blog image.
        - required
        - is a well-formed URL
        - is an Unsplash link

        returns:
            Validation success message.
        """
        if is_null_or_whitespace(img):
            raise ValueError('Blog image is required, currently supporting only Unsplash.')

        regex = re.compile(
            r'^(?:http|ftp)s?://'  # http(s):// or ftp(s)://
            r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
            r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
            r'(?::\d+)?'  # optional port
            r'(?:/?|[/?]\S+)$', re.IGNORECASE)

        result = re.match(regex, img)

        if result is None:
            raise ValueError('Invalid blog image url.')

        if "unsplash.com/photos/" not in img:
            raise ValueError('Invalid blog image url, currently supporting only Unsplash images.')

        return 'Validation Success: Image "%s" is valid.' % img
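As a possible alternative to the regex plus substring check, the same rule can be sketched with urllib.parse from the standard library. This is not what the project does; the function name is_unsplash_photo_url is hypothetical. One reason to consider it: the substring check appears to accept any well-formed URL that merely mentions unsplash.com/photos/ somewhere, for example in a query string, whereas checking the parsed host and path does not.

```python
from urllib.parse import urlparse


def is_unsplash_photo_url(url):
    """Return True for http(s) URLs on unsplash.com whose path is /photos/<something>."""
    if not url or not url.strip():
        return False

    parts = urlparse(url.strip())
    host = (parts.hostname or '').lower()
    prefix = '/photos/'

    return (
        parts.scheme in ('http', 'https')
        and (host == 'unsplash.com' or host.endswith('.unsplash.com'))
        and parts.path.startswith(prefix)
        and len(parts.path) > len(prefix)
    )
```

A validator built on this would still raise ValueError with the same messages; only the way the URL is inspected changes.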
The validate_project_path function just makes sure that the project-path parameter is not empty and that the project folder contains an "images" folder.
    from file_handler import find_sub_folder

    def validate_project_path(path):
        """Validate project path.
        - required
        - should contain an 'images' folder

        returns:
            Validation success message.
        """

        if is_null_or_whitespace(path):
            raise ValueError('Path to blog project is required.')

        if not find_sub_folder(path, '/src/images'):
            raise ValueError('Blog project does not contain folder "images".')

        return 'Validation Success: Project path "%s" is valid.' % path
web_scraper.py
To download images and get image information from Unsplash, I'm using the urllib and BeautifulSoup libraries.
    from urllib import request, parse
    from bs4 import BeautifulSoup
I defined an Image class to keep the image data in one place after receiving it.
    class Image:

        def __init__(self, file_name, url, author_name, author_profile):
            self.file_name = file_name
            self.url = url
            self.author_name = author_name
            self.author_profile = author_profile
In the download_img function, I just build the download URL and save the image to the provided path.
    def download_img(imageUrl, filePath):
        """Download image from Unsplash and save it in provided location"""

        # Unsplash serves the file itself from the photo page URL plus this suffix.
        downloadEndPoint = imageUrl + '/download?force=true'
        request.urlretrieve(downloadEndPoint, filePath)
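The missing download feedback mentioned earlier could be handled with the reporthook parameter of urlretrieve, which is called with the block number, the block size, and the total size (or -1 when the server does not report one). The sketch below is one possible approach, not the project's code; format_progress, print_progress, and download_img_with_progress are hypothetical names.

```python
from urllib import request


def format_progress(block_num, block_size, total_size):
    """Turn urlretrieve's reporthook arguments into a short progress string."""
    if total_size <= 0:
        # Total size unknown: fall back to a byte count.
        return 'Downloaded %d bytes' % (block_num * block_size)

    # The last block is usually partial, so cap the value at 100.
    percent = min(100, block_num * block_size * 100 // total_size)
    return 'Downloaded %d%%' % percent


def print_progress(block_num, block_size, total_size):
    """Reporthook that rewrites a single console line as the download advances."""
    print('\r' + format_progress(block_num, block_size, total_size), end='', flush=True)


def download_img_with_progress(image_url, file_path):
    """Hypothetical variant of download_img that reports progress while saving."""
    download_endpoint = image_url.rstrip('/') + '/download?force=true'
    request.urlretrieve(download_endpoint, file_path, reporthook=print_progress)
```

Keeping the formatting in its own function means it can be unit tested without touching the network, in the same spirit as the other tests in the project.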
In get_image_author, I define a selector class, make the request, and store the response in a variable so that I can decode it and parse it with BeautifulSoup. The rest just selects the right data from the response.
    def get_image_author(imageUrl, file_name):
        """Request image author data from page.

        Returns Image object with filled data.
        """

        # CSS classes of the author link on the photo page.
        selector = '_3XzpS _1ByhS _4kjHg _1O9Y0 _3l__V _1CBrG xLon9'
        response = request.urlopen(imageUrl)

        if response.code != 200:
            raise Exception('Failed to make request to "%s"' % imageUrl)

        data = response.read()
        html = data.decode('UTF-8')
        soup = BeautifulSoup(html, "html.parser")
        anchor = soup.find('a', class_=selector)
        username = anchor['href'].lstrip('/')
        author = anchor.contents[0]
        # Rebuild the site root (scheme and host) from the image URL.
        parsed_uri = parse.urlparse(imageUrl)
        author_profile = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
        image = Image(file_name, imageUrl, author, (author_profile + username))

        return image
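The profile URL construction at the end of get_image_author can be looked at in isolation. The sketch below uses only the standard library; build_author_profile is a hypothetical helper, and the '/@someauthor' href is a made-up value. It also adds a guard for a missing href, since soup.find returns None when nothing matches, and class names like the ones in the selector look auto-generated and may change.

```python
from urllib.parse import urlparse


def build_author_profile(image_url, href):
    """Combine the site root of image_url with the author's relative href."""
    if not href:
        # Assumed failure mode: the author link was not found on the page.
        raise ValueError('No author link found for "%s"' % image_url)

    parsed_uri = urlparse(image_url)
    site_root = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)

    return site_root + href.lstrip('/')
```

For example, build_author_profile('https://unsplash.com/photos/9FvZfRKKfH8', '/@someauthor') gives 'https://unsplash.com/@someauthor'.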
Summary
In this post, I added two more options to the blog command: image and project-path. I also created the image download logic and added more unit tests.
In the next post, I'm going to create a blog post starter file and fill it with some initial data.
As always, the source code for this post is available on GitHub.