Python String Data Type

In the previous article, we discussed Python variables, including string variables. String is a Python built-in data type that holds a sequence of characters, and you will need it whenever you do any text processing. In this article, I will share the various operations you can perform with the Python string data type.

Python string data type

In Python, you can define a string variable with single quotes, double quotes or triple quotes, and use the type() function to verify the data type of your variable. E.g.:

text1 = 'hello \n world!'
text2 = "bac;def,what$ is"
text3 = """this is also fine"""
print(type(text1), text1)
print(type(text2), text2)
print(type(text3), text3)

You should see the output below, where the data type is shown as “str”.

<class 'str'> hello 
 world!
<class 'str'> bac;def,what$ is
<class 'str'> this is also fine

Slice Operation

By definition, a Python string is a sequence of characters, which means you can access each character by its index (the index starts from 0 for the first element):

print(text1[0], text2[1], text3[2])
h a i

You can also use the slice operation to get a substring of your string variable:

#get a sub string starting from index 0 and ending at index 5 (exclusive)
print(text1[0:5])
#get a sub string starting from index 5 and ending at index 7 (exclusive)
print(text3[5:7])
#get a sub string starting from default index 0 and ending at index 4 (exclusive)
print(text3[:4])
#get a sub string starting from index 5 and ending at the end of the string
print(text3[5:])
hello
is
this
is also fine

You can also specify a negative index, which counts from the end of the string (-1 is the last character):

print(text1[-1])
print(text3[-3:-1])
!
in

There is also a third, optional slice parameter, the step, which can be any non-zero integer, e.g.:

text4 = "abcdefg"
print(text4[0::2])
print(text4[1::2])
aceg
bdf
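The step can also be negative, in which case the slice walks the string from right to left. The snippet below is a small sketch of that behavior; the variable names reversed_text and every_second_backwards are only for illustration.

```python
text4 = "abcdefg"

# a step of -1 walks the whole string backwards
reversed_text = text4[::-1]
print(reversed_text)             # gfedcba

# start at index 5, stop before index 1, move two characters at a time
every_second_backwards = text4[5:1:-2]
print(every_second_backwards)    # fd

# a step of 0 is not allowed
try:
    text4[::0]
except ValueError:
    print("a step of 0 raises ValueError")
```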

Immutable nature

Since we are able to get each individual character from a string, you may wonder if we can re-assign something else to a particular position in the string, e.g.:

text4[0] = 'T'
#TypeError: 'str' object does not support item assignment

The error shows up because a string is immutable: you cannot change its original content, you can only create a new string:

new_text4 = "T" + text4[1:]

+ and *

As you may have noticed in the above example, strings can be concatenated with the “+” operator. There is one more operator, “*”, which repeats a string:

print(text4 + text3*2)

This repeats text3 twice and concatenates the result to text4 in a single string:

abcdefgthis is also finethis is also fine

Formatting Python string data type

Below are some of the string formatting methods; most of them are self-explanatory from their names:

print("lower:", text4.lower())
#same result as lower() for ASCII text; casefold() is more aggressive for some Unicode characters
print("casefold:", text4.casefold())

print("upper:", text4.upper())

print("title:", text4.title())
#same result as title() for a single word; capitalize() only upper-cases the first character of the whole string
print("capitalize:", text4.capitalize())

print("swapcase:", text4.swapcase())
print("center:", text4.center(40, "*"))
print("ljust:", text4.ljust(40))
print("rjust:", text4.rjust(40, "*"))
print("zfill:", text4.zfill(40))
print("strip:", text4.strip("a"))
print("replace:", text4.replace("a", "A"))

Below is the output:

lower: abcdefg
casefold: abcdefg
upper: ABCDEFG
title: Abcdefg
capitalize: Abcdefg
swapcase: ABCDEFG
center: ****************abcdefg*****************
ljust: abcdefg                                 
rjust: *********************************abcdefg
zfill: 000000000000000000000000000000000abcdefg
strip: bcdefg
replace: Abcdefg
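With a single lowercase word like text4, some of these methods give identical results, so the differences are easy to miss. Below is a small sketch with sample strings of my own choosing (not from the earlier examples) that shows where title and capitalize, and lower and casefold, start to differ.

```python
text = "python string data type"

titled = text.title()            # upper-cases the first letter of every word
capitalized = text.capitalize()  # upper-cases only the first letter of the string
print(titled)        # Python String Data Type
print(capitalized)   # Python string data type

# casefold also converts characters that lower() leaves unchanged, e.g. the German ß
lowered = "Straße".lower()
folded = "Straße".casefold()
print(lowered)       # straße
print(folded)        # strasse
```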

There are also methods you can use for checking the content of a string:

print("isalnum:",text4.isalnum())
print("isalpha:",text4.isalpha())
print("isdecimal:",text4.isdecimal())
print("isdigit:",text4.isdigit())
print("isnumeric:",text4.isnumeric())
print("isidentifier:",text4.isidentifier())
print("islower:",text4.islower())
print("istitle:",text4.istitle())
print("isupper:",text4.isupper())
print("isspace:",text4.isspace())
print("isprintable:",text4.isprintable())

Output will be something similar to below:

isalnum: True
isalpha: True
isdecimal: False
isdigit: False
isnumeric: False
isidentifier: True
islower: True
istitle: False
isupper: False
isspace: False
isprintable: True

Comparison operations

You can use relational operators such as ==, > and < to compare two strings. Python compares them character by character using Unicode code points, and all uppercase letters come before lowercase ones. Hence you need to convert your texts into a standard format, e.g. all upper or lower case, in order to get a comparison result in alphabetical order.
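To make this concrete, here is a short sketch with a made-up list of words. It shows how the default comparison puts the capitalized word first, and how passing str.lower as the sort key gives the alphabetical order you would normally expect.

```python
words = ["banana", "Cherry", "apple"]

# "C" (code point 67) comes before "a" (code point 97)
upper_first = "Cherry" < "apple"
print(upper_first)               # True

default_order = sorted(words)
print(default_order)             # ['Cherry', 'apple', 'banana']

# compare the lower-cased form of each word instead
alphabetical = sorted(words, key=str.lower)
print(alphabetical)              # ['apple', 'banana', 'Cherry']
```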

To check whether a string starts or ends with certain characters, you can use the startswith and endswith methods:

if text3.startswith("this"):
    print("yes, it starts with 'this'")
if text3.endswith("fine"):
    print("yes, it ends with 'fine'")

There is no method called contains (people sometimes get confused since the Java String class has a contains method), but you can use the in operator, or the find, index and rindex methods, to check whether the string contains a substring:

if "this" in text3:
    print("'this' is in text3")
else:
    print("not found")

if text3.find("this") > -1:
    print("found 'this' in text3")
else:
    print("not found")

if text3.find("this",1, 20) > -1:
    print("found 'this' in text3")
else:
    print("'this' is not found in text3, starting from index 1 to 20")

if text3.index("this") >-1:
    print("found 'this' in text3, index >=0")
else:
    print("not found")

#ValueError: substring not found
#idx = text3.index("this",1, 20)

Both the find and index methods return the index of the substring. The difference between the two is that index raises a ValueError when the substring is not found, while find just returns -1.
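Because index raises an exception, you would normally wrap it in try/except when the substring may be missing. The sketch below illustrates this with text3 redefined locally so it runs on its own; the name position is just for illustration. It also shows rfind, which searches from the right.

```python
text3 = "this is also fine"

# find returns -1 when the substring is missing
missing = text3.find("that")
print(missing)                   # -1

# index raises ValueError instead, so handle it explicitly
try:
    position = text3.index("that")
except ValueError:
    position = None
print(position)                  # None

# find returns the first occurrence, rfind the last one
first_is = text3.find("is")
last_is = text3.rfind("is")
print(first_is, last_is)         # 2 5
```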

Split & Join texts

Very often you need to split a text by a certain delimiter, e.g. a newline (\n), a semicolon (;) or a space. You can use the split method to split the text into a list. If the delimiter is not found, split returns the original text as the only element of a list.

print("split by default delimiter:", text3.split())
print("split by s", text3.split('s'))
print("split by ;", text3.split(';'))

The output will be:

split by default delimiter: ['this', 'is', 'also', 'fine']
split by s ['thi', ' i', ' al', 'o fine']
split by ; ['this is also fine']

On the other hand, if you have a list of strings and would like to join them into one string, you can do the following:

print("join the words with ';':", ';'.join(text3.split()))
print("join the words without space:", ''.join(text3.split()))

And below is the output:

join the words with ';': this;is;also;fine
join the words without space: thisisalsofine
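Split and join are often used together when parsing simple delimited text. The following is a minimal sketch with a made-up record string; it also uses the optional maxsplit argument of split, which limits how many splits are performed.

```python
# hypothetical sample record, fields separated by ";" and key/value by "="
record = "name=ken;member=true;hobby=jogging"

# split into fields, then split each field once into key and value;
# maxsplit=1 keeps any further "=" characters inside the value
pairs = [field.split("=", 1) for field in record.split(";")]
print(pairs)
# [['name', 'ken'], ['member', 'true'], ['hobby', 'jogging']]

# join puts the pieces back together with the same delimiters
rebuilt = ";".join("=".join(pair) for pair in pairs)
print(rebuilt == record)   # True
```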

Count occurrence

The count method can be used to count the occurrences of a substring in the original string, for instance:

print(text3*5)
print("'is' occurrence:", (text3*5).count("is"))

The result will be:

this is also finethis is also finethis is also finethis is also finethis is also fine
'is' occurrence: 10
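Two details of count are worth knowing: it only counts non-overlapping occurrences, and it accepts optional start and end positions. The snippet below is a small sketch of both, using sample strings chosen for illustration.

```python
text = "aaaa"
# "aa" is found at index 0 and index 2; the overlapping match at index 1 is not counted
non_overlapping = text.count("aa")
print(non_overlapping)           # 2

text3 = "this is also fine"
# only search from index 3 onwards, which skips the "is" inside "this"
from_index_3 = text3.count("is", 3)
print(from_index_3)              # 1
```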

Conclusion

With the above examples, we have covered most of the commonly used methods of the Python string data type. You may also check the official Python documentation to see if there are any additional string methods you are interested in.

Python Variables and Keywords

Python Tutorial – Variables and Keywords

This article serves as a tutorial for Python beginners to gain the essential knowledge to start coding in Python. By completing this tutorial, you should know how to correctly use Python variables as well as the Python keywords.

Python Variable

A variable is a name that refers to some value. Like any other programming language, Python allows you to define variables and manipulate them in your code logic.

Naming convention

Python allows letters, digits and underscores (_) in a variable name, but the name has to start with a letter or an underscore, followed by zero or more letters, underscores and digits (0 to 9).

There is no limit on the length of a variable name, so you can choose anything meaningful to you in your code. The Python style guide (PEP 8), however, recommends lowercase words separated by underscores for variable and function names.

Below are some examples of valid variable names:

a = "a"
#Python variable name is case sensitive
A = "a"
 
module_name = "Python Tutorial for Variables & Keywords"
speed_of_light = 299792458
pi = 3.14159265359
is_matched = True

Below are some invalid variable names. If you use them in your code, Python raises a SyntaxError.

#invalid as a variable name cannot start with a digit
1st_name = "John"
#invalid as a variable name cannot contain special characters like /, whitespace, @, &, * etc., except _
first name = "John"
right/wrong = True

Use of underscore

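Take note of the underscore (_): although it is allowed in a variable name, it has a special meaning when used at the beginning. E.g. if you name an attribute _salary in your class, it signals by convention that the attribute is internal and should not be accessed from outside of the class. Python itself does not enforce this; only a double leading underscore (e.g. __salary) triggers name mangling, which makes outside access harder. This is out of scope for this topic, but do bear it in mind.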

Also, if you use _ as your variable name, there will be a conflict in Python interactive mode, where _ holds the result of the last evaluated expression.
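To illustrate the leading underscore, here is a minimal sketch with a hypothetical Employee class (the class and its attributes are made up for this example). It shows that a single underscore is only a convention, while a double underscore makes Python rename the attribute behind the scenes.

```python
class Employee:
    def __init__(self, name, salary):
        self.name = name
        # single underscore: "internal" by convention only
        self._salary = salary
        # double underscore: stored under the mangled name _Employee__bonus
        self.__bonus = salary // 10


emp = Employee("John", 5000)

# still reachable from outside, Python does not block it
print(emp._salary)                 # 5000

# the plain name does not exist on the object...
has_plain_name = hasattr(emp, "__bonus")
print(has_plain_name)              # False

# ...but the mangled name does
print(emp._Employee__bonus)        # 500
```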

You may also have noticed that variables can hold different sorts of values, e.g. a single character, multiple characters, numbers, True or False, etc. These are the different data types in Python, and we will come to this topic in a later article.

Reserved Keywords

There are some other words we cannot use as variable names. These are the so-called Python reserved keywords, which Python uses to recognize the structure of the program.

Below are all the keywords reserved by Python 3. It is not allowed to use them as variable names.

False      await      else       import     pass
None       break      except     in         raise
True       class      finally    is         return
and        continue   for        lambda     try
as         def        from       nonlocal   while
assert     del        global     not        with
async      elif       if         or         yield

For Python beginners, if you use an IDE like PyCharm or a tool like Jupyter Notebook, these keywords will be automatically highlighted in a different color, so you don’t need to worry about mistakenly using them as variable names.
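You can also check a name programmatically. The sketch below combines the str.isidentifier method with the standard keyword module; is_valid_variable_name is a hypothetical helper written for this example, not something built into Python.

```python
import keyword


def is_valid_variable_name(name):
    # hypothetical helper: a valid identifier that is not a reserved keyword
    return name.isidentifier() and not keyword.iskeyword(name)


candidates = ["module_name", "_salary", "1st_name", "first name", "class"]
results = {name: is_valid_variable_name(name) for name in candidates}
for name, valid in results.items():
    print(name, valid)
# module_name True
# _salary True
# 1st_name False
# first name False
# class False

# the full list of reserved keywords for the running Python version
print(keyword.kwlist)
```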

Besides these reserved keywords, there are a few more words you should avoid when naming your variables, for instance:

str
int
float
list
dict
set
tuple
bytes

These are Python built-in data types, which will be covered in the next tutorial. Python does not raise any error when you assign a value to one of these names, but the assignment hides the built-in, so you will face issues when you want to call the default behavior of the built-in data type later.

For example, if you assign “Test” to str, calling str() afterwards raises an error, and it only works again after you delete the “str” variable. Hence the best practice is not to use these words as variable names in your code, to prevent unexpected errors and confusion.
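The behavior described above can be reproduced with a few lines. This is a minimal sketch, assuming the code runs at module level or in the interactive shell (inside a function, del would behave differently).

```python
# shadow the built-in str type (for demonstration only, avoid this in real code)
str = "Test"

try:
    str(123)
except TypeError as err:
    # the name str now refers to our string, which cannot be called
    print("TypeError:", err)

# remove our variable so that the built-in becomes visible again
del str
converted = str(123)
print(converted)       # 123
```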

 

Python tuple

Python built-in types – Tuples

Tuple is a Python built-in data structure that holds a sequence of values, and the values can be of any data type. If you write a hundred lines of Python code, it is almost impossible to avoid tuples, as they come in implicitly or explicitly, from variable assignment and iteration to the return values of your functions. In this article, I will share where and how tuples are likely to be used in your code.

Variable assignment with Tuple

You may have written code like the below to assign values to several variables in one line. The left side is a tuple of variables, and the right side is a tuple of values/expressions.

sort_by_name, sort_by_date = True, False
#output : sort_by_name True, sort_by_date False
key, val = "20200601" , "Mon"
#output : key '20200601', val 'Mon'

If you want to swap the values of two variables, you do not need to create a temp variable. The below does a perfect job of swapping the values of the key and val variables.

key, val = val, key
#output: key 'Mon', val '20200601'
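To see that a tuple is really involved here, the short sketch below builds one without parentheses and then unpacks it; the names point, x and y are only for illustration.

```python
# the parentheses are optional: the comma is what creates the tuple
point = 3, 4
print(type(point), point)    # <class 'tuple'> (3, 4)

# unpacking assigns each element to a name on the left side
x, y = point

# the right side is packed into the tuple (4, 3) first, then unpacked
x, y = y, x
print(x, y)                  # 4 3
```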

Traverse the elements of a sequence

If you want to iterate through the elements of a sequence and get the index of each element at the same time, you can do it with the code below:

for idx, label in enumerate('ABCDEFG'):
    print(idx, label)

#output:
#0 A
#1 B
#...

When iterating over a dictionary, the items method returns a view of tuples, and each tuple is a key and value pair, e.g.:

company_info = {"name" : "Alibaba", "headquarter" : "Hangzhou, China", "founded" : "4 April 1999"}
for key, val in company_info.items():
    print(f"{key} : {val}")

If you have read my other post, How to swap the key and value in a python dictionary, it is just an extension of the above.
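As a rough sketch of that idea (not necessarily the exact code from the other post), the snippet below first shows that items yields tuples and then swaps keys and values with a dictionary comprehension. It assumes the values are unique and hashable.

```python
company_info = {"name": "Alibaba", "headquarter": "Hangzhou, China", "founded": "4 April 1999"}

# each element produced by items() is a (key, value) tuple
first_item = list(company_info.items())[0]
print(first_item)                # ('name', 'Alibaba')

# unpack every pair and build a new dictionary with key and value swapped
swapped = {val: key for key, val in company_info.items()}
print(swapped["Alibaba"])        # name
```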

Iterate multiple sequences at one time with zip

If you use the built-in zip function to iterate multiple sequences at one time, it actually returns an iterator of tuples. See the below example:

names = ["Alibaba", "Amazon", "Google"]
countries = ["China", "USA", "USA"]
years = ["1999", "1996", "1998"]
for rec in zip(names, countries, years):
    print(rec)

#output:
#('Alibaba', 'China', '1999')
#('Amazon', 'USA', '1996')
#('Google', 'USA', '1998')
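Since zip produces tuples, its result converts easily into other structures. The sketch below, reusing two of the lists from above, builds a dictionary from the pairs and then uses zip(*...) to split them back into separate sequences; the unzipped_ names are just for illustration.

```python
names = ["Alibaba", "Amazon", "Google"]
countries = ["China", "USA", "USA"]

pairs = list(zip(names, countries))
print(pairs)
# [('Alibaba', 'China'), ('Amazon', 'USA'), ('Google', 'USA')]

# a list of 2-tuples converts directly into a dictionary
country_by_name = dict(pairs)
print(country_by_name["Google"])    # USA

# zip(*pairs) reverses the operation and returns the sequences as tuples
unzipped_names, unzipped_countries = zip(*pairs)
print(unzipped_names)               # ('Alibaba', 'Amazon', 'Google')
```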

Return multiple values from function

Normally a function can only return one value, but with a tuple you can return multiple values, even of different data types (technically speaking, it is still one value, of tuple type).

E.g. the Python built-in function divmod:

quotient, remainder = divmod(10, 3)
print(quotient, remainder)
#output: 3 1

You can also define your own function to return multiple values like below:

def split_email(email):
    user_name, company_site = email.split("@")
    return user_name, company_site


split_email("contact@codeforests.com")
#output: ('contact', 'codeforests.com')
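The short sketch below repeats the split_email function so that it runs on its own, and confirms that the returned value is a single tuple that can be indexed or unpacked by the caller.

```python
def split_email(email):
    user_name, company_site = email.split("@")
    # the comma packs both values into one tuple
    return user_name, company_site


result = split_email("contact@codeforests.com")
print(type(result))    # <class 'tuple'>
print(result[1])       # codeforests.com

# the caller can unpack the tuple straight into separate variables
user, site = result
print(user)            # contact
```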

With this example, I am going to wrap up this article. If you have any questions or comments, please share them below.

 

Python regular expression match, search and findall

Python beginners may sometimes get confused by the match and search functions in the regular expression module, since they accept the same parameters and return the same result in most simple use cases. In this article, let’s discuss the difference between these two functions.

match vs search in Python regular expression

Let’s start with an example. Say we want to get the words ending with “ese” in the languages string. Both match and search below return the same result as a match object (without the “ese” suffix, since the pattern only looks ahead for it).

import re
languages = "Japanese,English"
m = re.match(r"\w+(?=ese)", languages)
#m returns : <re.Match object; span=(0, 5), match='Japan'>

m = re.search(r"\w+(?=ese)", languages)
#m returns : <re.Match object; span=(0, 5), match='Japan'>

But if the order of your languages changes, e.g. languages = “English,Japanese”, you will see different results:

languages = "English,Japanese"
m = re.match(r"\w+(?=ese)", languages)
#m is None
m = re.search(r"\w+(?=ese)", languages)
#m returns : <re.Match object; span=(8, 13), match='Japan'>

The reason is that the match function only matches at the beginning of your string, while the search function scans through the string and matches anywhere. Hence, if the pattern you want to match may not start at the beginning, you should always use the search function.
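Since both functions return None when nothing matches, it is a good habit to check the result before using it. Below is a minimal sketch of that check; the variable names pattern, at_start and anywhere are only for illustration.

```python
import re

languages = "English,Japanese"
pattern = r"\w+(?=ese)"

# match only tries the beginning of the string, so there is no match here
at_start = re.match(pattern, languages)
print(at_start)                            # None

# search scans the whole string; test the result before calling its methods
anywhere = re.search(pattern, languages)
if anywhere:
    print(anywhere.group(), anywhere.span())   # Japan (8, 13)
```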

If you want to restrict the matching to the beginning of the string, you can also achieve it with the search function by specifying “^” in your pattern:

languages = "English,Japanese,Chinese"
m = re.search(r"^\w+(?=ese)", languages)
#m is None
m = re.search(r"\w+(?=ese)", languages)
#m returns: <re.Match object; span=(8, 13), match='Japan'>

findall in Python regular expression

You may also notice that when there are multiple occurrences of the pattern, the search function only returns the first match. This may not be desired when you actually want to see the full list of matches. To return all the occurrences, you can use the findall function:

languages = "English,Japanese,Chinese,Burmese"
m = re.findall(r"\w+(?=ese)", languages)
#m returns: ['Japan', 'Chin', 'Burm']
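findall only gives you the matched text. If you also need the position of each match, the re module offers finditer, which yields match objects instead of strings. The snippet below is a small sketch using the same languages string.

```python
import re

languages = "English,Japanese,Chinese,Burmese"

# collect the text and the (start, end) position of every match
found = [(m.group(), m.span()) for m in re.finditer(r"\w+(?=ese)", languages)]
for text, span in found:
    print(text, span)
# Japan (8, 13)
# Chin (17, 21)
# Burm (25, 29)
```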

 

 

python read and write json file

Read and write json file in python

The JSON file format is commonly used in most programming languages to store data or exchange data between back end and front end, or between different applications and systems. In this article, I will explain how to read and write a JSON file in Python.

Read from a JSON file

Python has a json module which makes reading and writing JSON pretty easy. First, let’s assume we have the below example.json file to be read.

{
"link": "www.codeforests.com",
"name": "ken", 
"member": true, 
"hobbies": ["jogging", "watching movie"]
}

To read the file, we can simply use the load function and pass in the file object.

import json
example = json.load(open("example.json"))

Now you can access the data through the example dictionary, e.g.

print(example["hobbies"])

The output would be :

['jogging', 'watching movie']
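The sketch below is a self-contained variation of the same read. It writes the sample JSON to a temporary file first (the temporary path is only there so the snippet can run anywhere), opens it with a with statement so the file is closed automatically, and shows how JSON types map to Python types.

```python
import json
import os
import tempfile

content = '{"link": "www.codeforests.com", "name": "ken", "member": true, "hobbies": ["jogging", "watching movie"]}'

# write the sample to a temporary file so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "example.json")
with open(path, "w") as f:
    f.write(content)

# the with statement closes the file as soon as loading is done
with open(path) as f:
    example = json.load(f)

print(type(example))              # <class 'dict'>
print(example["member"])          # True (JSON true becomes Python True)
print(type(example["hobbies"]))   # <class 'list'>
```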

Write into JSON file

Let’s continue with the previous example and add one more hobby to the hobbies list, then save the JSON object into a file.

This time, you can use json.dump and pass in the file object to write to:

example["hobbies"].append("badminton")
with open("example.json", "w") as f:
    json.dump(example, f)

If you look at the json documentation, there are two more functions: json.loads and json.dumps. The main difference between these two and json.load & json.dump is that loads and dumps work with the str representation of the JSON object instead of a file, e.g.:

obj = json.loads('{"json":"obj"}')
print(obj)
print(json.dumps({"json":"obj"}))
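As a small sketch of how the two string-based functions fit together, the snippet below serializes a sample dictionary (made up for this example) with dumps and restores it with loads. The indent and sort_keys arguments are optional and only make the output easier to read.

```python
import json

example = {"name": "ken", "member": True, "hobbies": ["jogging", "badminton"]}

# dumps returns a str; indent and sort_keys only affect the layout
text = json.dumps(example, indent=2, sort_keys=True)
print(text)

# loads parses the str back into Python objects
restored = json.loads(text)
print(restored == example)    # True
```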