Tutorials

Python interact with smart contract, python brownie, python mint NFT, python call smart contract function

Python Interact with Smart Contract

In the previous post, we have discussed how to deploy a NFT smart contract with Python Brownie. Once your contract is deployed into the blockchain, you can interact with your live contract either by building up your own Dapp or using scripts to perform some repetitive tasks like batch token transferring or airdrop. In this article, we will share with you some popular approaches to interact with smart contract in Python once the contract is live in public network.

Prerequisite

If you do not have Brownie installed, you shall go ahead to run the below pip command in your own Python virtual environment to install it:

#run in your venv
pip install eth-brownie

Brownie is heavily relying on Web3.py and use ganache cli as the default emulator for the development network, so you will get these dependencies installed automatically.

And assuming you have already compiled your smart contract and run the deployment script without any issue, if not, you may check through this post before proceeding. Now let’s explore how to interact with a deployed contract.

Work with Contract in Ganache Local Network

Using Ganache CLI

The Ganache CLI is used by default for development, and the local network is spin up when you are running your deployment script and terminated as soon as the script finished execution. In this case, the deployed contract is not stored anywhere permanently and you can only interact with it within same session before Ganache is shut down. The deployed contract is added into Brownie’s deployment map automatically, so you do not have to note down contract address when you want to refer to it. For instance, you can get your last deployed contract as per below:

# assuming a smart contract LegendNFT has been deployed
last_deployed_contract = LegendNFT[-1]

I would recommend you to use Brownie console when you just get started. It’s a Python interactive mode with all Brownie stuff loaded, and you can immediately get feedbacks if something goes wrong in your code. When everything is tested ok in the console, you can then copy your code to the Python script and make it re-usable for later use.

To start Brownie console for your local network, you can use the below command:

brownie console

And from the console, you can also deploy your contract:

>>> contract = LegendNFT.deploy({'from' : accounts[0]})

Transaction sent: 0x38643bde3b513dbaedf887c60162e5bb915b32bde2a804172dfaa2520b4d4ac8
Gas price: 0.0 gwei Gas limit: 12000000 Nonce: 0
LegendNFT.constructor confirmed Block: 1 Gas used: 3188048 (26.57%)
LegendNFT deployed at: 0x3194cBDC3dbcd3E11a07892e7bA5c3394048Cc87

Then you can verify that the contract address returned from deploy function is the same as the last record stored in Brownie’s deployment map:

>>> LegendNFT[-1]
<LegendNFT Contract '0x3194cBDC3dbcd3E11a07892e7bA5c3394048Cc87'>
>>> contract == LegendNFT[-1]
True

In this case, you can always use LegendNFT[-1] to get the last deployed contract instead of replying on the contract address returned from your deploy function, and the deployment script can be kept separately so that you don’t accidently redeploying the contract when you want to run your other functions.

One advantage of using console is that you can use dir function to inspect what are the available functions in the contract instance, e.g.:

>>> dir(LegendNFT[-1])
[abi, address, approve, balance, balanceOf, bytecode, decode_input, getApproved, 
getTotalNFTsMintedSoFar, get_method, get_method_object, info, isApprovedForAll, 
makeLegendNFT, name, ownerOf, safeTransferFrom, selectors, setApprovalForAll, 
signatures, supportsInterface, symbol, tokenURI, topics, totalSupply, transferFrom, tx]

Some of these functions are not explicated implemented in the contract but inherited from other contracts, you will be able to call them as long as the functions have public or external visibility specifier.

Now you can call whatever functions available in your contract, for instance:

>>> contract.name()
'Codeforests Legend'
>>> contract.totalSupply()
1000000

>>> contract.getTotalNFTsMintedSoFar()
0

And you can also call your mint function which will change the state of the blockchain:

>>> contract.makeLegendNFT({'from':accounts[0]})

Transaction sent: 0x6dabe95e4c54d43864e91bfeb398c2bfb0549bf48d51db2cd42b9882e501a905
  Gas price: 0.0 gwei   Gas limit: 12000000   Nonce: 1
  LegendNFT.makeLegendNFT confirmed   Block: 2   Gas used: 777946 (6.48%)

<Transaction '0x6dabe95e4c54d43864e91bfeb398c2bfb0549bf48d51db2cd42b9882e501a905'>

Of course you would still not be able to see how it looks like for your newly minted NFT since the minting was only happening in your local network. If you use the tokenURI function and pass in the first token ID, you shall see some base64 encoded data similar to below:

>>> contract.tokenURI(0)

'data:application/json;base64,eyJuYW1lIjogIlNwbGVuZGlkIEMjIFByb2dyYW0iLCAiZGVzY3JpcHRpb24iOiAiQSBsZWdlbmQgTkZUIGNvbGxlY3Rpb24gZm9yIGFsbCBjb2RlZm9yZXN0cyByZWFkZXJzLiIsICJpbWFnZSI6ICJkYXRhOmltYWdlL3N2Zyt4bWw7YmFzZTY0LFBITjJaeUI0Yld4dWN6MG5hSFIwY0RvdkwzZDNkeTUzTXk1dmNtY3ZNakF3TUM5emRtY25JSEJ5WlhObGNuWmxRWE53WldOMFVtRjBhVzg5SjNoTmFXNVpUV2x1SUcxbFpYUW5JSFpwWlhkQ2IzZzlKekFnTUNBek5UQWdNelV3Sno0OGMzUjViR1UrTG1KaGMyVWdleUJtYVd4c09pQWpSa1ZHUlVaRk95Qm1iMjUwTFdaaGJXbHNlVG9nVEdWaFozVmxJRk53WVhKMFlXNDdJR1p2Ym5RdGMybDZaVG9nTWpSd2VEc2dmVHd2YzNSNWJHVStQSEpsWTNRZ2QybGtkR2c5SnpFd01DVW5JR2hsYVdkb2REMG5NVEF3SlNjZ1ptbHNiRDBuSXpBMU1UWXlNaWNnTHo0OGRHVjRkQ0I0UFNjMU1DVW5JSGs5SnpVd0pTY2dZMnhoYzNNOUoySmhjMlVuSUdSdmJXbHVZVzUwTFdKaGMyVnNhVzVsUFNkdGFXUmtiR1VuSUhSbGVIUXRZVzVqYUc5eVBTZHRhV1JrYkdVblBsTndiR1Z1Wkdsa0lFTWpJRkJ5YjJkeVlXMDhMM1JsZUhRK1BDOXpkbWMrIn0='

This is the URI info for your NFT, and you can use base64 decoder to decode the data to see more details, or use this website to preview the NFT content.

Using Ganache UI

If you are using Ganache desktop version, you can add a new network in Brownie and let Brownie to connect to your Ganache RPC client. Below is the command to add a new network:

brownie networks add Development ganache-local host=http://127.0.0.1 network_id=5777 port=7545 cmd=ganache-cli

The network ID/host/port can be found from your Ganache client as per below:

Python Interact with Smart Contract, Python mint NFT, Python Brownie, Ganache client

Once you have added the network, you can start your Brownie console as per below:

brownie console --network ganache-local

#Attached to local RPC client listening at '127.0.0.1:7545'...

And then deploy your contract:

>>> LegendNFT.deploy({'from' : accounts[0]})

You shall see some transactions went through if you check the Ganache client:

Python Interact with Smart Contract, Python mint NFT, Python Brownie, Ganache client

So you will be able to trace the transaction history from Ganache client, and once you restart it, those history data will be gone and you will get a fresh new blockchain environment same as how the Ganache CLI works.

Work with Smart Contract in Public Network

When you use Brownie to deploy your contracts to public network such as rinkeby or mainnet, it automatically organize the contract ABI file by chain ID into the build/deployments folder as per below:

Python Interact with Smart Contract, Python mint NFT, Python Brownie, Ganache client

So as long as you do not delete these files, you can use ContractName[-1] to get the last deployed contract. To connect to rinkeby in Brownie console, specify the network as rinkeby:

brownie console --network rinkeby

Then you can interact with your contract same as working in the local network except you will need to load a real wallet. If you want to be sure you are working on the correct contract, you can use LegendNFT.at with deployed contract address instead of LegendNFT[-1] :

Python Interact with Smart Contract, Python mint NFT, Python Brownie, Ganache client

Awesome! you can now mint your NFT from your deployed contract in a public network. Probably you don’t want to always type the same code in the console window when you want to mint a new NFT. so let’s put the code we have tested into a script called mint.py in our scripts folder as per below:

from brownie import network, config, accounts, LegendNFT

def get_account():
    if network.show_active() in ["development", "ganache-local"]:
        return accounts[0]
    else:
        return accounts.add(config["wallets"]["from_key"])

def mint_nft():
    contract = LegendNFT[-1]
    account = get_account()
    tx = contract.makeLegendNFT({"from" : account})
    tx.wait(1)
    print("Minted a new NFT with txn hash:", tx.txid)

def main():
    mint_nft()

So when we are running this script in local network, it will just pick a dummy account for deployment, but when working in public network, it will load the account from our config for deployment.

Let’s run this script with below:

brownie run mint.py --network rinkeby

You shall see similar output as we did from console window:

Python Interact with Smart Contract, Python mint NFT, Python Brownie, Ganache client

And here is the transaction details we can see from the etherscan by searching the transaction hash.

What if I have accidentally deleted my ABI file or I want to access the contract created by someone else where I do not have the ABI data?

Don’t worried, as long as you have your contract address, you will still be able to access the contract. You can use the below from_explorer from the Contract object to fetch the information from network explorer. For instance:

>>> contract = Contract.from_explorer("0xaa629be88190fe2077b970aa02f52da40ce98454")
Fetching source of 0xaA629BE88190fe2077b970aa02f52dA40cE98454 from api-rinkeby.etherscan.io...

>>> contract.name()
'Codeforests Legend'

And if you want to try the hard way, you can actually copy the ABI data from the network explorer such as etherscan, ployscan etc. into a local file, then load it manually into a JSON object. For instance, I have saved the ABI data from here into a file called abi_from_rinkeby.json:

Python Interact with Smart Contract, Python mint NFT, Python Brownie, Smart Contract ABI

and with Brownie console connected to Rinkeby,  I can access my contract as per below:

>>> import json
>>> with open("./scripts/abi_from_rinkeby.json") as file:
...     contract_abi = json.load(file)
...
>>> contract = Contract.from_abi("LegendNFT", "0xaA629BE88190fe2077b970aa02f52dA40cE98454", contract_abi)
>>>
>>> contract.totalSupply()
1000000
>>>
>>> contract.getTotalNFTsMintedSoFar()
3

Now with the above approach, you can start mint an NFT from someone else’s contract without going through their website.

Conclusion

In this article, we have reviewed through a few approaches you can use to interact with your smart contract. Brownie is built up on top of the Web3.py, but you can still use the APIs from the web3 module to interact with your contract. There are also quite a few of popular JavaScript libraries such as web3.js, ethers.js etc., you may take a look if you are planning to interact with the contract from your frontend JavaScript code.

brownie deploy smart contract python, NFT smart contract, mint NFT, free NFT

Deploy Your First NFT with Python

Introduction

NFT (Non-Fungible Token) has exploded since last year and keeps on roaring regardless of how the cryptocurrency is performing. You must have heard of CryptoPunk, BAYC or the most recent Phantom Bear if you are following any crypto news. In the nutshell, NFT is a smart contract with some meta data such as image, video or some other attribute data to make it unique from one another. And since it’s on blockchain, it’s easy for anybody to verify the ownership. In this article, I will walk you through the process on how to deploy your own NFT with Python and we will be using a Python based smart contract development framework called Brownie.

Prerequisite

You will need to install the Brownie package. It is recommended to use pipx to install it in an isolated environment, but since I have my own virtual environment created, I would continue to use pip instead. Below is the pip command to install Brownie:

#run in your venv
pip install eth-brownie

In order to test your code locally, Brownie will automatically install ganache cli which launches a local Ethereum blockchain for you to execute your smart contracts. You can follow its documentation to download and install it if you wish to install it manually.

Get Started with Brownie

Let’s create an empty folder and then run the below command to initiate a new Brownie project:

brownie init

With the above command, brownie will create a project structure for you under your current folder. The majority of your work shall be done in the following folders:

  • contracts – for solidity smart contract source code
  • interfaces – for the interface files that referenced by the smart contract
  • scripts – for deployment scripts or interacting with deployed contracts
  • tests – for test scripts

 

Brownie supports both Solidity and Vyper language for creating smart contracts. In this article, I will be using the smart contract in Solidity style as the example.

Create NFT Smart Contract

NFTs use ERC721 token standard, so we will need to create a smart contract that implements this standard. Since the objective of this article is to walk through the process rather than a deep dive into the smart contract coding, I have just shared a sample smart contract here.  You can download this LegendNFT.sol smart contract file and the libraries folder, then put them under the contracts folder.

Here is a quick explanation of what this smart contract does:

  • an ERC721 contract with name Codeforests Legend, token symbol CFL and total supply of 1,000,000
  • 3 lists of words to generate a “random” word phrase
  • a SVG xml template to create a picture of the above word phrase
  • a list of color codes to provide “random” background color to the SVG
  • a mint function for minting the NFT tokens
  • finally a function to return how many tokens have minted

 

As you can see from the source code that I used two external contracts from OpenZeppelin, so I will need to specify this dependency so that they can be recognized when compiling the contract. Brownie provides a config file brownie-config.yaml which allows you specify these configurations. This file is not created automatically when initiating the project as all the settings are optional, you will only need it when necessary.

To specify the dependencies, you can manually create this yaml file under your project root folder. And add the below configurations:

dependencies:
  - OpenZeppelin/[email protected]
compiler:
  solc:
    remappings:
      - '@openzeppelin=OpenZeppelin/[email protected]'

It basically tells Brownie package manager to automatically download this package for you and let compiler find the correct path of the package when using the short form ‘import @openzeppelin/…’.

Compile Smart Contract

Once you have saved your contract and the config file, you can run below compile command on your terminal:

brownie compile

You shall see brownie auto downloaded the dependencies and compiled the code:

brownie compile smart contract, NFT smart contract

The output Json files will be placed into the build/contracts folder. The next step we shall try to deploy this contract locally.

Deploy Smart Contract Locally

To deploy our contract locally, we will need to use the ganache as I mentioned in the beginning. By default, when you want to deploy a contract, brownie will always start the ganache local network and deploy it to the local network. Brownie provides a console window where you can use the commands to deploy contract, but since we may need to do the deployment multiples for testing or in different networks, so let’s create a re-usable Python script called deploy.py under the scripts folder for the deployment purpose.

When your smart contract is compiled, the contract class object will be automatically added to brownie runtime environment, so we can import it from brownie directly.

Below is the Python code for deploying my LegendNFT contract:

from brownie import LegendNFT, network, config, accounts


def deploy_contract():
    account = accounts[0]
    legend_contract = LegendNFT.deploy({"from" : account})
    print("contract has been deployed successfully to :", legend_contract.address)

    return legend_contract

def main():
    deploy_contract()

The accounts is some dummy accounts provided by ganache only for your local network, so we just pick the first dummy account as the contract creator for the from parameter.

Note that your script needs to have a main function to let brownie know which function to run.

Next, let’s run the below command in the terminal:

brownie run .\scripts\deploy.py

You shall see something similar to the below output showing that both contract address and transaction hash have been generated:

brownie deploy smart contract python, NFT smart contract

Congratulations that you’ve just deployed your contract to your local network, although you may have no idea if the contract works correctly. Don’t worry, let’s continue to explore how to deploy to a public network and come back to the unit test later.

Deploy Smart Contract to Public Network

The public network usually refers to the public accessible and persistent blockchain network such as the Ethereum mainnet, rinkeby, kovan, popsten etc. To access the public network, you will need to connect to a node, but it would be too much to run your own node just for deploying a contract.

Luckily there are some service providers like Infura, Alchemy or Pocket Network etc., who allows you to use their API to connect to the public network nodes. For Brownie, it uses Infura as the default provider for connecting to public network. If you run below command in your terminal:

brownie networks list true

You can see it is using Infura’s API:

brownie deploy smart contract python, NFT smart contract, rinkeby network

In this case, you will need to sign up Infura for a free account and get your own project API token. Once you have your API token, you can create a environment variable called WEB3_INFURA_PROJECT_ID and assign your token ID to it. My personal preference would be putting this information into a .env file and load it into the Brownie config file.

And another thing you need is a real wallet account with some fund, as deployment will always incur some cost for confirming the transaction. You can create a wallet with Metamask, and for testnet like rinkeby, you can get some testing ether token from chainlink rinkeby faucet. You can also find other faucets for the rest of the testnets.

So let’s put below info in the .env file:

PRIVATE_KEY = 0x{your wallet private key}
WEB3_INFURA_PROJECT_ID = {your infrua project ID}
ETHERSCAN_TOKEN = {etherscan API token; to submit your contract source code to etherscan (optional)}

Note: to add 0x before your wallet private key so that it’s recognized as hex string. If you wish to get your contract source code verified by etherscan, you will need to sign up with etherscan and get an API token as well.

And add below lines in the brownie-config.yaml file:

dotenv: .env

wallets:
  from_key: ${PRIVATE_KEY}

# optional for verifying the contract source code
networks:
  development:
    verify : false
  rinkeby:
    verify : true

I strongly recommend you to add .env in .gitignore, so that you won’t accidently push your sensitive information into github. Your wallet private key for testnet and mainnet is the same, so bear in mind to not expose to anyone.

Now let’s also do some minor change to our deployment script, so that it uses the new account we created for deploying the contract to public network.

def deploy_contract():
    if(network.show_active() == "development"):
        account = accounts[0]
    else:
        account = accounts.add(config["wallets"]["from_key"])

    legend_contract = LegendNFT.deploy(
        {"from" : account}, 
        publish_source=config["networks"][network.show_active()].get("verify")
    )
    print("contract has been deployed successfully to :", legend_contract.address)

    return legend_contract

We have added a condition to check whether we are working on local/development network, when it’s not local network, we will load our account by the private key and use it to sign the transaction for deployment.

Now we have everything ready, let’s run the deploy.py again and specify the target network with the –network option, e.g. :

brownie run .\scripts\deploy.py --network rinkeby

Below is the output from my terminal:

brownie deploy smart contract python, NFT smart contract, Python deploy smart contract to public network

It would take slightly longer time since it requires the transaction to be minted and confirmed. You can copy the contract address or transaction hash then search it from etherscan rinkeby network.

You can see my deployed contract address here and the verified source code here.

Awesome! You just had your contract deployed in the public network where everybody can see it!!! Since you’ve not yet minted any NFT from your contract, you won’t see anything visually at the moment. To give you a idea of how your NFTs will look like, I have created this website for minting the LegendNFT, you can try and experience the minting process and then view the NFT from Opeasea. It currently supports Rinkeby, Matic and Matic Mumbai network. Below is the minting page after you have connected your wallet:

brownie deploy smart contract python, NFT smart contract, mint NFT, free NFT

Conclusion

Creating your own NFT can be a very fun journey although you may not be able to eventually sell it on the market. And there are just too much details to be covered and which is impossible to write in one post, so I will try to write a few more articles on how to perform unit test, interact with your deployed contract, as well as deploy the contract to other EVM compatible networks. Do follow me on my twitter and let me know if you encounter any issue with your contract deployment.

python virtual environment, isolated environment

3 Ways for Managing Python Virtual Environment

 Introduction

Python virtual environment refers to an isolated execution environment for managing Python versions, dependencies, and indirectly permissions. When you have multiple projects working on and there are potential conflicting requirements such as different Python versions or libraries to be used in these projects, you need to consider using a virtual environment so that installing packages for one project will not impact another.

In this project, we will discuss about the different ways to create Python virtual environment for your multiple projects.

Using venv or virtualenv

Since Python 3.3, it introduced a lightweight venv module for you to create virtual environment, so that you do not need to install any additional tools for it. But if you are working on some projects with older Python versions, you will need to install another popular package called virtualenv for managing the virtual environment. Using Windows as an example, the steps to create a virtual environment are as simple as below:

  • Go to your command line window (Win- R)
  • Use “cd” command to switch to a writable folder where you want your Python virtual environment to be created e.g.: a dedicated folder called “py_venv”
  • Use below command to create a virtual environment for your project
python -m venv project_name
  • It takes a few seconds for the above to complete, and you shall be able to see below folder structure created under “project_name”:Python virtual environment, Python isolated environment

The pyvenv.cfg file describes the home directory, Python version etc. If you are using virtualenv, it will also indicate the version of this package you used.

  • Under the “Scripts” folder, you can see the following files:

Python virtual environment, Python isolated environment

Now the virtual environment has been created successfully. From the command line window, you can go to the “Scripts” folder and type “activate” or “activate.bat” and hit enter. You shall see below on your command line:

Python virtual environment, Python isolated environment

When seeing project name prefix at the beginning of the command, it means you are already in the virtual environment mode for this project, now you can proceed to install all the necessary packages, build and debug your code in this isolated environment. There is a system-site-packages parameter which you can specify whether you want to inherit the packages from the global environment, this might be useful when you have some heavy packages that you do not want to re-install them in each of your virtual environment.

To exit from the current virtual environment, you can run the “deactivate.bat”:

venv deactivate

Or if you need to switch from one virtual environment to another, for instance another virtual environment called “test”, you can activate the “test”, then it will exit the “project_name” automatically and switch to “test” environment:

Python virtual environment, Python isolated environment

From the above steps, you may wonder how to specify the different Python version when creating your virtual environment.

Assuming you have already installed Python 3.7 and Python 3.8, and during the installation, you have added the Python installation directories to your PATH variable. When you use “where python”:

Python virtual environment, Python isolated environment

You can see all the different Python versions you have installed. As “venv” does not support to specify the Python version, you will need to use virtualenv for this case, for instance:

virtualenv test1 -p python3.7
virtualenv test2 -p python3.8

This shall create the virtual environments based on the Python version you’ve specified, you can verify the version_info from the pyvenv.cfg file in the folder.

Many people may have question on whether the virtual environment folder shall be created separately or put it under the same place where your source code is placed.

Generally there is no right or wrong where you shall create your virtual environment folder, but you shall exclude this folder when you submit your code to the repository since other people may not be able to re-use whatever packages you’ve installed due to the different OS they are using.

My personal preference is to have a dedicated folder for all the virtual environment setup, and use the below command to export the installed packages when submitting to the repository, so that your source code folders will be cleaner:

python3 -m pip freeze > requirements.txt

Sometimes you may find tedious to switch between your existing virtual environments by using the activate/deactivate script, the virtualenvwrapper provides easier way to switch environment by names. You can read more from its official documents

Managing Python Virtual Environment with Conda

Conda is an open-source package and environment management tool which you can easily use it for handling your Python with conflicting dependency requirements. If you have never used it before, I would suggest you to do a quick review on the user guide from their official website, and down the miniconda based on which OS you are using.

To create a virtual environment with Conda, you can run the below from your command line:

conda create -n test python=3.9

You can see that Conda allows to specify the Python version, and it will check and download the Python version you’ve specified if it is not installed yet, and set up the virtual environment in the below default folder.

Python virtual environment, Python isolated environment

To activate your virtual environment:

conda activate test

Once it’s activated, you shall see the prefix added to your command line. And you can use below command to check what are the packages available in your current environment:

Python virtual environment, Python isolated environment

To deactivate your virtual environment, you can use the below:

conda deactivate
# or
conda activate other_project

Note that you can use both Conda or pip to install new packages, e.g.:

pip install numpy

You can see that when it is installed from pip, the channel will be indicated as pypi:

Python virtual environment, Python isolated environment

For more usage of Conda, you may refer to their official documentation here.

Managing Python Virtual Environment with IDE

If you are using IDE for Python programming, such as PyCharm, Visual Studio Code or Sublime Text, they usually provide an option for you to specify whether you want to set up a virtual environment when creating the new project. This would save you some effort for manually setting up the virtual environment and you do not have to worry about any potential conflict among your multiple projects.

Using PyCharm as an example, when create/import a new project, you are able to use the virtualenv tool to create a project-specific isolated virtual environment. The virtualenv tool comes bundled with PyCharm, so you do not need to install it separately.

Python virtual environment, Python isolated environment

For the detailed steps, you can refer to the PyCharm official document. Similar guide you can find for the other IDEs.

Conclusion

In this article, we have discussed about the purpose of using Python virtual environment and the different ways you can use to set up an isolated environment for your Python project. If you are new to topic, you may get confused as you would see many variant of the virtual environment tools people talked about, such as pyenv, pyvenv, pipenv, pyenv-virtualenv etc. Some of them are already deprecated in the later Python versions, so for a start, you shall concentrate on the built-in module venv, and then explore virtualenv and virtualenvwrapper for more advanced features.

combine data in pandas with merge vs join

Pandas Tricks – Combine Data in Different Ways

Introduction

If you have used pandas for your data analysis work, you may already get some idea on how powerful and flexible it is in terms of data processing. Many times there are more than one way to solve your problem, and choosing the best approach become another tough decision. For instance, in one of my previous article, I tried to summarize the 20 ways to filter records in pandas which definitely is not a complete list for all the possible solutions. In this article, I will be discussing about the different ways to merge/combine data in pandas and when you shall use them since combining data probably is one of the necessary step you shall perform before starting your data analysis.

Prerequisites

If you have not yet installed pandas, you may use the below command to install it from PyPI:

pip install pandas

And import the module at the beginning of your code:

import pandas as pd

Let’s dive into the code examples.

Combine Data with Append vs Concat

Imagine you have below two data frames from different sources, now you would like to merge them into one data frame.

df1 = pd.DataFrame({"ID" : [1, 2, 3, 4, 5], 
"Name" : ["Aaron", "Jimmy", "Zoe", "Jill", "Jenny"]})
df2 = pd.DataFrame({"ID": [6], "Name" : ["Kelly"]})

The most straightforward way would be using the append method from the pandas DataFrame object:

df1.append(df2, ignore_index=True)

The append method allows to add rows to the end of the current data frame, and with the ignore_index parameter as True, the resulting axis will be relabeled starting from 0.

You would see the output as per below:

combine data in pandas with merge vs join

Alternatively, you can use the pandas concat method which is self-explanatory based on its name. It provides a few more parameters to manipulate the resulting data frame such as specifying the axis for the concatenation to be done as well as the join logic for either union or intersection operation.

You can use the below to generate the same output as previously:

pd.concat([df1, df2], ignore_index=True)

And if you would like to retain a reference to the sources in your result, you can use the keys as per below:

pd.concat([df1, df2], keys=["src_1", "src_2"])

This would return a multi-index data frame where you can easily refer back to the data by source (e.g. df.loc[“src_1”]).

combine data in pandas with merge vs join

Adding new data frame as columns can be also done with axis = 1, for instance:

df3 = pd.DataFrame({"Age" : [12, 13, 13, 12, 13]})
pd.concat([df1, df3], axis=1)

The data frame has been added as one column to the caller:

combine data in pandas with merge vs join

As concat method accepts a list of data frames, you can combine multiple data frames at one time, which would be much faster than using append to do one by one.

Merge Data with Join vs Merge

Beside appending rows or columns based on axis, sometimes you may need more sophisticated operations similar to the left/right join in a rational database. For such scenarios, you shall make use of the pandas merge or join method.

For the previous example to append df2 to df1, you can achieve it with merge as well:

df1.merge(df2, how="outer")

Output as following:

combine data in pandas with merge vs join

It would be more tedious if you want to achieve the same via join since it can only join the data frame based on index, so you will have to set the index to the correct columns you would like to use as key. Below is how you can do it via join:

df1.join(df2.set_index(["ID", "Name"]), 
        on=["ID", "Name"], how="outer").reset_index(drop=True)

Assuming you have the below student’s score for each subject, and you want to merge the student information (df1) and the below based on the “Name” column:

df4 = pd.DataFrame({"ID" : [1001, 1002, 1003, 1002, 1001],
                    "Subject": ["Science", "Math", "English", "Math", "Science"], 
                    "Name": ["Aaron", "Jimmy", "Jimmy", "Zoe", "Jenny"], 
                    "Score" : ["A", "B", "C", "B", "B"]})

With merge function, you can specify the joining logic as left join on “Name” column as per below:

df1.merge(df4, on="Name", how="left")

Pandas will automatically add suffix whenever there are columns with duplicate names (e.g. “ID” in df1 and df4) from the two data frames, below is the output you may see:

combine data in pandas with merge vs join

To generate the same output via join, you can use below code which you need to pre-set the index for df4 and specify the suffix for left and right data frame:

df1.join(df4.set_index("Name"), on="Name", lsuffix="_x", rsuffix="_y")

Of course, if you would like to perform the right join for the above two data frames, you can do as per below:

df1.merge(df4, on="Name", how="right")
# or
df1.join(df4.set_index("Name"), on="Name", how="right", lsuffix="_x", rsuffix="_y")

Output as per below:

combine data in pandas with merge vs join

Merge DataFrame with Duplicate Keys

When merging multiple DataFrame objects, you may occasionally encounter the scenario that there are duplicate values for the columns you want to use as keys for joining. For instance, you may have below records if one subject has more than one lecturers:

df5 = pd.DataFrame({"Subject": ["Science", "Science", "Math", "Math", "English"], 
                    "Lecturer": ["Michael", "John", "Tim", "Robert", "Alex"]})

When you merge this information with student score based on the subject with merge or join method:

df4.merge(df5, on="Subject", how="left")
#or 
df4.join(df5.set_index("Subject"), on="Subject", how="left")

You would see the below output with M x N records due to the duplicate key in the df5:

combine data in pandas with merge vs join

If your objective is to perform something similar to excel vlookup to return the first matched value, then you can use the drop_duplicates method to remove the duplicate records before joining. E.g.:

df4.merge(df5.drop_duplicates("Subject"), on="Subject", how="left")

This would allow you to combine the two data frames with the first matched record from df5:

combine data in pandas with merge vs join

And in case you do not want to lose the information from the lecturer data frame, you will need to perform some sort of data aggregation before joining, e.g.:

df4.merge(df5.groupby("Subject").agg({"Lecturer" : lambda x: ','.join(x)}),
 on="Subject", how="left")

With this aggregation on the lecturer values, you would be able to see the below output:

combine data in pandas with merge vs join

Based on the above examples, you may find that merge and join are interchangeable in most of the cases, and you may have to type a bit more when using join method due to the different default arguments used. Since it always works on the index, you will have to preset the index on the key columns before joining.

Conclusion

In this article, we have reviewed through a few methods pandas offered for combining data frames with some sample code. To wrap up, the append and concat are usually used for merging two or more data frames based on the row or column index, and concat has better performance over append when you have multiple data frames to be worked on. If you need some high performance in-memory join operations like SQL joining for rational database, you will need to use merge or join method which can be interchangeable in most of the scenario. In addition, if the data frame you worked on does not have a index on the joining row/column, using merge over join would probably save your some typing.

 

group consecutive rows of same values in pandas

How to group consecutive rows of same values in pandas

Problem Statement

You have a data set which you would like to group consecutive rows if one of the columns has the same values. If there is a different value in between the rows, these records shall be split into separate groups.

To better elaborate the issue, let’s use an example.

Assuming you have the connection log data for some devices such as Bluetooth. It triggers a event when the connection is established as well as when it’s disconnected from the paired device. In between, there may be additional events triggered out for connectivity test. Let’s load the data and visualize it:

import pandas as pd 

df = pd.read_excel("connection log.xlsx")

df.head(10)

You can see the below output from Jupyter Lab:

connection log data

If you would like to check the duration for each device per every connection, you probably want to group these records if the events are triggered during the same connection. To determine whether the records are within the same connection, you shall sort the event date in ascending order and if the device Id is not the same in the consecutive rows, then they must be some events for different connections. So can this be done in pandas?

Solution to group the consecutive rows

Let’s do some sorting to our data first to make sure the records are in chronological order based on the event date:

df.sort_values(["Event Time", "Device ID"], ascending=[True, True], inplace=True)

To compare the value of current row and subsequent row for a particular column, we can use the data series shift method. For instance, we can shift the “Device ID” values to next row and store the result into new column named “Device ID X”:

df["Device ID X"] = df["Device ID"].shift()

After the shifting, you shall see the updated data frame as per below:

shifted rows

If you try to compare the values of both columns:

df["Device ID"] != df["Device ID X"]

You can see the return value of the True/False in a data series form:

compare device id column

Now it is coming to the most critical step. Since we know that there is only one True value when device ID switched to a new ID, we can use the pandas cumsum method to sum up these True values accumulatively as per below:

df["cumsum"] = (df["Device ID"] != df["Device ID X"]).cumsum()

When doing the accumulative summary, the True values will be counted as 1 and False values will be counted as 0. So you would see the below output:

comparison result

You can see that the same values calculated for the rows we would like to group together, and you can make use of this value to re-group the records for further analysis.

You can even simply combine the above steps into one liner to get the earliest and latest event time for each group as per below:

df.groupby((df["Device ID"] != df["Device ID"].shift()).cumsum()).agg({"Event Time" : ["min", "max"]})

Output as per below:

one liner result

You may be also interested in some other similar topic from here.