xinsir committed on
Commit 535f0a1
1 Parent(s): 3207ab5

Update README.md

Files changed (1)
  1. README.md +22 -1
README.md CHANGED
@@ -88,6 +88,24 @@ import torch
 import numpy as np
 import cv2
 
+def HWC3(x):
+    assert x.dtype == np.uint8
+    if x.ndim == 2:
+        x = x[:, :, None]
+    assert x.ndim == 3
+    H, W, C = x.shape
+    assert C == 1 or C == 3 or C == 4
+    if C == 3:
+        return x
+    if C == 1:
+        return np.concatenate([x, x, x], axis=2)
+    if C == 4:
+        color = x[:, :, 0:3].astype(np.float32)
+        alpha = x[:, :, 3:4].astype(np.float32) / 255.0
+        y = color * alpha + 255.0 * (1.0 - alpha)
+        y = y.clip(0, 255).astype(np.uint8)
+        return y
+
 controlnet_conditioning_scale = 1.0
 prompt = "your prompt, the longer the better, you can describe it as detail as possible"
 negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'
@@ -143,7 +161,10 @@ images[0].save(f"your image save path, png format is usually better than jpg or
 
 ## Training Details
 
-The model is trained using high quality data, only 1 stage training. The resolution setting is the same as sdxl-base, 1024*1024
+The model is trained on high-quality data in a single training stage, and the resolution setting is the same as sdxl-base, 1024*1024. As Lvmin Zhang did, we use random thresholds to generate the canny images; it is essential to find proper hyperparameters
+for this data augmentation, because making it too easy or too hard will hurt model performance. In addition, we randomly mask out a random percentage of each canny image to force the model to learn more of the semantic relationship between the prompt and the lines.
+We use over 10,000,000 carefully annotated images. CogVLM has proven to be a powerful image-captioning model [https://github.com/THUDM/CogVLM?tab=readme-ov-file]. For comic images, we recommend using the waifu tagger to generate special tags
+[https://huggingface.co/spaces/SmilingWolf/wd-tagger]. More than 64 A100s are used to train the model, and the effective batch size is 2560 when accumulate_grad_batches is used.
 
 
 ### Training Data
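For four-channel (RGBA) input, the `HWC3` helper added in this commit composites the color channels over a white background. Writing $c$ for a color value and $a$ for the alpha value of the same pixel (both in 0-255), the code computes:

```latex
\alpha = \frac{a}{255}, \qquad
y = \operatorname{clip}\bigl(\alpha\, c + 255\,(1 - \alpha),\ 0,\ 255\bigr)
```

So a fully transparent pixel ($a = 0$) becomes white ($y = 255$) and a fully opaque one ($a = 255$) keeps its color ($y = c$). One-channel input is instead replicated to three channels, and three-channel input is returned unchanged.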
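The training notes say the canny conditioning images were generated with random thresholds, but they give no ranges. The sketch below shows one way to do that under stated assumptions: `sample_canny_thresholds`, `random_canny`, and both threshold ranges are hypothetical, and only `cv2.cvtColor` and `cv2.Canny` are real OpenCV calls.

```python
import random


def sample_canny_thresholds(rng, low_range=(50, 150), high_gap=(50, 150)):
    """Draw a random (low, high) hysteresis threshold pair for cv2.Canny.

    Hypothetical sketch: both ranges are assumed placeholders; the README only
    says the thresholds are random and need careful tuning.
    """
    low = rng.randint(*low_range)           # inclusive on both ends
    high = low + rng.randint(*high_gap)     # keeps high at least 50 above low
    return low, high


def random_canny(image_bgr, rng):
    """Edge map of a BGR uint8 image, using randomly sampled thresholds."""
    import cv2  # imported here so the sampler above works without OpenCV

    low, high = sample_canny_thresholds(rng)
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Returns a 2-D uint8 edge map; HWC3 from the README turns it into 3 channels.
    return cv2.Canny(gray, low, high)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_canny_thresholds(rng))
```

Keeping `high` a fixed margin above `low` avoids degenerate threshold pairs; how wide those ranges should be is exactly the hyperparameter choice the README warns about.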
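The README also masks out "a random percentage" of each canny image without saying how. One plausible reading, assumed here, is to drop random square blocks so that whole regions of the edge map disappear and the prompt has to supply that information. `random_mask_canny`, `max_ratio`, and `cell` are hypothetical names and values.

```python
import numpy as np


def random_mask_canny(edges, rng, max_ratio=0.5, cell=32):
    """Zero out a random fraction of cell x cell blocks in an edge map.

    Hypothetical sketch: `max_ratio` and `cell` are assumed values; the README
    only says that a random percentage of each canny image is masked out.
    """
    h, w = edges.shape[:2]
    gh, gw = -(-h // cell), -(-w // cell)  # number of blocks per axis, rounded up
    ratio = rng.uniform(0.0, max_ratio)    # masking strength drawn per image
    # Each block is dropped with probability `ratio`, so roughly that
    # fraction of the blocks is masked.
    keep = rng.random((gh, gw)) >= ratio
    # Expand the block mask to pixel resolution and crop it to the image size.
    mask = keep.repeat(cell, axis=0).repeat(cell, axis=1)[:h, :w]
    if edges.ndim == 3:
        mask = mask[:, :, None]  # apply the same mask to every channel
    return np.where(mask, edges, 0).astype(edges.dtype)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    edges = np.full((100, 70), 255, dtype=np.uint8)  # dummy all-edge image
    masked = random_mask_canny(edges, rng)
    print(f"masked fraction: {(masked == 0).mean():.2f}")
```

Block masking is chosen over per-pixel dropout because dropping individual pixels would mostly thin every line rather than remove regions; whether the authors did either is not stated.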
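With data-parallel training and gradient accumulation, the effective batch size is the product of the GPU count, the per-GPU micro-batch, and the number of accumulated steps. The name `accumulate_grad_batches` matches the PyTorch Lightning `Trainer` argument, though the README does not say which framework was used. Since it gives only "more than 64" GPUs and the total of 2560, the split below is hypothetical:

```python
def effective_batch_size(num_gpus: int, per_gpu_batch: int, accumulate_grad_batches: int) -> int:
    """Number of samples that contribute to one optimizer step."""
    return num_gpus * per_gpu_batch * accumulate_grad_batches


# Hypothetical split: 64 GPUs x 8 samples per GPU x 5 accumulated steps.
print(effective_batch_size(64, 8, 5))  # 2560
```

With exactly 64 GPUs, any per-GPU batch and accumulation count whose product is 40 gives the same total of 2560.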